> We should clarify in the AIP doc that the proposed partitioning feature is
> not designed specifically to handle incremental loads in the traditional
> sense. Instead, it is intended to manage and process data in defined
> segments or partitions.
Agree.

> However, partitions can be used in conjunction with incremental loading
> strategies. For example, a time-based partitioning scheme can ensure that
> only data from relevant time periods is processed,
> *and within those partitions, incremental updates can be tracked and
> processed.*

I'm not sure what you mean by this, particularly the bit I emphasized. Can you try to clarify?

On Mon, Jul 29, 2024 at 11:01 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Yeah, TP and I discussed that we aren't solving the incremental load
> problem; folks can use it to achieve it similar to how you achieved it by
> storing the Watermark in Variables and we can natively support it with a
> revised AIP-30 in one of the minor releases for Airflow 3.
>
> We should clarify in the AIP doc that the proposed partitioning feature is
> not designed specifically to handle incremental loads in the traditional
> sense. Instead, it is intended to manage and process data in defined
> segments or partitions.
>
> However, partitions can be used in conjunction with incremental loading
> strategies. For example, a time-based partitioning scheme can ensure that
> only data from relevant time periods is processed, and within those
> partitions, incremental updates can be tracked and processed.
>
> Regards,
> Kaxil
>
> On Mon, 29 Jul 2024 at 18:00, Daniel Standish
> <daniel.stand...@astronomer.io.invalid> wrote:
>
> > Hi,
> >
> > *1. incremental loads*
> >
> > There is mention of incremental processing / incremental loads in the doc.
> >
> > E.g.
> >
> > > This is particularly useful for large datasets that need to be processed
> > > incrementally or updated periodically.
> >
> > And
> >
> > > Facilitating Incremental Processing: Many modern data processing
> > > strategies rely on incremental updates
> >
> > But there are no examples re how this solves for that use case.
> >
> > I think it's actually not good to think of or talk about incremental loads
> > as "partitioned".
> >
> > Let me explain.
> >
> > An incremental load might track an `updated_at` column. The data it
> > processes is the data with an updated `updated_at` column. But you would
> > not be correct in calling this a partition of data. Because when the data
> > is updated again, it would now be in another partition. That's not what
> > partitioning is.
> >
> > If this is supposed to solve for incremental loads, I think an example is
> > needed. If it's not, let's call it out explicitly and say this is not
> > solving for incremental loads.
> >
> > *2. support for tasks*
> >
> > I see this is specific to tasks defined with the asset syntax. What's the
> > story with "normal" dags and tasks, e.g. with task flow or classic
> > operators? Is this AIP adding support only for assets? Is there some plan
> > for that?
> >
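
To make the distinction in point 1 concrete, here is a rough sketch in plain Python (no Airflow APIs involved). The `Row` type, the `event_date` column, and the helper names are all hypothetical and only for illustration. It assumes a partition key derived from an attribute that never changes, while the incremental batch is selected by comparing `updated_at` against a stored watermark:

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Row:
    id: int
    event_date: date       # hypothetical immutable column, used as the partition key
    updated_at: datetime   # changes every time the row is modified


def partition_key(row: Row) -> str:
    """Time-based partition: depends only on the immutable event_date."""
    return row.event_date.isoformat()


def incremental_batch(rows, watermark):
    """Watermark-based incremental load: rows modified strictly after the watermark."""
    return [r for r in rows if r.updated_at > watermark]


# The same logical row, before and after a later update.
row_v1 = Row(1, date(2024, 7, 1), datetime(2024, 7, 1, 12, 0))
row_v2 = Row(1, date(2024, 7, 1), datetime(2024, 7, 29, 9, 0))

# The row is picked up by two different incremental runs...
first_run = incremental_batch([row_v1], datetime(2024, 6, 30))
second_run = incremental_batch([row_v2], datetime(2024, 7, 1, 12, 0))

# ...but it belongs to one and the same partition both times.
same_partition = partition_key(row_v1) == partition_key(row_v2)
```

Under these assumptions the row shows up in both `first_run` and `second_run`, yet its partition key stays the same. Grouping rows by `updated_at` instead would move the row to a different group on every update, which is why that grouping does not behave like a partition.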