Yeah, TP and I discussed this: the AIP isn't solving the incremental load
problem. Folks can still achieve incremental loads, similar to how you did
it by storing the watermark in Variables, and we can support it natively
with a revised AIP-30 in one of the minor releases for Airflow 3.
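
For readers unfamiliar with the watermark approach, here is a minimal
sketch. It is an illustration only: the `variables` dict is a stand-in for
Airflow Variables, and the key name `orders_watermark`, the sample rows,
and `incremental_load` are all hypothetical.

```python
from datetime import datetime

# Stand-in for Airflow Variables (assumption: a real DAG would read and
# write the watermark through Airflow's Variable API instead of a dict).
variables = {}

# Hypothetical source rows, each carrying an `updated_at` timestamp.
ROWS = [
    {"id": 1, "updated_at": datetime(2024, 7, 27, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 7, 28, 12, 0)},
    {"id": 3, "updated_at": datetime(2024, 7, 29, 17, 30)},
]


def incremental_load(rows, key="orders_watermark"):
    """Return rows changed since the stored watermark, then advance it."""
    watermark = datetime.fromisoformat(variables.get(key, "1970-01-01T00:00:00"))
    new_rows = [r for r in rows if r["updated_at"] > watermark]
    if new_rows:
        # Persist the highest timestamp seen so the next run skips these rows.
        variables[key] = max(r["updated_at"] for r in new_rows).isoformat()
    return new_rows


first = incremental_load(ROWS)   # first run picks up every row
second = incremental_load(ROWS)  # nothing changed since, so nothing is returned
```

The state lives outside the data itself, which is why this is a workaround
rather than something the partitioning proposal provides.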

We should clarify in the AIP doc that the proposed partitioning feature is
not designed specifically to handle incremental loads in the traditional
sense. Instead, it is intended to manage and process data in defined
segments or partitions.

However, partitions can be used in conjunction with incremental loading
strategies. For example, a time-based partitioning scheme can ensure that
only data from relevant time periods is processed, and within those
partitions, incremental updates can be tracked and processed.
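
As a rough sketch of that combination, assuming daily partitions keyed on
an immutable `event_time` column and one watermark per partition (the
column names, `partition_key`, and `pending_by_partition` are hypothetical
and not part of the AIP):

```python
from collections import defaultdict
from datetime import date, datetime


def partition_key(row):
    # Assumption: partition on an immutable event date, not on `updated_at`,
    # so a row never moves to a different partition when it is updated.
    return row["event_time"].date()


def pending_by_partition(rows, watermarks):
    """Group rows by daily partition, keeping only rows newer than that
    partition's stored watermark."""
    pending = defaultdict(list)
    for row in rows:
        key = partition_key(row)
        if row["updated_at"] > watermarks.get(key, datetime.min):
            pending[key].append(row)
    return dict(pending)


rows = [
    {"id": 1, "event_time": datetime(2024, 7, 28, 10), "updated_at": datetime(2024, 7, 28, 10)},
    {"id": 2, "event_time": datetime(2024, 7, 28, 11), "updated_at": datetime(2024, 7, 29, 8)},
    {"id": 3, "event_time": datetime(2024, 7, 29, 9), "updated_at": datetime(2024, 7, 29, 9)},
]
# The 2024-07-28 partition was last processed at 23:00 that day.
watermarks = {date(2024, 7, 28): datetime(2024, 7, 28, 23, 0)}

pending = pending_by_partition(rows, watermarks)
```

Row 2 was updated after its partition was last processed, so only the
2024-07-28 partition needs a partial reprocess, while the 2024-07-29
partition is processed for the first time.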

Regards,
Kaxil



On Mon, 29 Jul 2024 at 18:00, Daniel Standish
<daniel.stand...@astronomer.io.invalid> wrote:

> Hi,
>
> *1. incremental loads*
>
> There is mention of incremental processing / incremental loads in the doc.
>
> E.g.
>
> > This is particularly useful for large datasets that need to be processed
> > incrementally or updated periodically.
>
>
> And
>
> > Facilitating Incremental Processing: Many modern data processing
> > strategies rely on incremental updates
>
>
> But there are no examples re how this solves for that use case.
>
> I think it's actually not good to think of or talk about incremental loads
> as "partitioned".
>
> Let me explain.
>
> An incremental load might track an `updated_at` column.  The data it
> processes is the data with an updated `updated_at` column.  But you would
> not be correct in calling this a partition of data.  Because when the data
> is updated again, it would now be in another partition.  That's not what
> partitioning is.
>
> If this is supposed to solve for incremental loads, I think an example is
> needed.  If it's not, let's call it out explicitly and say this is not
> solving for incremental loads.
>
> *2. support for tasks*
>
> I see this is specific to tasks defined with the asset syntax.  What's the
> story with "normal" dags and tasks, e.g. with TaskFlow or classic
> operators?  Is this AIP adding support only for assets?  Is there some plan
> for that?
>
