OK folks, let's get into it.

AIP-83 fundamentally alters Airflow's dag run semantics in a way that I
think no one fully appreciated when it was proposed and adopted.
Previously, the logical key of a dag run was dag_id + logical_date.  That
combination defined what a dag run means, and uniquely identified a dag
run.  In AIP-83, we removed the uniqueness constraint, which was easy
enough; but we did nothing to maintain backcompat or to address the
semantic ambiguities this introduced.  Ultimately, we need to decide as a
community what to do.
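To make the change concrete, here is a minimal sketch using Python's built-in sqlite3.  The `dag_run` table below is a hypothetical simplification, not Airflow's actual metadata schema.  With a UNIQUE (dag_id, logical_date) constraint, a second run for the same logical date is rejected; without it, both rows are accepted and dag_id + logical_date no longer identifies a single run.

```python
import sqlite3

LOGICAL_DATE = "2024-12-20T00:00:00+00:00"


def count_runs(unique_key: bool) -> int:
    """Try to insert two runs with the same (dag_id, logical_date) and return
    how many rows match that pair afterwards.  Hypothetical simplified schema."""
    conn = sqlite3.connect(":memory:")
    constraint = ", UNIQUE (dag_id, logical_date)" if unique_key else ""
    conn.execute(
        "CREATE TABLE dag_run (run_id TEXT PRIMARY KEY, dag_id TEXT, logical_date TEXT"
        + constraint
        + ")"
    )
    for run_id in ("run_a", "run_b"):
        try:
            conn.execute(
                "INSERT INTO dag_run VALUES (?, ?, ?)",
                (run_id, "my_dag", LOGICAL_DATE),
            )
        except sqlite3.IntegrityError:
            # Old semantics: a second run for the same logical date is rejected.
            pass
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM dag_run WHERE dag_id = ? AND logical_date = ?",
        ("my_dag", LOGICAL_DATE),
    ).fetchone()
    conn.close()
    return n


print(count_runs(unique_key=True))   # 1: the pair identifies exactly one run
print(count_runs(unique_key=False))  # 2: the pair is now ambiguous
```

Anything that looks a run up by dag_id + logical_date (and expects one row back) is exactly what becomes ambiguous in the second case.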

I don't expect to impose my will upon the community, and I am letting go
of the outcome.  But I aim to help folks understand the issue so that we
can collectively arrive at a good decision for the project.  I would only
ask that, if you intend to engage and comment, you try to be patient and
read and consider the whole doc.  It probably won't take as long as it
looks.

One fundamental question is whether we should continue to support the old
semantics.  Do users expect or depend on them?  And do we care?

If we decide not to support them, we ought to be clear about this with
users.  If we do want to support them, then we need to decide what that
looks like.

Here is a draft AIP amendment
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-83+amendment+to+support+classic+Airflow+authoring+style>
where I've outlined the history, motivations, concerns, and options that I
am aware of, as I understand them.  There are 5 options.  My vote would be
for anything from option 1 to option 3, with a mild preference for 3.

Personally, I don't see any reason why we cannot or should not allow users
to elect to design dags with the old semantics.  To be clear, I am no
champion of this design pattern; in my former life as a data engineer,
I'd estimate fewer than 1% of my dags were execution-date-driven.  It
makes a lot of sense for a Hive / Presto / Athena shop, but much less so
for, e.g., Snowflake.  But I recognize that many folks do use it, and
Airflow has gotten this far assuming *all* dags are designed this way, so
maybe we should allow users to optionally keep those semantics in Airflow 3.

The other thing we should think through is *why* we are removing
uniqueness.  Why do we care?  What does this buy us?  If it's just about
allowing manual triggering of dags that don't care about logical date, then
one option would be to simply make logical date nullable and call it a day.
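A rough illustration of that option, again with Python's sqlite3 and a hypothetical simplified `dag_run` table (not Airflow's real schema): keep UNIQUE (dag_id, logical_date) but allow logical_date to be NULL.  In SQLite (and, by default, in PostgreSQL) NULLs are treated as distinct under a UNIQUE constraint, so dated runs keep the old one-run-per-date guarantee while any number of undated manual runs can coexist.

```python
import sqlite3


def insert_runs(rows):
    """Insert (run_id, logical_date) rows for one dag under a UNIQUE
    (dag_id, logical_date) constraint; return the run_ids that were accepted.
    The schema is a hypothetical simplification for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dag_run ("
        " run_id TEXT PRIMARY KEY,"
        " dag_id TEXT NOT NULL,"
        " logical_date TEXT,"  # nullable: manual runs may omit it
        " UNIQUE (dag_id, logical_date))"
    )
    accepted = []
    for run_id, logical_date in rows:
        try:
            conn.execute(
                "INSERT INTO dag_run VALUES (?, ?, ?)",
                (run_id, "my_dag", logical_date),
            )
            accepted.append(run_id)
        except sqlite3.IntegrityError:
            pass  # duplicate non-null logical_date for the same dag
    conn.close()
    return accepted


print(insert_runs([
    ("scheduled_1", "2024-12-20T00:00:00+00:00"),
    ("scheduled_dup", "2024-12-20T00:00:00+00:00"),  # rejected: same date
    ("manual_1", None),  # accepted
    ("manual_2", None),  # accepted: NULLs don't collide
]))
# ['scheduled_1', 'manual_1', 'manual_2']
```

Whether NULLs collide under UNIQUE is database-specific behavior, so this would need checking against every backend Airflow supports.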

Alright, happy holidays.  Let's try and step our way towards consensus and
figure out the best path forward for Airflow.
