I'd also prefer option 2. It is the pragmatic choice: we do not need to cut another package, and we keep the number of packages and dependencies down.

I remember that some time ago I checked (together with Jarek, I think, though I am no longer sure) whether symlinks could be used: keep the source in one package but symlink it into another. If the files are then materialized at packaging/release time, we have a single set of code.
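
To make the symlink idea concrete, here is a minimal sketch of "materializing" at packaging time. It relies only on the standard library: `shutil.copytree` with `symlinks=False` (the default) copies the content a symlink points to instead of recreating the link. The file and directory names are made up for illustration, and this is not how any existing Airflow tooling works.

```python
import os
import shutil
import tempfile
from pathlib import Path


def materialize(src_pkg: Path, dest: Path) -> None:
    """Copy src_pkg to dest, replacing symlinks with real copies of their targets."""
    # symlinks=False makes copytree copy the linked content rather than the link.
    shutil.copytree(src_pkg, dest, symlinks=False)


def demo() -> bool:
    # Hypothetical layout: one real file in "shared", symlinked into "pkg".
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        shared = root / "shared"
        shared.mkdir()
        (shared / "log_config.py").write_text("LEVEL = 'INFO'\n")

        pkg = root / "pkg"
        pkg.mkdir()
        os.symlink(shared / "log_config.py", pkg / "log_config.py")

        materialize(pkg, root / "build" / "pkg")
        out = root / "build" / "pkg" / "log_config.py"
        # The built copy is a regular file with the shared content.
        return (not out.is_symlink()) and out.read_text() == "LEVEL = 'INFO'\n"
```

Whether the actual build backend follows symlinks in the same way would need to be checked separately.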

If that is not possible, the redundancy could still be handled by a pre-commit hook. Git de-duplicates files by content hash anyway, so the duplication does not hurt.
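
As a rough sketch of what such a hook could check (an assumption on my side, not an existing hook): compare every source file with its duplicate by content hash and report the ones that drifted. The directory names in `demo` are placeholders.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 of the file content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def out_of_sync(source_dir: Path, copy_dir: Path) -> list[str]:
    """Return relative paths whose copy is missing or differs from the source."""
    stale = []
    for src in sorted(source_dir.rglob("*.py")):
        rel = src.relative_to(source_dir)
        dup = copy_dir / rel
        if not dup.is_file() or digest(src) != digest(dup):
            stale.append(rel.as_posix())
    return stale


def demo() -> list[str]:
    # Placeholder directories: "core" holds the source, "sdk" the duplicate.
    with tempfile.TemporaryDirectory() as tmp:
        src, dup = Path(tmp, "core"), Path(tmp, "sdk")
        src.mkdir()
        dup.mkdir()
        (src / "a.py").write_text("x = 1\n")
        (src / "b.py").write_text("y = 2\n")
        (dup / "a.py").write_text("x = 1\n")
        (dup / "b.py").write_text("y = 3\n")  # drifted copy
        return out_of_sync(src, dup)
```

A hook entry point would call `out_of_sync` and exit non-zero when the returned list is not empty.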

On 02.07.25 18:49, Shahar Epstein wrote:
I support option 2 with proper automation & CI - the reasoning you've
given for that makes sense to me.


Shahar


On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org> wrote:

Hello everyone,

As we work on finishing off the code-level separation of Task SDK and Core
(scheduler etc) we have come across some situations where we would like to
share code between these.

However it’s not as straightforward as “just put it in a common dist they
both depend upon” because one of the goals of the Task SDK separation was
to have 100% complete version independence between the two, ideally even if
they are built into the same image and venv. Most of the reason why this
isn’t straightforward comes down to backwards compatibility - if we make
a change to the common/shared distribution


We’ve listed the options we have thought about in
https://github.com/apache/airflow/issues/51545 (but that covers some more
things that I don’t want to get into in this discussion, such as possibly
separating operators and executors out of a single provider dist).

To give a concrete example, some code I would like to share is the logging
config in
https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
Another thing we will want to share is the AirflowConfigParser class from
airflow.configuration (but notably: only the parser class, _not_ the
default config values; again, let’s not dwell on the specifics of that).

So to bring the options listed in the issue here for discussion, broadly
speaking there are two high-level approaches:

1. A single shared distribution
2. No shared package and copy/duplicate code

The advantage of Approach 1 is that we only have the code in one place.
The downside, for me at least in this specific case of the logging config
or AirflowConfigParser class, is that backwards compatibility is much, much
harder.

The main advantage of Approach 2 is that the code is released with/embedded
in the dist (i.e. apache-airflow-task-sdk would contain the right version
of the logging config and ConfigParser etc). The downside is that either
the code will need to be duplicated in the repo, or better yet it would
live in a single place in the repo, but some tooling (TBD) will
automatically handle the duplication, either at commit time or, my
preference, at release time.

For this kind of shared “utility” code I am very strongly leaning towards
option 2 with automation, as otherwise I think the backwards compatibility
requirements would make it unworkable (very quickly over time the
combinations we would have to test would just be unreasonable) and I don’t
feel confident we can have things as stable as we need to really deliver
the version separation/independence I want to deliver with AIP-72.

So unless someone feels very strongly about this, I will come up with a
draft PR for further discussion that will implement code sharing via
“vendoring” it at build time. I have an idea of how I can achieve this so
we have a single version in the repo and it’ll work there, but it is
vendored into the shipped dist so that at runtime it lives at something
like `airflow.sdk._vendor` etc.
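
For illustration only, a build-time vendoring step along these lines could look like the sketch below. It is not the approach of the draft PR, which is not described here. The import name `airflow_common` is a hypothetical placeholder, and the import rewriting is deliberately naive: it only handles `from airflow_common... import ...` lines.

```python
import re
import shutil
import tempfile
from pathlib import Path

SHARED_NAME = "airflow_common"         # hypothetical import name of the shared code
VENDOR_PREFIX = "airflow.sdk._vendor"  # example location mentioned above

# Matches "from airflow_common..." at the start of a line (after indentation).
_FROM_IMPORT = re.compile(rf"^([ \t]*from\s+){SHARED_NAME}\b", re.MULTILINE)


def vendor(shared_src: Path, dist_src: Path) -> Path:
    """Copy shared_src/<SHARED_NAME> into dist_src under the vendor package."""
    target = dist_src.joinpath(*VENDOR_PREFIX.split("."), SHARED_NAME)
    if target.exists():
        shutil.rmtree(target)  # start clean so deleted files do not linger
    shutil.copytree(shared_src / SHARED_NAME, target)
    for py in target.rglob("*.py"):
        text = py.read_text()
        # Point intra-package imports at the vendored location.
        py.write_text(_FROM_IMPORT.sub(rf"\g<1>{VENDOR_PREFIX}.{SHARED_NAME}", text))
    return target


def demo() -> str:
    # Throwaway tree that mimics a shared source dir and a dist source dir.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "airflow-common" / "src" / SHARED_NAME
        pkg.mkdir(parents=True)
        (pkg / "__init__.py").write_text("")
        (pkg / "util.py").write_text("from airflow_common.base import X\n")
        out = vendor(root / "airflow-common" / "src", root / "task-sdk" / "src")
        return (out / "util.py").read_text()
```

A real implementation would also need to handle plain `import airflow_common` statements and package metadata, which this sketch ignores.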

In terms of repo layout, this likely means we would end up with:

airflow-core/pyproject.toml
airflow-core/src/
airflow-core/tests/
task-sdk/pyproject.toml
task-sdk/src/
task-sdk/tests/
airflow-common/src/
airflow-common/tests/
# Possibly no airflow-common/pyproject.toml, as deps would be included in
# the downstream projects. TBD.

Thoughts and feedback welcomed.

