Thanks for starting this discussion, Ash.

I would prefer option 2 here with proper tooling to handle the code
duplication at *release* time.
It is best to have a dist that has all it needs in itself.
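
To make the release-time idea a bit more concrete, here is a minimal sketch of
what such tooling could look like. It is only an illustration, not the planned
implementation: the function name `vendor_shared_code` and the exact paths are
my assumptions, loosely based on the `airflow-common/src` and
`airflow.sdk._vendor` names from Ash's mail.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def vendor_shared_code(shared_src: Path, dist_src: Path, vendor_pkg: str = "_vendor") -> list[Path]:
    """Copy every .py file under shared_src into dist_src/<vendor_pkg>.

    Hypothetical helper: it only copies files and does not rewrite imports.
    """
    target_root = dist_src / vendor_pkg
    # Start from a clean directory so files removed from the shared source
    # do not linger in the vendored copy.
    if target_root.exists():
        shutil.rmtree(target_root)
    target_root.mkdir(parents=True)

    copied: list[Path] = []
    for source in sorted(shared_src.rglob("*.py")):
        destination = target_root / source.relative_to(shared_src)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, destination)
        copied.append(destination)

    # Make sure the vendored directory is an importable package.
    init_file = target_root / "__init__.py"
    if not init_file.exists():
        init_file.write_text("# Generated at release time; do not edit.\n")
    return copied
```

A release script could call something like
`vendor_shared_code(Path("airflow-common/src"), Path("task-sdk/src/airflow/sdk"))`
just before building the dist. Real tooling would most likely also need to
rewrite imports inside the copied files, which this sketch leaves out.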

Option 1 could very quickly get out of hand: if we decide to split the
triggerer, dag processor, config, etc. into separate packages, backwards
compatibility is going to be a nightmare and will bite us harder than we
anticipate.
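
To put rough numbers on that, here is a tiny sketch that counts the version
combinations a shared distribution would force us to think about. The dist
names and version lists below are made up purely for illustration; they are
not real support windows.

```python
from __future__ import annotations

from itertools import product

# Hypothetical supported versions per distribution (illustrative only).
supported = {
    "airflow-core": ["3.0", "3.1", "3.2"],
    "task-sdk": ["1.0", "1.1", "1.2"],
    "shared": ["0.1", "0.2", "0.3"],
}


def count_combinations(versions: dict[str, list[str]]) -> int:
    """Count every possible pairing of one version from each distribution."""
    return sum(1 for _ in product(*versions.values()))


combinations = count_combinations(supported)
```

With three dists at three versions each that is already 3 * 3 * 3 = 27
combinations, and every additional separately versioned package multiplies
the count again.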

Thanks & Regards,
Amogh Desai


On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> I prefer Option 2 as well to avoid matrix of dependencies
>
> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler <j_scheff...@gmx.de.invalid>
> wrote:
>
> > I'd also prefer option 2 - the reason is that it is rather pragmatic:
> > we do not need to cut another package, and we end up with fewer packages
> > and dependencies.
> >
> > I remember some time ago I was checking (together with Jarek, I am not
> > sure anymore...) whether symlinks would be possible: keep the source in
> > one package but "symlink" it into another. If the files are then
> > materialized at the point of packaging/release, we have one set of code.
> >
> > Otherwise, if that is not possible, the redundancy could still be
> > handled by a pre-commit hook - and in Git the files are de-duplicated
> > anyway based on content hash, so this does not hurt.
> >
> > On 02.07.25 18:49, Shahar Epstein wrote:
> > > I support option 2 with proper automation & CI - the reasonings you've
> > > shown for that make sense to me.
> > >
> > >
> > > Shahar
> > >
> > >
> > > On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org>
> wrote:
> > >
> > >> Hello everyone,
> > >>
> > >> As we work on finishing off the code-level separation of Task SDK and
> > Core
> > >> (scheduler etc) we have come across some situations where we would
> like
> > to
> > >> share code between these.
> > >>
> > >> However it’s not as straightforward as “just put it in a common dist
> > they
> > >> both depend upon” because one of the goals of the Task SDK separation
> > was
> > >> to have 100% complete version independence between the two, ideally
> > even if
> > >> they are built into the same image and venv. Most of the reason why
> this
> > >> isn’t straightforward comes down to backwards compatibility - if we
> > make
> > >> a change to the common/shared distribution
> > >>
> > >>
> > >> We’ve listed the options we have thought about in
> > >> https://github.com/apache/airflow/issues/51545 (but that covers some
> > more
> > >> things that I don’t want to get into in this discussion such as
> > possibly
> > >> separating operators and executors out of a single provider dist.)
> > >>
> > >> To give a concrete example of some code I would like to share
> > >>
> >
> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
> > >> — logging config. Another thing we will want to share will be the
> > >> AirflowConfigParser class from airflow.configuration (but notably:
> only
> > the
> > >> parser class, _not_ the default config values, again, let’s not dwell
> on
> > the
> > >> specifics of that)
> > >>
> > >> So to bring the options listed in the issue here for discussion,
> broadly
> > >> speaking there are two high-level approaches:
> > >>
> > >> 1. A single shared distribution
> > >> 2. No shared package and copy/duplicate code
> > >>
> > >> The advantage of Approach 1 is that we only have the code in one
> place.
> > >> However, the downside for me, at least in this specific case of the
> > >> logging config or the AirflowConfigParser class, is that backwards
> > >> compatibility is much, much harder.
> > >>
> > >> The main advantage of Approach 2 is that the code is released
> > with/embedded
> > >> in the dist (i.e. apache-airflow-task-sdk would contain the right
> > version
> > >> of the logging config and ConfigParser etc). The downside is that
> either
> > >> the code will need to be duplicated in the repo, or better yet it
> would
> > >> live in a single place in the repo, but some tooling (TBD) will
> > >> automatically handle the duplication, either at commit time, or my
> > >> preference, at release time.
> > >>
> > >> For this kind of shared “utility” code I am very strongly leaning
> > towards
> > >> option 2 with automation, as otherwise I think the backwards
> > compatibility
> > >> requirements would make it unworkable (very quickly over time the
> > >> combinations we would have to test would just be unreasonable) and I
> > don’t
> > >> feel confident we can have things as stable as we need to really
> deliver
> > >> the version separation/independence I want to deliver with AIP-72.
> > >>
> > >> So unless someone feels very strongly about this, I will come up with
> a
> > >> draft PR for further discussion that will implement code sharing via
> > >> “vendoring” it at build time. I have an idea of how I can achieve this
> > so
> > >> we have a single version in the repo and it’ll work there, but at
> > runtime
> > >> we vendor it in to the shipped dist so it lives at something like
> > >> `airflow.sdk._vendor` etc.
> > >>
> > >> In terms of repo layout, this likely means we would end up with:
> > >>
> > >> airflow-core/pyproject.toml
> > >> airflow-core/src/
> > >> airflow-core/tests/
> > >> task-sdk/pyproject.toml
> > >> task-sdk/src/
> > >> task-sdk/tests/
> > >> airflow-common/src/
> > >> airflow-common/tests/
> > >> # Possibly no airflow-common/pyproject.toml, as deps would be included
> > in
> > >> the downstream projects. TBD.
> > >>
> > >> Thoughts and feedback welcomed.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > For additional commands, e-mail: dev-h...@airflow.apache.org
> >
> >
>
