Thanks, Ash.

Yes, I agree with the others: option 2 would be my preference, as long as we
have all the guardrails in place to prevent unwanted behaviour in code sharing
and to make sure the right tests are executed between the packages.
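
As an illustration of the kind of guardrail meant here, below is a minimal
sketch of a copy-and-verify step for option 2. It is not the tooling proposed
in the thread (that is still TBD); the function names `tree_digest`, `vendor`
and `find_drift` are hypothetical, and the directories passed in are assumed to
follow the repo layout discussed in the quoted message.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def vendor(shared_src: Path, vendor_dir: Path) -> None:
    """Replace vendor_dir with a fresh copy of shared_src (hypothetical build step)."""
    if vendor_dir.exists():
        shutil.rmtree(vendor_dir)
    shutil.copytree(shared_src, vendor_dir)


def find_drift(shared_src: Path, vendor_dir: Path) -> list[str]:
    """Return relative paths that are missing, extra, or different in vendor_dir."""
    expected = tree_digest(shared_src)
    actual = tree_digest(vendor_dir) if vendor_dir.exists() else {}
    return sorted(
        name
        for name in expected.keys() | actual.keys()
        if expected.get(name) != actual.get(name)
    )
```

A CI job or pre-commit hook could call `find_drift` and fail when the returned
list is non-empty, which would catch a vendored copy that was edited by hand or
not refreshed after the shared source changed.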

On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <amoghdesai....@gmail.com>
wrote:

> Thanks for starting this discussion, Ash.
>
> I would prefer option 2 here with proper tooling to handle the code
> duplication at *release* time.
> It is best to have a dist that has all it needs in itself.
>
> Option 1 could very quickly get out of hand, and if we decide to split
> triggerer / dag processor / config etc. out as separate packages, backwards
> compatibility is going to be a nightmare and will bite us harder than we
> anticipate.
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
> > I prefer Option 2 as well, to avoid a matrix of dependencies.
> >
> > On Thu, 3 Jul 2025 at 01:03, Jens Scheffler <j_scheff...@gmx.de.invalid> wrote:
> >
> > > > I'd also prefer option 2. The reason is that it is rather pragmatic:
> > > > we do not need to cut another package, and we end up with fewer
> > > > packages and dependencies.
> > >
> > > > I remember some time ago I was checking (together with Jarek, I am not
> > > > sure anymore...) whether symlinks could be used to keep the source in
> > > > one package but "symlink" it into another. If the files are then
> > > > materialized at the point of packaging/release, we have one set of code.
> > >
> > > > Otherwise, if that is not possible, the redundancy could still be solved
> > > > by a pre-commit hook. In Git the files are de-duplicated anyway based on
> > > > content hash, so this does not hurt.
> > >
> > > On 02.07.25 18:49, Shahar Epstein wrote:
> > > > > I support option 2 with proper automation & CI - the reasoning you've
> > > > > shown for that makes sense to me.
> > > >
> > > >
> > > > Shahar
> > > >
> > > >
> > > > > On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> > > >
> > > >> Hello everyone,
> > > >>
> > > > >> As we work on finishing off the code-level separation of Task SDK and
> > > > >> Core (scheduler etc.) we have come across some situations where we
> > > > >> would like to share code between these.
> > > >>
> > > > >> However it’s not as straightforward as “just put it in a common dist
> > > > >> they both depend upon” because one of the goals of the Task SDK
> > > > >> separation was to have 100% complete version independence between the
> > > > >> two, ideally even if they are built into the same image and venv. Most
> > > > >> of the reason why this isn’t straightforward comes down to backwards
> > > > >> compatibility - if we make a change to the common/shared distribution
> > > >>
> > > >>
> > > > >> We’ve listed the options we have thought about in
> > > > >> https://github.com/apache/airflow/issues/51545 (but that covers some
> > > > >> more things that I don’t want to get into in this discussion, such as
> > > > >> possibly separating operators and executors out of a single provider
> > > > >> dist).
> > > >>
> > > > >> To give a concrete example of some code I would like to share:
> > > > >> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
> > > > >> (logging config). Another thing we will want to share is the
> > > > >> AirflowConfigParser class from airflow.configuration (but notably: only
> > > > >> the parser class, _not_ the default config values; again, let’s not
> > > > >> dwell on the specifics of that).
> > > >>
> > > > >> So to bring the options listed in the issue here for discussion,
> > > > >> broadly speaking there are two high-level approaches:
> > > >>
> > > >> 1. A single shared distribution
> > > >> 2. No shared package and copy/duplicate code
> > > >>
> > > > >> The advantage of Approach 1 is that we only have the code in one place.
> > > > >> However, for me the problem, at least in this specific case of the
> > > > >> logging config or the AirflowConfigParser class, is that backwards
> > > > >> compatibility is much, much harder.
> > > >>
> > > > >> The main advantage of Approach 2 is that the code is released
> > > > >> with/embedded in the dist (i.e. apache-airflow-task-sdk would contain
> > > > >> the right version of the logging config and ConfigParser etc.). The
> > > > >> downside is that either the code will need to be duplicated in the
> > > > >> repo, or better yet it would live in a single place in the repo, but
> > > > >> some tooling (TBD) will automatically handle the duplication, either at
> > > > >> commit time or, my preference, at release time.
> > > >>
> > > > >> For this kind of shared “utility” code I am very strongly leaning
> > > > >> towards option 2 with automation, as otherwise I think the backwards
> > > > >> compatibility requirements would make it unworkable (very quickly the
> > > > >> combinations we would have to test would become unreasonable) and I
> > > > >> don’t feel confident we can have things as stable as we need to really
> > > > >> deliver the version separation/independence I want to deliver with
> > > > >> AIP-72.
> > > >>
> > > > >> So unless someone feels very strongly about this, I will come up with a
> > > > >> draft PR for further discussion that will implement code sharing via
> > > > >> “vendoring” it at build time. I have an idea of how I can achieve this
> > > > >> so we have a single version in the repo and it’ll work there, but at
> > > > >> runtime we vendor it in to the shipped dist so it lives at something
> > > > >> like `airflow.sdk._vendor` etc.
> > > >>
> > > >> In terms of repo layout, this likely means we would end up with:
> > > >>
> > > >> airflow-core/pyproject.toml
> > > >> airflow-core/src/
> > > >> airflow-core/tests/
> > > >> task-sdk/pyproject.toml
> > > >> task-sdk/src/
> > > >> task-sdk/tests/
> > > > >> airflow-common/src/
> > > >> airflow-common/tests/
> > > > >> # Possibly no airflow-common/pyproject.toml, as deps would be included
> > > > >> # in the downstream projects. TBD.
> > > >>
> > > >> Thoughts and feedback welcomed.
