Yes, we already use symlinking, actually: for the standard provider's examples added to airflow-core. And yes, that is a good option for anything shared in 2) mode. When packaging, by default such symlinks are stored as the files they point to.
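A small illustration of that last point, as a sketch only. It uses the standard-library `zipfile` module rather than our actual build tooling, and the `shared/` and `pkg/` names are made up for the example. A wheel is a zip archive, and writing a symlinked path into a zip with `ZipFile.write` stores the contents of the file the link points to:

```python
import os
import tempfile
import zipfile
from pathlib import Path


def build_archive_with_symlink(workdir: Path) -> Path:
    """Create shared/util.py, symlink it into pkg/, and zip the symlinked path."""
    shared = workdir / "shared"
    pkg = workdir / "pkg"
    shared.mkdir()
    pkg.mkdir()
    (shared / "util.py").write_text("VALUE = 42\n")

    # Relative symlink: pkg/util.py -> ../shared/util.py (hypothetical layout).
    os.symlink(os.path.join("..", "shared", "util.py"), pkg / "util.py")

    archive = workdir / "pkg.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        # ZipFile.write reads through the symlink, so the member holds file content.
        zf.write(pkg / "util.py", arcname="pkg/util.py")
    return archive


def read_member(archive: Path, name: str) -> str:
    """Return the decoded content of one archive member."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name).decode()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        built = build_archive_with_symlink(Path(tmp))
        print(read_member(built, "pkg/util.py"))
```

The archive member is a regular file with the shared source in it, so the installed package carries its own copy while the repo keeps a single one.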
J.

On Wed, Jul 2, 2025 at 9:33 PM Jens Scheffler <j_scheff...@gmx.de.invalid> wrote:

> I'd also rather prefer option 2 - the reason is that it is rather pragmatic:
> we do not need to cut another package, and we have fewer packages and
> dependencies.
>
> I remember some time ago I was checking (together with Jarek, I am not
> sure anymore...) if the usage of symlinks would be possible. To keep the
> source in one package but "symlink" it into another. If then at the point of
> packaging/release the files are materialized, we have 1 set of code.
>
> Otherwise, if that is not possible, the redundancy could still be solved by a
> pre-commit hook - and in Git the files are de-duplicated anyway based on
> content hash, so this does not hurt.
>
> On 02.07.25 18:49, Shahar Epstein wrote:
> > I support option 2 with proper automation & CI - the reasoning you've
> > shown for that makes sense to me.
> >
> > Shahar
> >
> > On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> >> Hello everyone,
> >>
> >> As we work on finishing off the code-level separation of Task SDK and Core
> >> (scheduler etc) we have come across some situations where we would like to
> >> share code between these.
> >>
> >> However it’s not as straightforward as “just put it in a common dist they
> >> both depend upon” because one of the goals of the Task SDK separation was
> >> to have 100% complete version independence between the two, ideally even if
> >> they are built into the same image and venv. Most of the reason why this
> >> isn’t straightforward comes down to backwards compatibility - if we make
> >> a change to the common/shared distribution
> >>
> >> We’ve listed the options we have thought about in
> >> https://github.com/apache/airflow/issues/51545 (but that covers some more
> >> things that I don’t want to get into in this discussion, such as possibly
> >> separating operators and executors out of a single provider dist.)
> >>
> >> To give a concrete example of some code I would like to share:
> >> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
> >> - logging config. Another thing we will want to share will be the
> >> AirflowConfigParser class from airflow.configuration (but notably: only the
> >> parser class, _not_ the default config values; again, let’s not dwell on the
> >> specifics of that)
> >>
> >> So to bring the options listed in the issue here for discussion, broadly
> >> speaking there are two high-level approaches:
> >>
> >> 1. A single shared distribution
> >> 2. No shared package and copy/duplicate code
> >>
> >> The advantage of Approach 1 is that we only have the code in one place.
> >> However, for me the downside, at least in this specific case of the logging
> >> config or the AirflowConfigParser class, is that backwards compatibility is
> >> much, much harder.
> >>
> >> The main advantage of Approach 2 is that the code is released with/embedded
> >> in the dist (i.e. apache-airflow-task-sdk would contain the right version
> >> of the logging config and ConfigParser etc). The downside is that either
> >> the code will need to be duplicated in the repo, or better yet it would
> >> live in a single place in the repo, but some tooling (TBD) will
> >> automatically handle the duplication, either at commit time, or my
> >> preference, at release time.
> >>
> >> For this kind of shared “utility” code I am very strongly leaning towards
> >> option 2 with automation, as otherwise I think the backwards compatibility
> >> requirements would make it unworkable (very quickly over time the
> >> combinations we would have to test would just be unreasonable) and I don’t
> >> feel confident we can have things as stable as we need to really deliver
> >> the version separation/independence I want to deliver with AIP-72.
> >>
> >> So unless someone feels very strongly about this, I will come up with a
> >> draft PR for further discussion that will implement code sharing via
> >> “vendoring” it at build time. I have an idea of how I can achieve this so
> >> we have a single version in the repo and it’ll work there, but at runtime
> >> we vendor it into the shipped dist so it lives at something like
> >> `airflow.sdk._vendor` etc.
> >>
> >> In terms of repo layout, this likely means we would end up with:
> >>
> >> airflow-core/pyproject.toml
> >> airflow-core/src/
> >> airflow-core/tests/
> >> task-sdk/pyproject.toml
> >> task-sdk/src/
> >> task-sdk/tests/
> >> airflow-common/src/
> >> airflow-common/tests/
> >> # Possibly no airflow-common/pyproject.toml, as deps would be included in
> >> the downstream projects. TBD.
> >>
> >> Thoughts and feedback welcomed.
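
For reference, the "single copy in the repo, vendored into the dist" idea from the quoted proposal, together with the sync check Jens suggests for a pre-commit hook, could be sketched roughly as below. This is an assumption-laden illustration, not the tooling from the draft PR: the function names are hypothetical, and the only detail taken from the thread is that the shared code would end up under something like `airflow.sdk._vendor`.

```python
import hashlib
import shutil
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each .py file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.py"))
    }


def vendor(shared_src: Path, vendor_dir: Path) -> None:
    """Replace vendor_dir with a fresh copy of shared_src (hypothetical build step)."""
    if vendor_dir.exists():
        shutil.rmtree(vendor_dir)
    # copytree creates missing parent directories of the destination.
    shutil.copytree(shared_src, vendor_dir)


def is_in_sync(shared_src: Path, vendor_dir: Path) -> bool:
    """Return True if the vendored copy matches the shared source file for file."""
    return vendor_dir.is_dir() and tree_digest(shared_src) == tree_digest(vendor_dir)
```

A release script would call `vendor(...)` once per consuming distribution, while a pre-commit hook or CI job would call `is_in_sync(...)` and fail on drift. Comparing content hashes mirrors how Git itself de-duplicates identical files.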