Oh yes, symlinks will work, with one big caveat: it means you can't use absolute imports from one common module to another.
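To make that concrete before the Airflow example below, here is a tiny, self-contained sketch (the package and file names are invented for illustration, not a proposed layout) of why a relative import keeps working when the same shared file is symlinked into two different packages, while a hard-coded absolute import of the shared lib's own name has nothing to resolve against:

```
# Illustrative only: fake packages simulating "shared" sources symlinked into
# two separate dists (requires a platform where os.symlink works).
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
shared = os.path.join(root, "shared")
os.makedirs(shared)

# Stand-in for module_loading (just a stub here).
with open(os.path.join(shared, "module_loading.py"), "w") as f:
    f.write("def import_string(path): return path\n")

# Stand-in for serve_logs: note the *relative* import of its sibling. An
# absolute `from airflow_common.module_loading import ...` would instead raise
# ModuleNotFoundError, because no `airflow_common` package ever exists at
# runtime -- the file only exists under each consuming package's own name.
with open(os.path.join(shared, "serve_logs.py"), "w") as f:
    f.write("from .module_loading import import_string\n")

# Symlink the shared files into two unrelated packages, the way the core and
# task-sdk source trees each would.
for pkg in ("core_utils", "sdk_shared"):
    pkg_dir = os.path.join(root, pkg)
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    for name in ("module_loading.py", "serve_logs.py"):
        os.symlink(os.path.join(shared, name), os.path.join(pkg_dir, name))

# The identical file imports cleanly under both package names.
sys.path.insert(0, root)
for pkg in ("core_utils", "sdk_shared"):
    mod = importlib.import_module(f"{pkg}.serve_logs")
    print(mod.__name__, "->", mod.import_string("logging.StreamHandler"))
```

Running it prints the same `import_string` result under both package names; swap the relative import for the absolute `airflow_common.` one and both imports fail with ModuleNotFoundError.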
In Airflow itself, take https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41, where we have ``` from airflow.utils.module_loading import import_string ```. If we want to move serve_logs into this common lib that is then symlinked, we wouldn't be able to have `from airflow_common.module_loading import import_string`.

I can think of two possible solutions here:

1) allow/require relative imports in this shared lib, so `from .module_loading import import_string`
2) use `vendoring`[1] (from the pip maintainers), which will handle the import rewriting for us.

I'd entirely forgotten that symlinks in repos were a thing, so I prepared a minimal POC/demo of what the vendoring approach could look like here: https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3

Now personally I am more than happy with relative imports, but generally as a project we have avoided them, so I think that limits what we could do with a symlink-based approach.

-ash

[1] https://github.com/pradyunsg/vendoring

> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <gopidesupa...@gmail.com> wrote:
> 
> Thanks Ash
> 
> Yes, agree, option 2 would be preferred for me, making sure we have all the
> guardrails to protect against any unwanted behaviour in code sharing and
> that the right tests are run between the packages.
> 
> Agree with others, option 2 would be my preference.
> 
> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <amoghdesai....@gmail.com>
> wrote:
> 
>> Thanks for starting this discussion, Ash.
>> 
>> I would prefer option 2 here with proper tooling to handle the code
>> duplication at *release* time.
>> It is best to have a dist that has all it needs in itself.
>> 
>> Option 1 could very quickly get out of hand, and if we decide to separate
>> triggerer / dag processor / config etc. as separate packages, back compat
>> is going to be a nightmare and will bite us harder than we anticipate.
>> 
>> Thanks & Regards,
>> Amogh Desai
>> 
>> 
>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> 
>>> I prefer Option 2 as well, to avoid a matrix of dependencies.
>>> 
>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler <j_scheff...@gmx.de.invalid>
>>> wrote:
>>> 
>>>> I'd also rather prefer option 2 - the reason is that it is rather
>>>> pragmatic: we do not need to cut another package, and we end up with
>>>> fewer packages and dependencies.
>>>> 
>>>> I remember some time ago I was checking (together with Jarek, I am not
>>>> sure anymore...) whether the usage of symlinks would be possible: keep
>>>> the source in one package but "symlink" it into another. If the files
>>>> are then materialized at packaging/release time, we have one set of
>>>> code.
>>>> 
>>>> Otherwise, if that is not possible, the redundancy could still be solved
>>>> by a pre-commit hook - and in Git the files are de-duplicated anyway
>>>> based on content hash, so this does not hurt.
>>>> 
>>>> On 02.07.25 18:49, Shahar Epstein wrote:
>>>>> I support option 2 with proper automation & CI - the reasoning you've
>>>>> shown for that makes sense to me.
>>>>> 
>>>>> 
>>>>> Shahar
>>>>> 
>>>>> 
>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>> 
>>>>>> Hello everyone,
>>>>>> 
>>>>>> As we work on finishing off the code-level separation of Task SDK and
>>>>>> Core (scheduler etc.) we have come across some situations where we
>>>>>> would like to share code between these.
>>>>>> 
>>>>>> However it's not as straightforward as "just put it in a common dist
>>>>>> they both depend upon", because one of the goals of the Task SDK
>>>>>> separation was to have 100% complete version independence between the
>>>>>> two, ideally even if they are built into the same image and venv. Most
>>>>>> of the reason why this isn't straightforward comes down to backwards
>>>>>> compatibility - if we make a change to the common/shared distribution,
>>>>>> it has to stay compatible with every version of core and the Task SDK
>>>>>> that might be installed alongside it.
>>>>>> 
>>>>>> We've listed the options we have thought about in
>>>>>> https://github.com/apache/airflow/issues/51545 (but that covers some
>>>>>> more things that I don't want to get into in this discussion, such as
>>>>>> possibly separating operators and executors out of a single provider
>>>>>> dist.)
>>>>>> 
>>>>>> To give a concrete example of some code I would like to share:
>>>>>> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
>>>>>> — logging config. Another thing we will want to share is the
>>>>>> AirflowConfigParser class from airflow.configuration (but notably: only
>>>>>> the parser class, _not_ the default config values; again, let's not
>>>>>> dwell on the specifics of that).
>>>>>> 
>>>>>> So to bring the options listed in the issue here for discussion,
>>>>>> broadly speaking there are two high-level approaches:
>>>>>> 
>>>>>> 1. A single shared distribution
>>>>>> 2. No shared package and copy/duplicate code
>>>>>> 
>>>>>> The advantage of Approach 1 is that we only have the code in one place.
>>>>>> However for me, at least in this specific case of the logging config or
>>>>>> the AirflowConfigParser class, backwards compatibility is much, much
>>>>>> harder.
>>>>>> 
>>>>>> The main advantage of Approach 2 is that the code is released
>>>>>> with/embedded in the dist (i.e. apache-airflow-task-sdk would contain
>>>>>> the right version of the logging config and ConfigParser etc). The
>>>>>> downside is that either the code will need to be duplicated in the
>>>>>> repo, or better yet it would live in a single place in the repo, with
>>>>>> some tooling (TBD) automatically handling the duplication, either at
>>>>>> commit time or, my preference, at release time.
>>>>>> 
>>>>>> For this kind of shared "utility" code I am very strongly leaning
>>>>>> towards option 2 with automation, as otherwise I think the backwards
>>>>>> compatibility requirements would make it unworkable (very quickly over
>>>>>> time the combinations we would have to test would just become
>>>>>> unreasonable) and I don't feel confident we can keep things as stable
>>>>>> as we need to really deliver the version separation/independence I want
>>>>>> to deliver with AIP-72.
>>>>>> 
>>>>>> So unless someone feels very strongly about this, I will come up with a
>>>>>> draft PR for further discussion that will implement code sharing via
>>>>>> "vendoring" it at build time. I have an idea of how I can achieve this
>>>>>> so we have a single version in the repo and it'll work there, but at
>>>>>> release time we vendor it into the shipped dist so it lives at
>>>>>> something like `airflow.sdk._vendor` etc.
>>>>>> 
>>>>>> In terms of repo layout, this likely means we would end up with:
>>>>>> 
>>>>>> airflow-core/pyproject.toml
>>>>>> airflow-core/src/
>>>>>> airflow-core/tests/
>>>>>> task-sdk/pyproject.toml
>>>>>> task-sdk/src/
>>>>>> task-sdk/tests/
>>>>>> airflow-common/src/
>>>>>> airflow-common/tests/
>>>>>> # Possibly no airflow-common/pyproject.toml, as deps would be included
>>>>>> in the downstream projects. TBD.
>>>>>> 
>>>>>> Thoughts and feedback welcomed.
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>> 
>>>> 
>>> 
>> 