Oh yes, symlinks will work, with one big caveat: it means you can’t use 
absolute imports from one common module to another.

For example, in
https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41
we have

```
from airflow.utils.module_loading import import_string
```

If we want to move serve_logs into this common lib that is then symlinked, 
we wouldn’t be able to have `from airflow_common.module_loading import 
import_string`.
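
To make that caveat concrete, here is a minimal runnable sketch (all the names 
and paths are hypothetical, nothing to do with the real repo layout) of why an 
absolute import breaks once the shared code is symlinked in under a different 
top-level package:

```
# Sketch: a shared lib written to be importable as "airflow_common" is
# symlinked into a consuming package under a different top-level name;
# its absolute self-import then fails, because "airflow_common" itself
# is not on sys.path in the consumer's environment.
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()

# The shared lib, using an absolute import of its own package.
common = os.path.join(root, "shared", "airflow_common")
os.makedirs(common)
open(os.path.join(common, "__init__.py"), "w").close()
with open(os.path.join(common, "module_loading.py"), "w") as f:
    f.write("def import_string(path):\n    return path\n")
with open(os.path.join(common, "serve_logs.py"), "w") as f:
    f.write("from airflow_common.module_loading import import_string\n")

# Symlink it into a consuming package (think "airflow/utils/common").
pkg = os.path.join(root, "app", "mypkg")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
os.symlink(common, os.path.join(pkg, "common"))

sys.path.insert(0, os.path.join(root, "app"))  # only the consumer is "installed"
err = None
try:
    importlib.import_module("mypkg.common.serve_logs")
except ModuleNotFoundError as e:
    err = e
print(err)  # No module named 'airflow_common'
```

Relative imports sidestep this, because `from .module_loading import ...` 
resolves against whichever package the file actually ends up in.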

I can think of two possible solutions here.

1) Allow/require relative imports in this shared lib, i.e. `from 
.module_loading import import_string`.
2) Use `vendoring`[1] (from the pip maintainers), which will handle the 
import rewriting for us.
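
For a rough idea of what option 2 does under the hood, here is a 
much-simplified sketch of the import rewriting that a tool like `vendoring` 
automates at build time (the real tool is far more robust; the vendored 
namespace used here is just an illustration):

```
import re

def rewrite_imports(source: str, package: str, vendor_ns: str) -> str:
    """Point absolute imports of `package` at its vendored copy."""
    # Matches "from <package>..." and "import <package>..." statements only;
    # other identifiers containing the name are left untouched.
    pattern = re.compile(rf"\b(from|import)\s+{re.escape(package)}\b")
    return pattern.sub(rf"\1 {vendor_ns}.{package}", source)

src = "from airflow_common.module_loading import import_string\n"
print(rewrite_imports(src, "airflow_common", "airflow.sdk._vendor"))
# from airflow.sdk._vendor.airflow_common.module_loading import import_string
```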

I’d entirely forgotten that symlinks in repos were a thing, so I prepared a 
minimal POC/demo of what the vendoring approach could look like here: 
https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3

Now personally I am more than happy with relative imports, but as a project 
we have generally avoided them, so I think that limits what we could do with 
a symlink-based approach.

-ash

[1] https://github.com/pradyunsg/vendoring 

> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <gopidesupa...@gmail.com> wrote:
> 
> Thanks Ash
> 
> Yes, agree, option 2 would be preferred for me: making sure we have all the
> guardrails to protect against any unwanted behaviour in code sharing and
> that the right tests are executed between the packages.
> 
> Agreeing with others, option 2 would be my choice.
> 
> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <amoghdesai....@gmail.com>
> wrote:
> 
>> Thanks for starting this discussion, Ash.
>> 
>> I would prefer option 2 here with proper tooling to handle the code
>> duplication at *release* time.
>> It is best to have a dist that has all it needs in itself.
>> 
>> Option 1 could very quickly get out of hand, and if we decide to separate
>> triggerer / dag processor / config etc. into separate packages, back
>> compat is going to be a nightmare and will bite us harder than we
>> anticipate.
>> 
>> Thanks & Regards,
>> Amogh Desai
>> 
>> 
>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> 
>>> I prefer Option 2 as well, to avoid a matrix of dependencies
>>> 
>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler <j_scheff...@gmx.de.invalid>
>>> wrote:
>>> 
>>>> I'd also rather prefer option 2 - the reason is that it is rather
>>>> pragmatic: we do not need to cut another package, and we end up with
>>>> fewer packages and dependencies.
>>>> 
>>>> I remember some time ago I was checking (together with Jarek, I am not
>>>> sure anymore...) whether the usage of symlinks would be possible: to
>>>> keep the source in one package but "symlink" it into another. If the
>>>> files are then materialized at the point of packaging/release, we have
>>>> one set of code.
>>>> 
>>>> Otherwise, if that is not possible, the redundancy could still be
>>>> solved by a pre-commit hook - and in Git the files are de-duplicated
>>>> anyway based on content hash, so this does not hurt.
>>>> 
>>>> On 02.07.25 18:49, Shahar Epstein wrote:
>>>>> I support option 2 with proper automation & CI - the reasoning you've
>>>>> shown for it makes sense to me.
>>>>> 
>>>>> 
>>>>> Shahar
>>>>> 
>>>>> 
>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org>
>>> wrote:
>>>>> 
>>>>>> Hello everyone,
>>>>>> 
>>>>>> As we work on finishing off the code-level separation of Task SDK and
>>>>>> Core (scheduler etc.) we have come across some situations where we
>>>>>> would like to share code between these.
>>>>>> 
>>>>>> However it’s not as straightforward as “just put it in a common dist
>>>>>> they both depend upon”, because one of the goals of the Task SDK
>>>>>> separation was to have 100% complete version independence between the
>>>>>> two, ideally even if they are built into the same image and venv. Most
>>>>>> of the reason why this isn’t straightforward comes down to backwards
>>>>>> compatibility - if we make a change to the common/shared distribution,
>>>>>> it has to stay compatible with every released version on both sides.
>>>>>> 
>>>>>> 
>>>>>> We’ve listed the options we have thought about in
>>>>>> https://github.com/apache/airflow/issues/51545 (but that covers some
>>>>>> more things that I don’t want to get into in this discussion, such as
>>>>>> possibly separating operators and executors out of a single provider
>>>>>> dist.)
>>>>>> 
>>>>>> To give a concrete example of some code I would like to share:
>>>>>> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
>>>>>> — logging config. Another thing we will want to share is the
>>>>>> AirflowConfigParser class from airflow.configuration (but notably:
>>>>>> only the parser class, _not_ the default config values; again, let’s
>>>>>> not dwell on the specifics of that).
>>>>>> 
>>>>>> So, to bring the options listed in the issue here for discussion,
>>>>>> broadly speaking there are two high-level approaches:
>>>>>> 
>>>>>> 1. A single shared distribution
>>>>>> 2. No shared package and copy/duplicate code
>>>>>> 
>>>>>> The advantage of Approach 1 is that we only have the code in one
>>>>>> place. However, at least in this specific case of the logging config
>>>>>> or the AirflowConfigParser class, backwards compatibility is much,
>>>>>> much harder.
>>>>>> 
>>>>>> The main advantage of Approach 2 is that the code is released
>>>>>> with/embedded in the dist (i.e. apache-airflow-task-sdk would contain
>>>>>> the right version of the logging config and ConfigParser etc.). The
>>>>>> downside is that either the code will need to be duplicated in the
>>>>>> repo, or, better yet, it would live in a single place in the repo but
>>>>>> some tooling (TBD) would automatically handle the duplication, either
>>>>>> at commit time or, my preference, at release time.
>>>>>> 
>>>>>> For this kind of shared “utility” code I am very strongly leaning
>>>>>> towards option 2 with automation, as otherwise I think the backwards
>>>>>> compatibility requirements would make it unworkable (very quickly,
>>>>>> over time, the combinations we would have to test would just be
>>>>>> unreasonable), and I don’t feel confident we can keep things as stable
>>>>>> as we need to really deliver the version separation/independence I
>>>>>> want to deliver with AIP-72.
>>>>>> 
>>>>>> So unless someone feels very strongly about this, I will come up with
>>>>>> a draft PR for further discussion that implements code sharing via
>>>>>> “vendoring” it at build time. I have an idea of how I can achieve this
>>>>>> so that we have a single version in the repo and it’ll work there, but
>>>>>> at build time we vendor it into the shipped dist so it lives at
>>>>>> something like `airflow.sdk._vendor` etc.
>>>>>> 
>>>>>> In terms of repo layout, this likely means we would end up with:
>>>>>> 
>>>>>> airflow-core/pyproject.toml
>>>>>> airflow-core/src/
>>>>>> airflow-core/tests/
>>>>>> task-sdk/pyproject.toml
>>>>>> task-sdk/src/
>>>>>> task-sdk/tests/
>>>>>> airflow-common/src/
>>>>>> airflow-common/tests/
>>>>>> # Possibly no airflow-common/pyproject.toml, as deps would be included
>>>>>> in the downstream projects. TBD.
>>>>>> 
>>>>>> Thoughts and feedback welcomed.
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>> For additional commands, e-mail: dev-h...@airflow.apache.org