Okay, I think I’ve got something that works and that I’m happy with.

https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core

This produces the following from `uv build task-sdk`:
- https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz
- https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip

(`.whl.zip` as GH won't allow uploading a `.whl`, but will allow a `.zip`)

```
❯ unzip -l dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip | grep _vendor
       50  02-02-2020 00:00   airflow/sdk/_vendor/.gitignore
     2082  02-02-2020 00:00   airflow/sdk/_vendor/__init__.py
       28  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common.pyi
       18  02-02-2020 00:00   airflow/sdk/_vendor/vendor.txt
      785  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common/__init__.py
    10628  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common/timezone.py
```

And similarly in the .tar.gz, so our “sdist” is complete too:
```
❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz |grep _vendor
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py
```

The plugin works at build time by copying the libs specified in vendor.txt
into place, and lets `vendoring` take care of the import rewrites.
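
For context, the build-time hook is conceptually tiny. Here is a minimal sketch, assuming a hatchling custom build hook — the class name and the `vendoring sync` invocation are illustrative assumptions on my part; the real plugin in the branch above is the source of truth:

```python
# Illustrative sketch only -- not the actual plugin from the branch above.
import subprocess

from hatchling.builders.hooks.plugin.interface import BuildHookInterface


class VendorSharedCodeHook(BuildHookInterface):
    """Vendor the shared libs listed in vendor.txt before the wheel/sdist is built."""

    def initialize(self, version, build_data):
        # `vendoring sync` reads the requirements listed in vendor.txt, puts the
        # packages into the configured destination (airflow/sdk/_vendor here),
        # and rewrites their absolute imports to the vendored namespace.
        subprocess.run(["vendoring", "sync", self.root], check=True)
```

`vendoring` itself is configured under `[tool.vendoring]` in pyproject.toml (destination, requirements file, target namespace), the same way pip uses it for its own `pip._vendor` tree.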

For the imports to continue to work at “dev” time/from a repo checkout, I have
added an import finder to `sys.meta_path`, and since it’s at the end of the list
it will only be used if the normal import can’t find things.

https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py
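
The gist, if you don’t want to click through: a finder at the end of `sys.meta_path` that maps `airflow.sdk._vendor.<name>` back onto the top-level `<name>` package. A minimal sketch of the idea (illustrative names only, not the actual code from the file above; a full implementation needs more care with submodules and module attributes):

```python
# Illustrative sketch of the dev-time fallback; the real finder is in
# airflow/sdk/_vendor/__init__.py linked above and may differ in detail.
import importlib
import importlib.abc
import importlib.util
import sys

_PREFIX = "airflow.sdk._vendor."


class _AliasLoader(importlib.abc.Loader):
    def __init__(self, real_name):
        self.real_name = real_name

    def create_module(self, spec):
        # Import (or re-use) the real top-level module, so e.g. both
        # "airflow_common" and "airflow.sdk._vendor.airflow_common" end up in
        # sys.modules pointing at the same module object.
        return importlib.import_module(self.real_name)

    def exec_module(self, module):
        # Already executed by the import in create_module; nothing left to do.
        pass


class _VendorFallbackFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(_PREFIX):
            return None
        real_name = fullname[len(_PREFIX):]  # e.g. "airflow_common"
        if importlib.util.find_spec(real_name) is None:
            return None
        return importlib.util.spec_from_loader(fullname, _AliasLoader(real_name))


# Appended, so it only runs when the normal import fails -- i.e. when the
# vendored copy is absent because we are on a repo checkout, not a built dist.
sys.meta_path.append(_VendorFallbackFinder())
```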
 

This doesn’t quite give us the same runtime effect as the “import rewriting”
approach, because here `airflow_common` is loaded directly (i.e. both
airflow.sdk._vendor.airflow_common and airflow_common end up in sys.modules),
but it does work for everything that I was able to test.

I tested it with the diff at the end of this message. My test ipython shell:

```
In [1]: from airflow.sdk._vendor.airflow_common.timezone import foo

In [2]: foo
Out[2]: 1

In [3]: import airflow.sdk._vendor.airflow_common

In [4]: import airflow.sdk._vendor.airflow_common.timezone

In [5]: airflow.sdk._vendor.airflow_common.__file__
Out[5]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py'

In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__
Out[6]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py'

```


And in a standalone environment with the SDK dist I built (it needs the
matching airflow-core right now, but that has nothing to do with this
discussion):

```
❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with dist/apache_airflow_core-3.1.0-py3-none-any.whl --with dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython
Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: You can use `%hist` to view history, see the options with `%history?`

In [1]: import airflow.sdk._vendor.airflow_common.timezone

In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__
Out[2]: '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py'
```



```diff
diff --git a/airflow-common/src/airflow_common/__init__.py b/airflow-common/src/airflow_common/__init__.py
index 13a83393a9..927b7c6b61 100644
--- a/airflow-common/src/airflow_common/__init__.py
+++ b/airflow-common/src/airflow_common/__init__.py
@@ -14,3 +14,5 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
+foo = 1
diff --git a/airflow-common/src/airflow_common/timezone.py b/airflow-common/src/airflow_common/timezone.py
index 340b924c66..58384ef20f 100644
--- a/airflow-common/src/airflow_common/timezone.py
+++ b/airflow-common/src/airflow_common/timezone.py
@@ -36,6 +36,9 @@ _PENDULUM3 = version.parse(metadata.version("pendulum")).major == 3
 # - FixedTimezone(0, "UTC") in pendulum 2
 utc = pendulum.UTC
 
+
+from airflow_common import foo
+
 TIMEZONE: Timezone
 
 
```

> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
> I think both approaches are doable:
> 
> 1) -> We can very easily prevent bad imports by pre-commit when importing
> from different distributions and make sure we are only doing relative
> imports in the shared modules. We are doing plenty of this already. And yes
> it would require relative links we currently do not allow.
> 
> 2) -> has one disadvantage that someone at some point in time will have to
> decide to synchronize this and if it happens just before release (I bet
> this is going to happen) this will lead to solving problems that would
> normally be solved during PR when you make a change (i.e. symbolic link has
> the advantage that whoever modifies shared code will be immediately
> notified in their PR - that they broke something because either static
> checks or mypy or tests fail.
> 
> Ash, do you have an idea of a process (who and when) does the
> synchronisation in case of vendoring? Maybe we could solve it if it is done
> more frequently and with some regularity? We could potentially force
> re-vendoring at PR time as well any time shared code changes (and prevent
> it by pre-commit. And I can't think of some place (other than releases) in
> our development workflow and that seems to be a bit too late as puts an
> extra effort on fixing potential incompatibilities introduced on release
> manager and delays the release. WDYT?
> 
> Re: relative links. I think for a shared library we could potentially relax
> this and allow them (and actually disallow absolute links in the pieces of
> code that are shared - again, by pre-commit). As I recall, the only reason
> we forbade the relative links is because of how we are (or maybe were)
> doing DAG parsing and failures resulting from it. So we decided to just not
> allow it to keep consistency. The way how Dag parsing works is that when
> you are using importlib to read the Dag from a file, the relative imports
> do not work as it does not know what they should be relative to. But if
> relative import is done from an imported package, it should be no problem,
> I think - otherwise our Dags would not be able to import any library that
> uses relative imports.
> 
> Of course consistency might be the reason why we do not want to introduce
> relative imports. I don't see it as an issue if it is guarded by pre-commit
> though.
> 
> J.
> 
> 
> 
> On Thu, 3 Jul 2025, at 12:11, Ash Berlin-Taylor <a...@apache.org>
> wrote:
> 
>> Oh yes, symlinks will work, with one big caveat: It does mean you can’t
>> use absolute imports in one common module to another.
>> 
>> For example
>> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41
>> where we have
>> 
>> ```
>> from airflow.utils.module_loading import import_string
>> ```
>> 
>> if we want to move serve_logs into this common lib that is then symlinked
>> then we wouldn’t be able to have `from airflow_common.module_loading import
>> import_string`.
>> 
>> I can think of two possible solutions here.
>> 
>> 1) is to allow/require relative imports in this shared lib, so `from
>> .module_loading import import_string`
>> 2) is to use `vendoring`[1] (from the pip maintainers) which will handle
>> import-rewriting for us.
>> 
>> I’d entirely forgot that symlinks in repos was a thing, so I prepared a
>> minimal POC/demo of what vendoring approach could look like here
>> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3
>> 
>> Now personally I am more than happy with relative imports, but generally
>> as a project we have avoided them, so I think that limits what we could do
>> with a symlink based approach.
>> 
>> -ash
>> 
>> [1] https://github.com/pradyunsg/vendoring
>> 
>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <gopidesupa...@gmail.com>
>> wrote:
>>> 
>>> Thanks Ash
>>> 
>>> Yes agree option 2 would be preferred for me. Making sure we have all the
>>> guardrails to protect against any unwanted behaviour in code sharing and
>> executing
>>> the right tests between the packages.
>>> 
>>> Agree with others, option 2 would be
>>> 
>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <amoghdesai....@gmail.com>
>>> wrote:
>>> 
>>>> Thanks for starting this discussion, Ash.
>>>> 
>>>> I would prefer option 2 here with proper tooling to handle the code
>>>> duplication at *release* time.
>>>> It is best to have a dist that has all it needs in itself.
>>>> 
>>>> Option 1 could very quickly get out of hand and if we decide to separate
>>>> triggerer /
>>>> dag processor / config etc etc as separate packages, back compat is
>> going
>>>> to be a nightmare
>>>> and will bite us harder than we anticipate.
>>>> 
>>>> Thanks & Regards,
>>>> Amogh Desai
>>>> 
>>>> 
>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>> 
>>>>> I prefer Option 2 as well to avoid matrix of dependencies
>>>>> 
>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler <j_scheff...@gmx.de.invalid
>>> 
>>>>> wrote:
>>>>> 
>>>>>> I'd also rather prefer option 2 - reason here is it is rather
>> pragmatic
>>>>>> and we do not need to cut another package and have fewer packages
>>>>>> and dependencies.
>>>>>> 
>>>>>> I remember some time ago I was checking (together with Jarek, I am not
>>>>>> sure anymore...) if the usage of symlinks would be possible. To keep
>>>> the
>>>>>> source in one package but "symlink" it into another. If then at point
>>>> of
>>>>>> packaging/release the files are materialized we have 1 set of code.
>>>>>> 
>>>>>> Otherwise if not possible still the redundancy could be solved by a
>>>>>> pre-commit hook - and in Git the files are de-duplicated anyway based
>>>> on
>>>>>> content hash, so this does not hurt.
>>>>>> 
>>>>>> On 02.07.25 18:49, Shahar Epstein wrote:
>>>>>>> I support option 2 with proper automation & CI - the reasonings
>>>> you've
>>>>>>> shown for that make sense to me.
>>>>>>> 
>>>>>>> 
>>>>>>> Shahar
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org>
>>>>> wrote:
>>>>>>> 
>>>>>>>> Hello everyone,
>>>>>>>> 
>>>>>>>> As we work on finishing off the code-level separation of Task SDK
>>>> and
>>>>>> Core
>>>>>>>> (scheduler etc) we have come across some situations where we would
>>>>> like
>>>>>> to
>>>>>>>> share code between these.
>>>>>>>> 
>>>>>>>> However it’s not as straightforward as “just put it in a common
>>>> dist
>>>>>> they
>>>>>>>> both depend upon” because one of the goals of the Task SDK
>>>> separation
>>>>>> was
>>>>>>>> to have 100% complete version independence between the two, ideally
>>>>>> even if
>>>>>>>> they are built into the same image and venv. Most of the reason why
>>>>> this
>>>>>>>> isn’t straightforward comes down to backwards compatibility - if we
>>>>>> make
>>>>>>>> a change to the common/shared distribution
>>>>>>>> 
>>>>>>>> 
>>>>>>>> We’ve listed the options we have thought about in
>>>>>>>> https://github.com/apache/airflow/issues/51545 (but that covers
>>>> some
>>>>>> more
>>>>>>>> things that I don’t want to get in to in this discussion such as
>>>>>> possibly
>>>>>>>> separating operators and executors out of a single provider dist.)
>>>>>>>> 
>>>>>>>> To give a concrete example of some code I would like to share
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
>>>>>>>> — logging config. Another thing we will want to share will be the
>>>>>>>> AirflowConfigParser class from airflow.configuration (but notably:
>>>>> only
>>>>>> the
>>>>>>>> parser class, _not_ the default config values, again, lets not dwell
>>>>> on
>>>>>> the
>>>>>>>> specifics of that)
>>>>>>>> 
>>>>>>>> So to bring the options listed in the issue here for discussion,
>>>>> broadly
>>>>>>>> speaking there are two high-level approaches:
>>>>>>>> 
>>>>>>>> 1. A single shared distribution
>>>>>>>> 2. No shared package and copy/duplicate code
>>>>>>>> 
>>>>>>>> The advantage of Approach 1 is that we only have the code in one
>>>>> place.
>>>>>>>> However for me, at least in this specific case of Logging config or
>>>>>>>> AirflowConfigParser class is that backwards compatibility is much
>>>> much
>>>>>>>> harder.
>>>>>>>> 
>>>>>>>> The main advantage of Approach 2 is that the code is released
>>>>>> with/embedded
>>>>>>>> in the dist (i.e. apache-airflow-task-sdk would contain the right
>>>>>> version
>>>>>>>> of the logging config and ConfigParser etc). The downside is that
>>>>> either
>>>>>>>> the code will need to be duplicated in the repo, or better yet it
>>>>> would
>>>>>>>> live in a single place in the repo, but some tooling (TBD) will
>>>>>>>> automatically handle the duplication, either at commit time, or my
>>>>>>>> preference, at release time.
>>>>>>>> 
>>>>>>>> For this kind of shared “utility” code I am very strongly leaning
>>>>>> towards
>>>>>>>> option 2 with automation, as otherwise I think the backwards
>>>>>> compatibility
>>>>>>>> requirements would make it unworkable (very quickly over time the
>>>>>>>> combinations we would have to test would just be unreasonable) and I
>>>>>> don’t
>>>>>>>> feel confident we can have things as stable as we need to really
>>>>> deliver
>>>>> the version separation/independence I want to deliver with AIP-72.
>>>>>>>> 
>>>>>>>> So unless someone feels very strongly about this, I will come up
>>>> with
>>>>> a
>>>>>>>> draft PR for further discussion that will implement code sharing via
>>>>>>>> “vendoring” it at build time. I have an idea of how I can achieve
>>>> this
>>>>>> so
>>>>>>>> we have a single version in the repo and it’ll work there, but at
>>>>>> runtime
>>>>>>>> we vendor it in to the shipped dist so it lives at something like
>>>>>>>> `airflow.sdk._vendor` etc.
>>>>>>>> 
>>>>>>>> In terms of repo layout, this likely means we would end up with:
>>>>>>>> 
>>>>>>>> airflow-core/pyproject.toml
>>>>>>>> airflow-core/src/
>>>>>>>> airflow-core/tests/
>>>>>>>> task-sdk/pyproject.toml
>>>>>>>> task-sdk/src/
>>>>>>>> task-sdk/tests/
>>>>>>>> airflow-common/src
>>>>>>>> airflow-common/tests/
>>>>>>>> # Possibly no airflow-common/pyproject.toml, as deps would be
>>>> included
>>>>>> in
>>>>>>>> the downstream projects. TBD.
>>>>>>>> 
>>>>>>>> Thoughts and feedback welcomed.
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>> 
