It’s not just release time, but any time we build a package, which happens on 
“every” CI run. The normal unit tests use code from 
airflow-common/src/airflow_common; the kube tests, which build an image, 
build the dists and vendor in the code from that commit. 

There is only a single copy of the shared code committed to the repo, so there 
is never anything to synchronise. 

> On 4 Jul 2025, at 15:53, Amogh Desai <amoghdesai....@gmail.com> wrote:
> 
> Thanks Ash.
> 
> This is really cool and helpful that you were able to test both scenarios
> -- repo checkout
> and also installing from the vendored package and the resolution worked
> fine too.
> 
> I like this idea compared to the relative import one for a few reasons:
> - It feels like it will take some time to adjust to the new coding standard
> we would have to lay down if we impose relative imports in the shared dist
> - We can continue using repo-wide absolute import standards; that also
> makes it much easier to do a global find-and-replace in the IDE
> - Vendoring is a proven and established paradigm across projects and would
> give us the build tooling we need out of the box
> 
> Nothing strongly against the relative import approach, but with the
> evidence provided above, the vendored approach seems to only do us good.
> 
> Regarding synchronising it: release time should be fine as long as we have
> a good CI workflow to catch such issues per PR when changes are made in the
> shared dist (a pre-commit hook would make things really slow, I guess).
> 
> If we can run our tests with vendored code we should be mostly covered.
> 
> Good effort all!
> 
> Thanks & Regards,
> Amogh Desai
> 
> 
>> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>> 
>> Okay, I think I’ve got something that works and that I’m happy with.
>> 
>> 
>> https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core
>> 
>> This produces the following from `uv build task-sdk`
>> - https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz
>> - https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip
>> 
>> (`.whl.zip` as GH won't allow .whl upload, but will .zip)
>> 
>> ```
>> ❯ unzip -l dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip | grep _vendor
>>       50  02-02-2020 00:00   airflow/sdk/_vendor/.gitignore
>>     2082  02-02-2020 00:00   airflow/sdk/_vendor/__init__.py
>>       28  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common.pyi
>>       18  02-02-2020 00:00   airflow/sdk/_vendor/vendor.txt
>>      785  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common/__init__.py
>>    10628  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common/timezone.py
>> ```
>> 
>> And similarly in the .tar.gz, so our “sdist” is complete too:
>> ```
>> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz |grep _vendor
>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore
>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py
>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi
>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt
>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py
>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py
>> ```
>> 
>> The plugin works at build time by copying the libs specified in
>> vendor.txt into place (and letting `vendoring` take care of import rewrites).
>> 
>> For the imports to continue to work at “dev” time/from a repo checkout, I
>> have added an import finder to `sys.meta_path`, and since it’s at the end
>> of the list it will only be used if the normal import can’t find things.
>> 
>> 
>> https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py
>> 
>> This doesn’t quite give us the same runtime effect as “import rewriting”,
>> as in this approach `airflow_common` is loaded directly (i.e. both
>> airflow.sdk._vendor.airflow_common and airflow_common exist in
>> sys.modules), but it does work for everything that I was able to test.
>> 
>> I tested it with the diff at the end of this message. My test ipython
>> shell:
>> 
>> ```
>> In [1]: from airflow.sdk._vendor.airflow_common.timezone import foo
>> 
>> In [2]: foo
>> Out[2]: 1
>> 
>> In [3]: import airflow.sdk._vendor.airflow_common
>> 
>> In [4]: import airflow.sdk._vendor.airflow_common.timezone
>> 
>> In [5]: airflow.sdk._vendor.airflow_common.__file__
>> Out[5]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py'
>> 
>> In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__
>> Out[6]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py'
>> 
>> ```
>> 
>> 
>> And in a standalone environment with the SDK dist I built (it needed the
>> matching airflow-core right now, but that has nothing to do with this
>> discussion):
>> 
>> ```
>> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with dist/apache_airflow_core-3.1.0-py3-none-any.whl --with dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython
>> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ]
>> Type 'copyright', 'credits' or 'license' for more information
>> IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for help.
>> Tip: You can use `%hist` to view history, see the options with `%history?`
>> 
>> In [1]: import airflow.sdk._vendor.airflow_common.timezone
>> 
>> In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__
>> Out[2]: '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py'
>> ```
>> 
>> 
>> 
>> ```diff
>> diff --git a/airflow-common/src/airflow_common/__init__.py b/airflow-common/src/airflow_common/__init__.py
>> index 13a83393a9..927b7c6b61 100644
>> --- a/airflow-common/src/airflow_common/__init__.py
>> +++ b/airflow-common/src/airflow_common/__init__.py
>> @@ -14,3 +14,5 @@
>> # KIND, either express or implied.  See the License for the
>> # specific language governing permissions and limitations
>> # under the License.
>> +
>> +foo = 1
>> diff --git a/airflow-common/src/airflow_common/timezone.py b/airflow-common/src/airflow_common/timezone.py
>> index 340b924c66..58384ef20f 100644
>> --- a/airflow-common/src/airflow_common/timezone.py
>> +++ b/airflow-common/src/airflow_common/timezone.py
>> @@ -36,6 +36,9 @@ _PENDULUM3 = version.parse(metadata.version("pendulum")).major == 3
>> # - FixedTimezone(0, "UTC") in pendulum 2
>> utc = pendulum.UTC
>> 
>> +
>> +from airflow_common import foo
>> +
>> TIMEZONE: Timezone
>> 
>> 
>> ```
>> 
>>>> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> wrote:
>>> 
>>> I think both approaches are doable:
>>> 
>>> 1) -> We can very easily prevent bad imports via pre-commit when
>>> importing from different distributions, and make sure we only do relative
>>> imports in the shared modules. We do plenty of this already. And yes, it
>>> would require the relative imports we currently do not allow.
>>> 
>>> 2) -> has one disadvantage: someone at some point will have to decide to
>>> synchronize this, and if that happens just before a release (I bet it
>>> will) it will lead to solving problems that would normally be solved
>>> during the PR when you make a change (i.e. the symbolic link has the
>>> advantage that whoever modifies shared code is immediately notified in
>>> their PR that they broke something, because static checks, mypy, or tests
>>> fail).
>>> 
>>> Ash, do you have an idea of a process (who, and when) for the
>>> synchronisation in case of vendoring? Maybe we could solve it if it is
>>> done more frequently and with some regularity? We could potentially force
>>> re-vendoring at PR time whenever shared code changes (and enforce it by
>>> pre-commit). I can't think of another place in our development workflow
>>> other than releases, and that seems to be a bit too late, as it puts the
>>> extra effort of fixing potential incompatibilities on the release manager
>>> and delays the release. WDYT?
>>> 
>>> Re: relative imports. I think for a shared library we could potentially
>>> relax this and allow them (and actually disallow absolute imports in the
>>> pieces of code that are shared - again, via pre-commit). As I recall, the
>>> only reason we forbade relative imports is how we are (or maybe were)
>>> doing DAG parsing and the failures resulting from it, so we decided to
>>> disallow them for consistency. The way Dag parsing works, when you are
>>> using importlib to read the Dag from a file, relative imports do not work
>>> because it does not know what they should be relative to. But if a
>>> relative import is done from an imported package, it should be no
>>> problem, I think - otherwise our Dags would not be able to import any
>>> library that uses relative imports.
>>> 
>>> Of course consistency might be the reason why we do not want to introduce
>>> relative imports. I don't see that as an issue if it is guarded by
>>> pre-commit, though.
>>> 
>>> J.
>>> 
>>> 
>>> czw., 3 lip 2025, 12:11 użytkownik Ash Berlin-Taylor <a...@apache.org>
>>> napisał:
>>> 
>>>> Oh yes, symlinks will work, with one big caveat: it means you can’t
>>>> use absolute imports from one common module to another.
>>>> 
>>>> For example
>>>> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41
>>>> where we have
>>>> 
>>>> ```
>>>> from airflow.utils.module_loading import import_string
>>>> ```
>>>> 
>>>> if we want to move serve_logs into this common lib that is then
>>>> symlinked, then we wouldn’t be able to have
>>>> `from airflow_common.module_loading import import_string`.
>>>> 
>>>> I can think of two possible solutions here.
>>>> 
>>>> 1) is to allow/require relative imports in this shared lib, so `from
>>>> .module_loading import import_string`
>>>> 2) is to use `vendoring`[1] (from the pip maintainers) which will handle
>>>> import-rewriting for us.
>>>> 
>>>> I’d entirely forgotten that symlinks in repos were a thing, so I
>>>> prepared a minimal POC/demo of what the vendoring approach could look
>>>> like here:
>>>> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3
>>>> 
>>>> Now personally I am more than happy with relative imports, but generally
>>>> as a project we have avoided them, so I think that limits what we could
>>>> do with a symlink based approach.
>>>> 
>>>> -ash
>>>> 
>>>> [1] https://github.com/pradyunsg/vendoring
>>>> 
>>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <gopidesupa...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>> Thanks Ash
>>>>> 
>>>>> Yes, agree - option 2 would be preferred for me. Making sure we have
>>>>> all the guardrails to protect against any unwanted behaviour in code
>>>>> sharing, and executing the right tests between the packages.
>>>>> 
>>>>> Agree with others, option 2 would be my preference.
>>>>> 
>>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <amoghdesai....@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Thanks for starting this discussion, Ash.
>>>>>> 
>>>>>> I would prefer option 2 here with proper tooling to handle the code
>>>>>> duplication at *release* time.
>>>>>> It is best to have a dist that has all it needs in itself.
>>>>>> 
>>>>>> Option 1 could very quickly get out of hand: if we decide to separate
>>>>>> the triggerer / dag processor / config etc. into separate packages,
>>>>>> back-compat is going to be a nightmare and will bite us harder than we
>>>>>> anticipate.
>>>>>> 
>>>>>> Thanks & Regards,
>>>>>> Amogh Desai
>>>>>> 
>>>>>> 
>>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>>>> 
>>>>>>> I prefer Option 2 as well, to avoid a matrix of dependencies
>>>>>>> 
>>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler <j_scheff...@gmx.de.invalid>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> I'd also rather prefer option 2 - the reason is that it is rather
>>>>>>>> pragmatic: we do not need to cut another package, and we have fewer
>>>>>>>> packages and dependencies.
>>>>>>>> 
>>>>>>>> I remember some time ago I was checking (together with Jarek, I am
>>>>>>>> not sure anymore...) whether the usage of symlinks would be possible:
>>>>>>>> keep the source in one package but "symlink" it into another. If the
>>>>>>>> files are then materialized at packaging/release time, we have one
>>>>>>>> set of code.
>>>>>>>> 
>>>>>>>> Otherwise, if that is not possible, the redundancy could still be
>>>>>>>> solved by a pre-commit hook - and in Git the files are de-duplicated
>>>>>>>> anyway based on content hash, so this does not hurt.
>>>>>>>> 
>>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote:
>>>>>>>>> I support option 2 with proper automation & CI - the reasoning
>>>>>>>>> you've shown for it makes sense to me.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Shahar
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hello everyone,
>>>>>>>>>> 
>>>>>>>>>> As we work on finishing off the code-level separation of Task SDK
>>>>>>>>>> and Core (scheduler etc.) we have come across some situations where
>>>>>>>>>> we would like to share code between them.
>>>>>>>>>> 
>>>>>>>>>> However it’s not as straightforward as “just put it in a common
>>>>>>>>>> dist they both depend upon”, because one of the goals of the Task
>>>>>>>>>> SDK separation was to have 100% complete version independence
>>>>>>>>>> between the two, ideally even if they are built into the same image
>>>>>>>>>> and venv. Most of the reason why this isn’t straightforward comes
>>>>>>>>>> down to backwards compatibility - if we make a change to the
>>>>>>>>>> common/shared distribution, every version that depends on it still
>>>>>>>>>> needs to keep working.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> We’ve listed the options we have thought about in
>>>>>>>>>> https://github.com/apache/airflow/issues/51545 (but that covers
>>>>>>>>>> some more things that I don’t want to get into in this discussion,
>>>>>>>>>> such as possibly separating operators and executors out of a single
>>>>>>>>>> provider dist.)
>>>>>>>>>> 
>>>>>>>>>> To give a concrete example of some code I would like to share:
>>>>>>>>>> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
>>>>>>>>>> — logging config. Another thing we will want to share is the
>>>>>>>>>> AirflowConfigParser class from airflow.configuration (but notably:
>>>>>>>>>> only the parser class, _not_ the default config values - again,
>>>>>>>>>> let’s not dwell on the specifics of that).
>>>>>>>>>> 
>>>>>>>>>> So to bring the options listed in the issue here for discussion,
>>>>>>>>>> broadly speaking there are two high-level approaches:
>>>>>>>>>> 
>>>>>>>>>> 1. A single shared distribution
>>>>>>>>>> 2. No shared package; copy/duplicate the code
>>>>>>>>>> 
>>>>>>>>>> The advantage of Approach 1 is that we only have the code in one
>>>>>>>>>> place. However, for me, at least in the specific case of the
>>>>>>>>>> logging config or the AirflowConfigParser class, the problem is
>>>>>>>>>> that backwards compatibility becomes much, much harder.
>>>>>>>>>> 
>>>>>>>>>> The main advantage of Approach 2 is that the code is released
>>>>>>>>>> with/embedded in the dist (i.e. apache-airflow-task-sdk would
>>>>>>>>>> contain the right version of the logging config and ConfigParser
>>>>>>>>>> etc). The downside is that either the code needs to be duplicated
>>>>>>>>>> in the repo, or, better yet, it lives in a single place in the repo
>>>>>>>>>> but some tooling (TBD) automatically handles the duplication,
>>>>>>>>>> either at commit time or, my preference, at release time.
>>>>>>>>>> 
>>>>>>>>>> For this kind of shared “utility” code I am very strongly leaning
>>>>>>>>>> towards option 2 with automation, as otherwise I think the
>>>>>>>>>> backwards compatibility requirements would make it unworkable (very
>>>>>>>>>> quickly the combinations we would have to test would become
>>>>>>>>>> unreasonable), and I don’t feel confident we can keep things as
>>>>>>>>>> stable as we need to really deliver the version
>>>>>>>>>> separation/independence I want to deliver with AIP-72.
>>>>>>>>>> 
>>>>>>>>>> So unless someone feels very strongly about this, I will come up
>>>>>>>>>> with a draft PR for further discussion that implements code sharing
>>>>>>>>>> by “vendoring” it at build time. I have an idea of how I can
>>>>>>>>>> achieve this so that we have a single version in the repo and it’ll
>>>>>>>>>> work there, but in the shipped dist we vendor it in, so at runtime
>>>>>>>>>> it lives at something like `airflow.sdk._vendor` etc.
>>>>>>>>>> 
>>>>>>>>>> In terms of repo layout, this likely means we would end up with:
>>>>>>>>>> 
>>>>>>>>>> airflow-core/pyproject.toml
>>>>>>>>>> airflow-core/src/
>>>>>>>>>> airflow-core/tests/
>>>>>>>>>> task-sdk/pyproject.toml
>>>>>>>>>> task-sdk/src/
>>>>>>>>>> task-sdk/tests/
>>>>>>>>>> airflow-common/src
>>>>>>>>>> airflow-common/tests/
>>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as deps would be
>>>>>>>>>> included in the downstream projects. TBD.
>>>>>>>>>> 
>>>>>>>>>> Thoughts and feedback welcomed.
>>>>>>>> 
>>>>>>>> 
>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 


