Yeah we have to try it and test - also building packages happens semi
frequently when you run `uv sync` (they use some kind of heuristics to
decide when) and you can force it with `--reinstall` or `--refresh`.
Package build also happens every time when you run "ci-image build` now in
breeze so it seems like it will nicely integrate in our workflows.

Looks really cool Ash.

On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> It’s not just release time, but any time we build a package which happens
> on “every” CI run. The normal unit tests will use code from
> airflow-common/src/airflow_common; the kube tests which build an image will
> build the dists and vendor in the code from that commit.
>
> There is only a single copy of the shared code committed to the repo, so
> there is never anything to synchronise.
>
> > On 4 Jul 2025, at 15:53, Amogh Desai <amoghdesai....@gmail.com> wrote:
> >
> > Thanks Ash.
> >
> > This is really cool and helpful that you were able to test both scenarios
> > -- repo checkout
> > and also installing from the vendored package and the resolution worked
> > fine too.
> >
> > I like this idea compared the to relative import one for few reasons:
> > - It feels like it will take some time to adjust to the new coding
> standard
> > that we will lay
> > if we impose relative imports in the shared dist
> > - We can continue using repo wise absolute import standards, it is also
> > much easier for situations
> > when we do global search in IDE to find + replace, this could mean a
> change
> > there
> > - The vendoring work is a proven and established paradigm across projects
> > and would
> > out of box give us the build tooling we need also
> >
> > Nothing too against the relative import but with the evidence provided
> > above, vendored approach
> > seems to only do us good.
> >
> > Regarding synchronizing it, release time should be fine as long as we
> have
> > a good CI workflow to probably
> > catch such issues per PR if changes are made in shared dist? (precommit
> > would make it really slow i guess)
> >
> > If we can run our tests with vendored code we should be mostly covered.
> >
> > Good effort all!
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> >
> >> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor <a...@apache.org>
> wrote:
> >>
> >> Okay, I think I’ve got something that works and I’m happy with.
> >>
> >>
> >>
> https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core
> >>
> >> This produces the following from `uv build task-sdk`
> >> -
> >>
> https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz
> >> -
> >>
> https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip
> >>
> >> (`.whl.zip` as GH won't allow .whl upload, but will .zip)
> >>
> >> ```
> >> ❯ unzip -l dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip |
> grep
> >> _vendor
> >>       50  02-02-2020 00:00   airflow/sdk/_vendor/.gitignore
> >>     2082  02-02-2020 00:00   airflow/sdk/_vendor/__init__.py
> >>       28  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common.pyi
> >>       18  02-02-2020 00:00   airflow/sdk/_vendor/vendor.txt
> >>      785  02-02-2020 00:00
> >> airflow/sdk/_vendor/airflow_common/__init__.py
> >>    10628  02-02-2020 00:00
> >> airflow/sdk/_vendor/airflow_common/timezone.py
> >> ```
> >>
> >> And similarly in the .tar.gz, so our “sdist” is complete too:
> >> ```
> >> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz |grep _vendor
> >> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore
> >> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py
> >> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi
> >> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt
> >>
> >>
> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py
> >>
> >>
> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py
> >> ```
> >>
> >> The plugin works at build time by including/copying the libs specified
> in
> >> vendor.txt into place (and let `vendoring` take care of import
> rewrites.)
> >>
> >> For the imports to continue to work at “dev” time/from a repo checkout,
> I
> >> have added a import finder to `sys.meta_path`, and since it’s at the
> end of
> >> the list it will only be used if the normal import can’t find things.
> >>
> >>
> >>
> https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py
> >>
> >> This doesn’t quite give us the same runtime effect “import rewriting”
> >> affect, as in this approach `airflow_common` is directly loaded (i.e.
> >> airflow.sdk._vendor.airflow_common and airflow_common exist in
> >> sys.modules), but it does work for everything that I was able to test..
> >>
> >> I tested it with the diff at the end of this message. My test ipython
> >> shell:
> >>
> >> ```
> >> In [1]: from airflow.sdk._vendor.airflow_common.timezone import foo
> >>
> >> In [2]: foo
> >> Out[2]: 1
> >>
> >> In [3]: import airflow.sdk._vendor.airflow_common
> >>
> >> In [4]: import airflow.sdk._vendor.airflow_common.timezone
> >>
> >> In [5]: airflow.sdk._vendor.airflow_common.__file__
> >> Out[5]:
> >>
> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py'
> >>
> >> In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__
> >> Out[6]:
> >>
> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py'
> >>
> >> ```
> >>
> >>
> >> And in an standalone environment with the SDK dist I built (it needed
> the
> >> matching airflow-core right now, but that is nothing to do with this
> >> discussion):
> >>
> >> ```
> >> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with
> >> dist/apache_airflow_core-3.1.0-py3-none-any.whl --with
> >> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython
> >> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ]
> >> Type 'copyright', 'credits' or 'license' for more information
> >> IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for help.
> >> Tip: You can use `%hist` to view history, see the options with
> `%history?`
> >>
> >> In [1]: import airflow.sdk._vendor.airflow_common.timezone
> >>
> >> In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__
> >> Out[2]:
> >>
> '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py’
> >> ``
> >>
> >>
> >>
> >> ```diff
> >> diff --git a/airflow-common/src/airflow_common/__init__.py
> >> b/airflow-common/src/airflow_common/__init__.py
> >> index 13a83393a9..927b7c6b61 100644
> >> --- a/airflow-common/src/airflow_common/__init__.py
> >> +++ b/airflow-common/src/airflow_common/__init__.py
> >> @@ -14,3 +14,5 @@
> >> # KIND, either express or implied.  See the License for the
> >> # specific language governing permissions and limitations
> >> # under the License.
> >> +
> >> +foo = 1
> >> diff --git a/airflow-common/src/airflow_common/timezone.py
> >> b/airflow-common/src/airflow_common/timezone.py
> >> index 340b924c66..58384ef20f 100644
> >> --- a/airflow-common/src/airflow_common/timezone.py
> >> +++ b/airflow-common/src/airflow_common/timezone.py
> >> @@ -36,6 +36,9 @@ _PENDULUM3 =
> >> version.parse(metadata.version("pendulum")).major == 3
> >> # - FixedTimezone(0, "UTC") in pendulum 2
> >> utc = pendulum.UTC
> >>
> >> +
> >> +from airflow_common import foo
> >> +
> >> TIMEZONE: Timezone
> >>
> >>
> >> ```
> >>
> >>>> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>
> >>> I think both approaches are doable:
> >>>
> >>> 1) -> We can very easily prevent bad imports by pre-commit when
> importing
> >>> from different distributions and make sure we are only doing relative
> >>> imports in the shared modules. We are doing plenty of this already. And
> >> yes
> >>> it would require relative links we currently do not allow.
> >>>
> >>> 2) -> has one disadvantage that someone at some point in time will have
> >> to
> >>> decide to synchronize this and if it happens just before release (I bet
> >>> this is going to happen) this will lead to solving problems that would
> >>> normally be solved during PR when you make a change (i.e. symbolic link
> >> has
> >>> the advantage that whoever modifies shared code will be immediately
> >>> notified in their PR - that they broke something because either static
> >>> checks or mypy or tests fail.
> >>>
> >>> Ash, do you have an idea of a process (who and when) does the
> >>> synchronisation in case of vendoring? Maybe we could solve it if it is
> >> done
> >>> more frequently and with some regularity? We could potentially force
> >>> re-vendoring at PR time as well any time shared code changes (and
> prevent
> >>> it by pre-commit. And I can't think of some place (other than releases)
> >> in
> >>> our development workflow and that seems to be a bit too late as puts an
> >>> extra effort on fixing potential incompatibilities introduced on
> release
> >>> manager and delays the release. WDYT?
> >>>
> >>> Re: relative links. I think for a shared library we could potentially
> >> relax
> >>> this and allow them (and actually disallow absolute links in the pieces
> >> of
> >>> code that are shared - again, by pre-commit). As I recall, the only
> >> reason
> >>> we forbade the relative links is because of how we are (or maybe were)
> >>> doing DAG parsing and failures resulting from it. So we decided to just
> >> not
> >>> allow it to keep consistency. The way how Dag parsing works is that
> when
> >>> you are using importlib to read the Dag from a file, the relative
> imports
> >>> do not work as it does not know what they should be relative to. But if
> >>> relative import is done from an imported package, it should be no
> >> problem,
> >>> I think - otherwise our Dags would not be able to import any library
> that
> >>> uses relative imports.
> >>>
> >>> Of course consistency might be the reason why we do not want to
> introduce
> >>> relative imports. I don't see it as an issue if it is guarded by
> >> pre-commit
> >>> though.
> >>>
> >>> J.
> >>>
> >>>
> >>> J.
> >>>
> >>>
> >>> czw., 3 lip 2025, 12:11 użytkownik Ash Berlin-Taylor <a...@apache.org>
> >>> napisał:
> >>>
> >>>> Oh yes, symlinks will work, with one big caveat: It does mean you
> can’t
> >>>> use absolute imports in one common module to another.
> >>>>
> >>>> For example
> >>>>
> >>
> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41
> >>>> where we have
> >>>>
> >>>> ```
> >>>> from airflow.utils.module_loading import import_string
> >>>> ```
> >>>>
> >>>> if we want to move serve_logs into this common lib that is then
> >> symlinked
> >>>> then we wouldn’t be able to have `from airflow_common.module_loading
> >> import
> >>>> import_string`.
> >>>>
> >>>> I can think of two possible solutions here.
> >>>>
> >>>> 1) is to allow/require relative imports in this shared lib, so `from
> >>>> .module_loading import import_string`
> >>>> 2) is to use `vendoring`[1] (from the pip maintainers) which will
> handle
> >>>> import-rewriting for us.
> >>>>
> >>>> I’d entirely forgot that symlinks in repos was a thing, so I prepared
> a
> >>>> minimal POC/demo of what vendoring approach could look like here
> >>>>
> >>
> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3
> >>>>
> >>>> Now personally I am more than happy with relative imports, but
> generally
> >>>> as a project we have avoided them, so I think that limits what we
> could
> >> do
> >>>> with a symlink based approach.
> >>>>
> >>>> -ash
> >>>>
> >>>> [1] https://github.com/pradyunsg/vendoring
> >>>>
> >>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <
> gopidesupa...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Thanks Ash
> >>>>>
> >>>>> Yes agree option 2 would be preferred for me. Making sure we have all
> >> the
> >>>>> gaurdriles to protect any unwanted behaviour in code sharing and
> >>>> executing
> >>>>> right of tests between the packages.
> >>>>>
> >>>>> Agree with others, option 2 would be
> >>>>>
> >>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <
> amoghdesai....@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Thanks for starting this discussion, Ash.
> >>>>>>
> >>>>>> I would prefer option 2 here with proper tooling to handle the code
> >>>>>> duplication at *release* time.
> >>>>>> It is best to have a dist that has all it needs in itself.
> >>>>>>
> >>>>>> Option 1 could very quickly get out of hand and if we decide to
> >> separate
> >>>>>> triggerer /
> >>>>>> dag processor / config etc etc as separate packages, back compat is
> >>>> going
> >>>>>> to be a nightmare
> >>>>>> and will bite us harder than we anticipate.
> >>>>>>
> >>>>>> Thanks & Regards,
> >>>>>> Amogh Desai
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <kaxiln...@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>>> I prefer Option 2 as well to avoid matrix of dependencies
> >>>>>>>
> >>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler
> >> <j_scheff...@gmx.de.invalid
> >>>>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I'd also rather prefer option 2 - reason here is it is rather
> >>>> pragmatic
> >>>>>>>> and we no not need to cut another package and have less package
> >> counts
> >>>>>>>> and dependencies.
> >>>>>>>>
> >>>>>>>> I remember some time ago I was checking (together with Jarek, I am
> >> not
> >>>>>>>> sure anymore...) if the usage of symlinks would be possible. To
> keep
> >>>>>> the
> >>>>>>>> source in one package but "symlink" it into another. If then at
> >> point
> >>>>>> of
> >>>>>>>> packaging/release the files are materialized we have 1 set of
> code.
> >>>>>>>>
> >>>>>>>> Otherwise if not possible still the redundancy could be solved by
> a
> >>>>>>>> pre-commit hook - and in Git the files are de-duplicated anyway
> >> based
> >>>>>> on
> >>>>>>>> content hash, so this does not hurt.
> >>>>>>>>
> >>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote:
> >>>>>>>>> I support option 2 with proper automation & CI - the reasonings
> >>>>>> you've
> >>>>>>>>> shown for that make sense to me.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Shahar
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org
> >
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hello everyone,
> >>>>>>>>>>
> >>>>>>>>>> As we work on finishing off the code-level separation of Task
> SDK
> >>>>>> and
> >>>>>>>> Core
> >>>>>>>>>> (scheduler etc) we have come across some situations where we
> would
> >>>>>>> like
> >>>>>>>> to
> >>>>>>>>>> share code between these.
> >>>>>>>>>>
> >>>>>>>>>> However it’s not as straight forward of “just put it in a common
> >>>>>> dist
> >>>>>>>> they
> >>>>>>>>>> both depend upon” because one of the goals of the Task SDK
> >>>>>> separation
> >>>>>>>> was
> >>>>>>>>>> to have 100% complete version independence between the two,
> >> ideally
> >>>>>>>> even if
> >>>>>>>>>> they are built into the same image and venv. Most of the reason
> >> why
> >>>>>>> this
> >>>>>>>>>> isn’t straight forward comes down to backwards compatibility -
> if
> >> we
> >>>>>>>> make
> >>>>>>>>>> an change to the common/shared distribution
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> We’ve listed the options we have thought about in
> >>>>>>>>>> https://github.com/apache/airflow/issues/51545 (but that covers
> >>>>>> some
> >>>>>>>> more
> >>>>>>>>>> things that I don’t want to get in to in this discussion such as
> >>>>>>>> possibly
> >>>>>>>>>> separating operators and executors out of a single provider
> dist.)
> >>>>>>>>>>
> >>>>>>>>>> To give a concrete example of some code I would like to share
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
> >>>>>>>>>> — logging config. Another thing we will want to share will be
> the
> >>>>>>>>>> AirflowConfigParser class from airflow.configuration (but
> notably:
> >>>>>>> only
> >>>>>>>> the
> >>>>>>>>>> parser class, _not_ the default config values, again, lets not
> >> dwell
> >>>>>>> on
> >>>>>>>> the
> >>>>>>>>>> specifics of that)
> >>>>>>>>>>
> >>>>>>>>>> So to bring the options listed in the issue here for discussion,
> >>>>>>> broadly
> >>>>>>>>>> speaking there are two high-level approaches:
> >>>>>>>>>>
> >>>>>>>>>> 1. A single shared distribution
> >>>>>>>>>> 2. No shared package and copy/duplicate code
> >>>>>>>>>>
> >>>>>>>>>> The advantage of Approach 1 is that we only have the code in one
> >>>>>>> place.
> >>>>>>>>>> However for me, at least in this specific case of Logging config
> >> or
> >>>>>>>>>> AirflowConfigParser class is that backwards compatibility is
> much
> >>>>>> much
> >>>>>>>>>> harder.
> >>>>>>>>>>
> >>>>>>>>>> The main advantage of Approach 2 is the the code is released
> >>>>>>>> with/embedded
> >>>>>>>>>> in the dist (i.e. apache-airflow-task-sdk would contain the
> right
> >>>>>>>> version
> >>>>>>>>>> of the logging config and ConfigParser etc). The downside is
> that
> >>>>>>> either
> >>>>>>>>>> the code will need to be duplicated in the repo, or better yet
> it
> >>>>>>> would
> >>>>>>>>>> live in a single place in the repo, but some tooling (TBD) will
> >>>>>>>>>> automatically handle the duplication, either at commit time, or
> my
> >>>>>>>>>> preference, at release time.
> >>>>>>>>>>
> >>>>>>>>>> For this kind of shared “utility” code I am very strongly
> leaning
> >>>>>>>> towards
> >>>>>>>>>> option 2 with automation, as otherwise I think the backwards
> >>>>>>>> compatibility
> >>>>>>>>>> requirements would make it unworkable (very quickly over time
> the
> >>>>>>>>>> combinations we would have to test would just be unreasonable)
> >> and I
> >>>>>>>> don’t
> >>>>>>>>>> feel confident we can have things as stable as we need to really
> >>>>>>> deliver
> >>>>>>>>>> the version separation/independency I want to delivery with
> >> AIP-72.
> >>>>>>>>>>
> >>>>>>>>>> So unless someone feels very strongly about this, I will come up
> >>>>>> with
> >>>>>>> a
> >>>>>>>>>> draft PR for further discussion that will implement code sharing
> >> via
> >>>>>>>>>> “vendoring” it at build time. I have an idea of how I can
> achieve
> >>>>>> this
> >>>>>>>> so
> >>>>>>>>>> we have a single version in the repo and it’ll work there, but
> at
> >>>>>>>> runtime
> >>>>>>>>>> we vendor it in to the shipped dist so it lives at something
> like
> >>>>>>>>>> `airflow.sdk._vendor` etc.
> >>>>>>>>>>
> >>>>>>>>>> In terms of repo layout, this likely means we would end up with:
> >>>>>>>>>>
> >>>>>>>>>> airflow-core/pyproject.toml
> >>>>>>>>>> airflow-core/src/
> >>>>>>>>>> airflow-core/tests/
> >>>>>>>>>> task-sdk/pyproject.toml
> >>>>>>>>>> task-sdk/src/
> >>>>>>>>>> task-sdk/tests/
> >>>>>>>>>> airflow-common/src
> >>>>>>>>>> airflow-common/tests/
> >>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as deps would be
> >>>>>> included
> >>>>>>>> in
> >>>>>>>>>> the downstream projects. TBD.
> >>>>>>>>>>
> >>>>>>>>>> Thoughts and feedback welcomed.
> >>>>>>>>
> >>>>>>>>
> >> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>
>

Reply via email to