Yeah we have to try it and test - also building packages happens semi frequently when you run `uv sync` (they use some kind of heuristics to decide when) and you can force it with `--reinstall` or `--refresh`. Package build also happens every time when you run "ci-image build` now in breeze so it seems like it will nicely integrate in our workflows.
Looks really cool Ash. On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor <a...@apache.org> wrote: > It’s not just release time, but any time we build a package which happens > on “every” CI run. The normal unit tests will use code from > airflow-common/src/airflow_common; the kube tests which build an image will > build the dists and vendor in the code from that commit. > > There is only a single copy of the shared code committed to the repo, so > there is never anything to synchronise. > > > On 4 Jul 2025, at 15:53, Amogh Desai <amoghdesai....@gmail.com> wrote: > > > > Thanks Ash. > > > > This is really cool and helpful that you were able to test both scenarios > > -- repo checkout > > and also installing from the vendored package and the resolution worked > > fine too. > > > > I like this idea compared the to relative import one for few reasons: > > - It feels like it will take some time to adjust to the new coding > standard > > that we will lay > > if we impose relative imports in the shared dist > > - We can continue using repo wise absolute import standards, it is also > > much easier for situations > > when we do global search in IDE to find + replace, this could mean a > change > > there > > - The vendoring work is a proven and established paradigm across projects > > and would > > out of box give us the build tooling we need also > > > > Nothing too against the relative import but with the evidence provided > > above, vendored approach > > seems to only do us good. > > > > Regarding synchronizing it, release time should be fine as long as we > have > > a good CI workflow to probably > > catch such issues per PR if changes are made in shared dist? (precommit > > would make it really slow i guess) > > > > If we can run our tests with vendored code we should be mostly covered. > > > > Good effort all! > > > > Thanks & Regards, > > Amogh Desai > > > > > >> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor <a...@apache.org> > wrote: > >> > >> Okay, I think I’ve got something that works and I’m happy with. > >> > >> > >> > https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core > >> > >> This produces the following from `uv build task-sdk` > >> - > >> > https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz > >> - > >> > https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip > >> > >> (`.whl.zip` as GH won't allow .whl upload, but will .zip) > >> > >> ``` > >> ❯ unzip -l dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip | > grep > >> _vendor > >> 50 02-02-2020 00:00 airflow/sdk/_vendor/.gitignore > >> 2082 02-02-2020 00:00 airflow/sdk/_vendor/__init__.py > >> 28 02-02-2020 00:00 airflow/sdk/_vendor/airflow_common.pyi > >> 18 02-02-2020 00:00 airflow/sdk/_vendor/vendor.txt > >> 785 02-02-2020 00:00 > >> airflow/sdk/_vendor/airflow_common/__init__.py > >> 10628 02-02-2020 00:00 > >> airflow/sdk/_vendor/airflow_common/timezone.py > >> ``` > >> > >> And similarly in the .tar.gz, so our “sdist” is complete too: > >> ``` > >> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz |grep _vendor > >> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore > >> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py > >> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi > >> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt > >> > >> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py > >> > >> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py > >> ``` > >> > >> The plugin works at build time by including/copying the libs specified > in > >> vendor.txt into place (and let `vendoring` take care of import > rewrites.) > >> > >> For the imports to continue to work at “dev” time/from a repo checkout, > I > >> have added a import finder to `sys.meta_path`, and since it’s at the > end of > >> the list it will only be used if the normal import can’t find things. > >> > >> > >> > https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py > >> > >> This doesn’t quite give us the same runtime effect “import rewriting” > >> affect, as in this approach `airflow_common` is directly loaded (i.e. > >> airflow.sdk._vendor.airflow_common and airflow_common exist in > >> sys.modules), but it does work for everything that I was able to test.. > >> > >> I tested it with the diff at the end of this message. My test ipython > >> shell: > >> > >> ``` > >> In [1]: from airflow.sdk._vendor.airflow_common.timezone import foo > >> > >> In [2]: foo > >> Out[2]: 1 > >> > >> In [3]: import airflow.sdk._vendor.airflow_common > >> > >> In [4]: import airflow.sdk._vendor.airflow_common.timezone > >> > >> In [5]: airflow.sdk._vendor.airflow_common.__file__ > >> Out[5]: > >> > '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py' > >> > >> In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__ > >> Out[6]: > >> > '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py' > >> > >> ``` > >> > >> > >> And in an standalone environment with the SDK dist I built (it needed > the > >> matching airflow-core right now, but that is nothing to do with this > >> discussion): > >> > >> ``` > >> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with > >> dist/apache_airflow_core-3.1.0-py3-none-any.whl --with > >> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython > >> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ] > >> Type 'copyright', 'credits' or 'license' for more information > >> IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for help. > >> Tip: You can use `%hist` to view history, see the options with > `%history?` > >> > >> In [1]: import airflow.sdk._vendor.airflow_common.timezone > >> > >> In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__ > >> Out[2]: > >> > '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py’ > >> `` > >> > >> > >> > >> ```diff > >> diff --git a/airflow-common/src/airflow_common/__init__.py > >> b/airflow-common/src/airflow_common/__init__.py > >> index 13a83393a9..927b7c6b61 100644 > >> --- a/airflow-common/src/airflow_common/__init__.py > >> +++ b/airflow-common/src/airflow_common/__init__.py > >> @@ -14,3 +14,5 @@ > >> # KIND, either express or implied. See the License for the > >> # specific language governing permissions and limitations > >> # under the License. > >> + > >> +foo = 1 > >> diff --git a/airflow-common/src/airflow_common/timezone.py > >> b/airflow-common/src/airflow_common/timezone.py > >> index 340b924c66..58384ef20f 100644 > >> --- a/airflow-common/src/airflow_common/timezone.py > >> +++ b/airflow-common/src/airflow_common/timezone.py > >> @@ -36,6 +36,9 @@ _PENDULUM3 = > >> version.parse(metadata.version("pendulum")).major == 3 > >> # - FixedTimezone(0, "UTC") in pendulum 2 > >> utc = pendulum.UTC > >> > >> + > >> +from airflow_common import foo > >> + > >> TIMEZONE: Timezone > >> > >> > >> ``` > >> > >>>> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> wrote: > >>> > >>> I think both approaches are doable: > >>> > >>> 1) -> We can very easily prevent bad imports by pre-commit when > importing > >>> from different distributions and make sure we are only doing relative > >>> imports in the shared modules. We are doing plenty of this already. And > >> yes > >>> it would require relative links we currently do not allow. > >>> > >>> 2) -> has one disadvantage that someone at some point in time will have > >> to > >>> decide to synchronize this and if it happens just before release (I bet > >>> this is going to happen) this will lead to solving problems that would > >>> normally be solved during PR when you make a change (i.e. symbolic link > >> has > >>> the advantage that whoever modifies shared code will be immediately > >>> notified in their PR - that they broke something because either static > >>> checks or mypy or tests fail. > >>> > >>> Ash, do you have an idea of a process (who and when) does the > >>> synchronisation in case of vendoring? Maybe we could solve it if it is > >> done > >>> more frequently and with some regularity? We could potentially force > >>> re-vendoring at PR time as well any time shared code changes (and > prevent > >>> it by pre-commit. And I can't think of some place (other than releases) > >> in > >>> our development workflow and that seems to be a bit too late as puts an > >>> extra effort on fixing potential incompatibilities introduced on > release > >>> manager and delays the release. WDYT? > >>> > >>> Re: relative links. I think for a shared library we could potentially > >> relax > >>> this and allow them (and actually disallow absolute links in the pieces > >> of > >>> code that are shared - again, by pre-commit). As I recall, the only > >> reason > >>> we forbade the relative links is because of how we are (or maybe were) > >>> doing DAG parsing and failures resulting from it. So we decided to just > >> not > >>> allow it to keep consistency. The way how Dag parsing works is that > when > >>> you are using importlib to read the Dag from a file, the relative > imports > >>> do not work as it does not know what they should be relative to. But if > >>> relative import is done from an imported package, it should be no > >> problem, > >>> I think - otherwise our Dags would not be able to import any library > that > >>> uses relative imports. > >>> > >>> Of course consistency might be the reason why we do not want to > introduce > >>> relative imports. I don't see it as an issue if it is guarded by > >> pre-commit > >>> though. > >>> > >>> J. > >>> > >>> > >>> J. > >>> > >>> > >>> czw., 3 lip 2025, 12:11 użytkownik Ash Berlin-Taylor <a...@apache.org> > >>> napisał: > >>> > >>>> Oh yes, symlinks will work, with one big caveat: It does mean you > can’t > >>>> use absolute imports in one common module to another. > >>>> > >>>> For example > >>>> > >> > https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41 > >>>> where we have > >>>> > >>>> ``` > >>>> from airflow.utils.module_loading import import_string > >>>> ``` > >>>> > >>>> if we want to move serve_logs into this common lib that is then > >> symlinked > >>>> then we wouldn’t be able to have `from airflow_common.module_loading > >> import > >>>> import_string`. > >>>> > >>>> I can think of two possible solutions here. > >>>> > >>>> 1) is to allow/require relative imports in this shared lib, so `from > >>>> .module_loading import import_string` > >>>> 2) is to use `vendoring`[1] (from the pip maintainers) which will > handle > >>>> import-rewriting for us. > >>>> > >>>> I’d entirely forgot that symlinks in repos was a thing, so I prepared > a > >>>> minimal POC/demo of what vendoring approach could look like here > >>>> > >> > https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3 > >>>> > >>>> Now personally I am more than happy with relative imports, but > generally > >>>> as a project we have avoided them, so I think that limits what we > could > >> do > >>>> with a symlink based approach. > >>>> > >>>> -ash > >>>> > >>>> [1] https://github.com/pradyunsg/vendoring > >>>> > >>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu < > gopidesupa...@gmail.com> > >>>> wrote: > >>>>> > >>>>> Thanks Ash > >>>>> > >>>>> Yes agree option 2 would be preferred for me. Making sure we have all > >> the > >>>>> gaurdriles to protect any unwanted behaviour in code sharing and > >>>> executing > >>>>> right of tests between the packages. > >>>>> > >>>>> Agree with others, option 2 would be > >>>>> > >>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai < > amoghdesai....@gmail.com> > >>>>> wrote: > >>>>> > >>>>>> Thanks for starting this discussion, Ash. > >>>>>> > >>>>>> I would prefer option 2 here with proper tooling to handle the code > >>>>>> duplication at *release* time. > >>>>>> It is best to have a dist that has all it needs in itself. > >>>>>> > >>>>>> Option 1 could very quickly get out of hand and if we decide to > >> separate > >>>>>> triggerer / > >>>>>> dag processor / config etc etc as separate packages, back compat is > >>>> going > >>>>>> to be a nightmare > >>>>>> and will bite us harder than we anticipate. > >>>>>> > >>>>>> Thanks & Regards, > >>>>>> Amogh Desai > >>>>>> > >>>>>> > >>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <kaxiln...@gmail.com> > >> wrote: > >>>>>> > >>>>>>> I prefer Option 2 as well to avoid matrix of dependencies > >>>>>>> > >>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler > >> <j_scheff...@gmx.de.invalid > >>>>> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> I'd also rather prefer option 2 - reason here is it is rather > >>>> pragmatic > >>>>>>>> and we no not need to cut another package and have less package > >> counts > >>>>>>>> and dependencies. > >>>>>>>> > >>>>>>>> I remember some time ago I was checking (together with Jarek, I am > >> not > >>>>>>>> sure anymore...) if the usage of symlinks would be possible. To > keep > >>>>>> the > >>>>>>>> source in one package but "symlink" it into another. If then at > >> point > >>>>>> of > >>>>>>>> packaging/release the files are materialized we have 1 set of > code. > >>>>>>>> > >>>>>>>> Otherwise if not possible still the redundancy could be solved by > a > >>>>>>>> pre-commit hook - and in Git the files are de-duplicated anyway > >> based > >>>>>> on > >>>>>>>> content hash, so this does not hurt. > >>>>>>>> > >>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote: > >>>>>>>>> I support option 2 with proper automation & CI - the reasonings > >>>>>> you've > >>>>>>>>> shown for that make sense to me. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Shahar > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org > > > >>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Hello everyone, > >>>>>>>>>> > >>>>>>>>>> As we work on finishing off the code-level separation of Task > SDK > >>>>>> and > >>>>>>>> Core > >>>>>>>>>> (scheduler etc) we have come across some situations where we > would > >>>>>>> like > >>>>>>>> to > >>>>>>>>>> share code between these. > >>>>>>>>>> > >>>>>>>>>> However it’s not as straight forward of “just put it in a common > >>>>>> dist > >>>>>>>> they > >>>>>>>>>> both depend upon” because one of the goals of the Task SDK > >>>>>> separation > >>>>>>>> was > >>>>>>>>>> to have 100% complete version independence between the two, > >> ideally > >>>>>>>> even if > >>>>>>>>>> they are built into the same image and venv. Most of the reason > >> why > >>>>>>> this > >>>>>>>>>> isn’t straight forward comes down to backwards compatibility - > if > >> we > >>>>>>>> make > >>>>>>>>>> an change to the common/shared distribution > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> We’ve listed the options we have thought about in > >>>>>>>>>> https://github.com/apache/airflow/issues/51545 (but that covers > >>>>>> some > >>>>>>>> more > >>>>>>>>>> things that I don’t want to get in to in this discussion such as > >>>>>>>> possibly > >>>>>>>>>> separating operators and executors out of a single provider > dist.) > >>>>>>>>>> > >>>>>>>>>> To give a concrete example of some code I would like to share > >>>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>> > >> > https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py > >>>>>>>>>> — logging config. Another thing we will want to share will be > the > >>>>>>>>>> AirflowConfigParser class from airflow.configuration (but > notably: > >>>>>>> only > >>>>>>>> the > >>>>>>>>>> parser class, _not_ the default config values, again, lets not > >> dwell > >>>>>>> on > >>>>>>>> the > >>>>>>>>>> specifics of that) > >>>>>>>>>> > >>>>>>>>>> So to bring the options listed in the issue here for discussion, > >>>>>>> broadly > >>>>>>>>>> speaking there are two high-level approaches: > >>>>>>>>>> > >>>>>>>>>> 1. A single shared distribution > >>>>>>>>>> 2. No shared package and copy/duplicate code > >>>>>>>>>> > >>>>>>>>>> The advantage of Approach 1 is that we only have the code in one > >>>>>>> place. > >>>>>>>>>> However for me, at least in this specific case of Logging config > >> or > >>>>>>>>>> AirflowConfigParser class is that backwards compatibility is > much > >>>>>> much > >>>>>>>>>> harder. > >>>>>>>>>> > >>>>>>>>>> The main advantage of Approach 2 is the the code is released > >>>>>>>> with/embedded > >>>>>>>>>> in the dist (i.e. apache-airflow-task-sdk would contain the > right > >>>>>>>> version > >>>>>>>>>> of the logging config and ConfigParser etc). The downside is > that > >>>>>>> either > >>>>>>>>>> the code will need to be duplicated in the repo, or better yet > it > >>>>>>> would > >>>>>>>>>> live in a single place in the repo, but some tooling (TBD) will > >>>>>>>>>> automatically handle the duplication, either at commit time, or > my > >>>>>>>>>> preference, at release time. > >>>>>>>>>> > >>>>>>>>>> For this kind of shared “utility” code I am very strongly > leaning > >>>>>>>> towards > >>>>>>>>>> option 2 with automation, as otherwise I think the backwards > >>>>>>>> compatibility > >>>>>>>>>> requirements would make it unworkable (very quickly over time > the > >>>>>>>>>> combinations we would have to test would just be unreasonable) > >> and I > >>>>>>>> don’t > >>>>>>>>>> feel confident we can have things as stable as we need to really > >>>>>>> deliver > >>>>>>>>>> the version separation/independency I want to delivery with > >> AIP-72. > >>>>>>>>>> > >>>>>>>>>> So unless someone feels very strongly about this, I will come up > >>>>>> with > >>>>>>> a > >>>>>>>>>> draft PR for further discussion that will implement code sharing > >> via > >>>>>>>>>> “vendoring” it at build time. I have an idea of how I can > achieve > >>>>>> this > >>>>>>>> so > >>>>>>>>>> we have a single version in the repo and it’ll work there, but > at > >>>>>>>> runtime > >>>>>>>>>> we vendor it in to the shipped dist so it lives at something > like > >>>>>>>>>> `airflow.sdk._vendor` etc. > >>>>>>>>>> > >>>>>>>>>> In terms of repo layout, this likely means we would end up with: > >>>>>>>>>> > >>>>>>>>>> airflow-core/pyproject.toml > >>>>>>>>>> airflow-core/src/ > >>>>>>>>>> airflow-core/tests/ > >>>>>>>>>> task-sdk/pyproject.toml > >>>>>>>>>> task-sdk/src/ > >>>>>>>>>> task-sdk/tests/ > >>>>>>>>>> airflow-common/src > >>>>>>>>>> airflow-common/tests/ > >>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as deps would be > >>>>>> included > >>>>>>>> in > >>>>>>>>>> the downstream projects. TBD. > >>>>>>>>>> > >>>>>>>>>> Thoughts and feedback welcomed. > >>>>>>>> > >>>>>>>> > >> --------------------------------------------------------------------- > >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > >>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>> > >>>> > >> > >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > For additional commands, e-mail: dev-h...@airflow.apache.org > >