Okay, I think I’ve got something that works and I’m happy with. https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core
This produces the following from `uv build task-sdk` - https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz - https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip (`.whl.zip` as GH won't allow .whl upload, but will .zip) ``` ❯ unzip -l dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip | grep _vendor 50 02-02-2020 00:00 airflow/sdk/_vendor/.gitignore 2082 02-02-2020 00:00 airflow/sdk/_vendor/__init__.py 28 02-02-2020 00:00 airflow/sdk/_vendor/airflow_common.pyi 18 02-02-2020 00:00 airflow/sdk/_vendor/vendor.txt 785 02-02-2020 00:00 airflow/sdk/_vendor/airflow_common/__init__.py 10628 02-02-2020 00:00 airflow/sdk/_vendor/airflow_common/timezone.py ``` And similarly in the .tar.gz, so our “sdist” is complete too: ``` ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz |grep _vendor apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py ``` The plugin works at build time by including/copying the libs specified in vendor.txt into place (and let `vendoring` take care of import rewrites.) For the imports to continue to work at “dev” time/from a repo checkout, I have added a import finder to `sys.meta_path`, and since it’s at the end of the list it will only be used if the normal import can’t find things. https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py This doesn’t quite give us the same runtime effect “import rewriting” affect, as in this approach `airflow_common` is directly loaded (i.e. airflow.sdk._vendor.airflow_common and airflow_common exist in sys.modules), but it does work for everything that I was able to test.. I tested it with the diff at the end of this message. My test ipython shell: ``` In [1]: from airflow.sdk._vendor.airflow_common.timezone import foo In [2]: foo Out[2]: 1 In [3]: import airflow.sdk._vendor.airflow_common In [4]: import airflow.sdk._vendor.airflow_common.timezone In [5]: airflow.sdk._vendor.airflow_common.__file__ Out[5]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py' In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__ Out[6]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py' ``` And in an standalone environment with the SDK dist I built (it needed the matching airflow-core right now, but that is nothing to do with this discussion): ``` ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with dist/apache_airflow_core-3.1.0-py3-none-any.whl --with dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ] Type 'copyright', 'credits' or 'license' for more information IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for help. Tip: You can use `%hist` to view history, see the options with `%history?` In [1]: import airflow.sdk._vendor.airflow_common.timezone In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__ Out[2]: '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py’ `` ```diff diff --git a/airflow-common/src/airflow_common/__init__.py b/airflow-common/src/airflow_common/__init__.py index 13a83393a9..927b7c6b61 100644 --- a/airflow-common/src/airflow_common/__init__.py +++ b/airflow-common/src/airflow_common/__init__.py @@ -14,3 +14,5 @@ # KIND, either express or implied. See the License for the # specific language governing permissions and limitations # under the License. + +foo = 1 diff --git a/airflow-common/src/airflow_common/timezone.py b/airflow-common/src/airflow_common/timezone.py index 340b924c66..58384ef20f 100644 --- a/airflow-common/src/airflow_common/timezone.py +++ b/airflow-common/src/airflow_common/timezone.py @@ -36,6 +36,9 @@ _PENDULUM3 = version.parse(metadata.version("pendulum")).major == 3 # - FixedTimezone(0, "UTC") in pendulum 2 utc = pendulum.UTC + +from airflow_common import foo + TIMEZONE: Timezone ``` > On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> wrote: > > I think both approaches are doable: > > 1) -> We can very easily prevent bad imports by pre-commit when importing > from different distributions and make sure we are only doing relative > imports in the shared modules. We are doing plenty of this already. And yes > it would require relative links we currently do not allow. > > 2) -> has one disadvantage that someone at some point in time will have to > decide to synchronize this and if it happens just before release (I bet > this is going to happen) this will lead to solving problems that would > normally be solved during PR when you make a change (i.e. symbolic link has > the advantage that whoever modifies shared code will be immediately > notified in their PR - that they broke something because either static > checks or mypy or tests fail. > > Ash, do you have an idea of a process (who and when) does the > synchronisation in case of vendoring? Maybe we could solve it if it is done > more frequently and with some regularity? We could potentially force > re-vendoring at PR time as well any time shared code changes (and prevent > it by pre-commit. And I can't think of some place (other than releases) in > our development workflow and that seems to be a bit too late as puts an > extra effort on fixing potential incompatibilities introduced on release > manager and delays the release. WDYT? > > Re: relative links. I think for a shared library we could potentially relax > this and allow them (and actually disallow absolute links in the pieces of > code that are shared - again, by pre-commit). As I recall, the only reason > we forbade the relative links is because of how we are (or maybe were) > doing DAG parsing and failures resulting from it. So we decided to just not > allow it to keep consistency. The way how Dag parsing works is that when > you are using importlib to read the Dag from a file, the relative imports > do not work as it does not know what they should be relative to. But if > relative import is done from an imported package, it should be no problem, > I think - otherwise our Dags would not be able to import any library that > uses relative imports. > > Of course consistency might be the reason why we do not want to introduce > relative imports. I don't see it as an issue if it is guarded by pre-commit > though. > > J. > > > J. > > > czw., 3 lip 2025, 12:11 użytkownik Ash Berlin-Taylor <a...@apache.org> > napisał: > >> Oh yes, symlinks will work, with one big caveat: It does mean you can’t >> use absolute imports in one common module to another. >> >> For example >> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41 >> where we have >> >> ``` >> from airflow.utils.module_loading import import_string >> ``` >> >> if we want to move serve_logs into this common lib that is then symlinked >> then we wouldn’t be able to have `from airflow_common.module_loading import >> import_string`. >> >> I can think of two possible solutions here. >> >> 1) is to allow/require relative imports in this shared lib, so `from >> .module_loading import import_string` >> 2) is to use `vendoring`[1] (from the pip maintainers) which will handle >> import-rewriting for us. >> >> I’d entirely forgot that symlinks in repos was a thing, so I prepared a >> minimal POC/demo of what vendoring approach could look like here >> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3 >> >> Now personally I am more than happy with relative imports, but generally >> as a project we have avoided them, so I think that limits what we could do >> with a symlink based approach. >> >> -ash >> >> [1] https://github.com/pradyunsg/vendoring >> >>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <gopidesupa...@gmail.com> >> wrote: >>> >>> Thanks Ash >>> >>> Yes agree option 2 would be preferred for me. Making sure we have all the >>> gaurdriles to protect any unwanted behaviour in code sharing and >> executing >>> right of tests between the packages. >>> >>> Agree with others, option 2 would be >>> >>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <amoghdesai....@gmail.com> >>> wrote: >>> >>>> Thanks for starting this discussion, Ash. >>>> >>>> I would prefer option 2 here with proper tooling to handle the code >>>> duplication at *release* time. >>>> It is best to have a dist that has all it needs in itself. >>>> >>>> Option 1 could very quickly get out of hand and if we decide to separate >>>> triggerer / >>>> dag processor / config etc etc as separate packages, back compat is >> going >>>> to be a nightmare >>>> and will bite us harder than we anticipate. >>>> >>>> Thanks & Regards, >>>> Amogh Desai >>>> >>>> >>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <kaxiln...@gmail.com> wrote: >>>> >>>>> I prefer Option 2 as well to avoid matrix of dependencies >>>>> >>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler <j_scheff...@gmx.de.invalid >>> >>>>> wrote: >>>>> >>>>>> I'd also rather prefer option 2 - reason here is it is rather >> pragmatic >>>>>> and we no not need to cut another package and have less package counts >>>>>> and dependencies. >>>>>> >>>>>> I remember some time ago I was checking (together with Jarek, I am not >>>>>> sure anymore...) if the usage of symlinks would be possible. To keep >>>> the >>>>>> source in one package but "symlink" it into another. If then at point >>>> of >>>>>> packaging/release the files are materialized we have 1 set of code. >>>>>> >>>>>> Otherwise if not possible still the redundancy could be solved by a >>>>>> pre-commit hook - and in Git the files are de-duplicated anyway based >>>> on >>>>>> content hash, so this does not hurt. >>>>>> >>>>>> On 02.07.25 18:49, Shahar Epstein wrote: >>>>>>> I support option 2 with proper automation & CI - the reasonings >>>> you've >>>>>>> shown for that make sense to me. >>>>>>> >>>>>>> >>>>>>> Shahar >>>>>>> >>>>>>> >>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org> >>>>> wrote: >>>>>>> >>>>>>>> Hello everyone, >>>>>>>> >>>>>>>> As we work on finishing off the code-level separation of Task SDK >>>> and >>>>>> Core >>>>>>>> (scheduler etc) we have come across some situations where we would >>>>> like >>>>>> to >>>>>>>> share code between these. >>>>>>>> >>>>>>>> However it’s not as straight forward of “just put it in a common >>>> dist >>>>>> they >>>>>>>> both depend upon” because one of the goals of the Task SDK >>>> separation >>>>>> was >>>>>>>> to have 100% complete version independence between the two, ideally >>>>>> even if >>>>>>>> they are built into the same image and venv. Most of the reason why >>>>> this >>>>>>>> isn’t straight forward comes down to backwards compatibility - if we >>>>>> make >>>>>>>> an change to the common/shared distribution >>>>>>>> >>>>>>>> >>>>>>>> We’ve listed the options we have thought about in >>>>>>>> https://github.com/apache/airflow/issues/51545 (but that covers >>>> some >>>>>> more >>>>>>>> things that I don’t want to get in to in this discussion such as >>>>>> possibly >>>>>>>> separating operators and executors out of a single provider dist.) >>>>>>>> >>>>>>>> To give a concrete example of some code I would like to share >>>>>>>> >>>>>> >>>>> >>>> >> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py >>>>>>>> — logging config. Another thing we will want to share will be the >>>>>>>> AirflowConfigParser class from airflow.configuration (but notably: >>>>> only >>>>>> the >>>>>>>> parser class, _not_ the default config values, again, lets not dwell >>>>> on >>>>>> the >>>>>>>> specifics of that) >>>>>>>> >>>>>>>> So to bring the options listed in the issue here for discussion, >>>>> broadly >>>>>>>> speaking there are two high-level approaches: >>>>>>>> >>>>>>>> 1. A single shared distribution >>>>>>>> 2. No shared package and copy/duplicate code >>>>>>>> >>>>>>>> The advantage of Approach 1 is that we only have the code in one >>>>> place. >>>>>>>> However for me, at least in this specific case of Logging config or >>>>>>>> AirflowConfigParser class is that backwards compatibility is much >>>> much >>>>>>>> harder. >>>>>>>> >>>>>>>> The main advantage of Approach 2 is the the code is released >>>>>> with/embedded >>>>>>>> in the dist (i.e. apache-airflow-task-sdk would contain the right >>>>>> version >>>>>>>> of the logging config and ConfigParser etc). The downside is that >>>>> either >>>>>>>> the code will need to be duplicated in the repo, or better yet it >>>>> would >>>>>>>> live in a single place in the repo, but some tooling (TBD) will >>>>>>>> automatically handle the duplication, either at commit time, or my >>>>>>>> preference, at release time. >>>>>>>> >>>>>>>> For this kind of shared “utility” code I am very strongly leaning >>>>>> towards >>>>>>>> option 2 with automation, as otherwise I think the backwards >>>>>> compatibility >>>>>>>> requirements would make it unworkable (very quickly over time the >>>>>>>> combinations we would have to test would just be unreasonable) and I >>>>>> don’t >>>>>>>> feel confident we can have things as stable as we need to really >>>>> deliver >>>>>>>> the version separation/independency I want to delivery with AIP-72. >>>>>>>> >>>>>>>> So unless someone feels very strongly about this, I will come up >>>> with >>>>> a >>>>>>>> draft PR for further discussion that will implement code sharing via >>>>>>>> “vendoring” it at build time. I have an idea of how I can achieve >>>> this >>>>>> so >>>>>>>> we have a single version in the repo and it’ll work there, but at >>>>>> runtime >>>>>>>> we vendor it in to the shipped dist so it lives at something like >>>>>>>> `airflow.sdk._vendor` etc. >>>>>>>> >>>>>>>> In terms of repo layout, this likely means we would end up with: >>>>>>>> >>>>>>>> airflow-core/pyproject.toml >>>>>>>> airflow-core/src/ >>>>>>>> airflow-core/tests/ >>>>>>>> task-sdk/pyproject.toml >>>>>>>> task-sdk/src/ >>>>>>>> task-sdk/tests/ >>>>>>>> airflow-common/src >>>>>>>> airflow-common/tests/ >>>>>>>> # Possibly no airflow-common/pyproject.toml, as deps would be >>>> included >>>>>> in >>>>>>>> the downstream projects. TBD. >>>>>>>> >>>>>>>> Thoughts and feedback welcomed. >>>>>> >>>>>> --------------------------------------------------------------------- >>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>>>>> >>>>>> >>>>> >>>> >> >>