Agreed! Once the PR is up, we can have these implementation level discussions over there. Good chat however!
Thanks & Regards, Amogh Desai On Wed, Jul 9, 2025 at 3:56 PM Jarek Potiuk <ja...@potiuk.com> wrote: > Yeah. I think extracting one-by-one, feature-by-feature that we want to > share to a separate distribution is the best approach - it will actually > also help with the "__init__.py" cleanup - because almost by definition - > those distributions will not be able to "reach" outside - i.e. they only > can be "used" not "use" something else. Which means that - for example as > it is now - configuration using logging which is using configuration > (leading to circular dependencies and partially initialized modules) will > not happen and we will have to figure out rather how to inject > configuration into logging (from task-sdk, airflow-ctl, airflow-core) and > to get the right sequence of initialization - rather than have the > inter-feature-dependencies. > > And that is precisely what low-cyclomatic complexity is about as well - it > generally leads to easier-to-maintain software that has well defined > functionality and does not have those weird circular dependencies we have > now. That's a kind-of side-effect of such a "per feature" split, but a very > desirable one. > > J. > > On Wed, Jul 9, 2025 at 10:17 AM Amogh Desai <amoghdesai....@gmail.com> > wrote: > >> Probably, you make a valid point. >> >> Maybe this is an implementation detail, so we could figure it out as we >> start on a POC and factor in these things >> as we move along? >> >> But from an initial guess, I would think that execution time related >> items (if we manage to enumerate them) would be something >> that would be better off in that "core_and_task_sdk" bundle. >> >> Thanks & Regards, >> Amogh Desai >> >> >> On Tue, Jul 8, 2025 at 3:46 PM Jarek Potiuk <ja...@potiuk.com> wrote: >> >>> > Not that I am against your idea and we can surely expand as we need >>> but we would not need to expand the >>> "core_and_task_sdk" if we put only the relevant items into it. >>> >>> So if we move logging and config out, my question is what is really >>> relevant to "stay" in "core_and_task_sdk" ? And what we know will not be >>> needed by other distributions in the future ? >>> >>> What would be the content of "core_and_task_sdk" ? Can we enumerate what >>> should go there now ? >>> >>> My bet is that if we enumerate them, it will turn out that they are good >>> candidates to make separate "shared features" that can be logically named >>> and modularised, and that there is no real need to have shared >>> "core_and_task_sdk" which sounds like "bag of everything else". >>> >>> >>> >>> On Tue, Jul 8, 2025 at 11:48 AM Amogh Desai <amoghdesai....@gmail.com> >>> wrote: >>> >>>> Yeah, I think what you are showcasing here is a step ahead of the >>>> initial proposal from Ash. >>>> >>>> From the original proposal, the `core_and_task_sdk` *can* have the >>>> things relevant to just those two >>>> distros. Logging, Config are modules that might be needed by >>>> airflow-ctl for example, so ideally, those >>>> would not be good candidates to be put in there, ideally speaking. >>>> >>>> The example of Kubernetes Utils sounds to be a good example, it will be >>>> used by KubeExecutor (lets say this is a >>>> module called "executors") and by KPO (providers), and the >>>> "shared/kubernetes" would probably be a good >>>> candidate for that. >>>> >>>> Not that I am against your idea and we can surely expand as we need but >>>> we would not need to expand the >>>> "core_and_task_sdk" if we put only the relevant items into it. >>>> >>>> Thanks & Regards, >>>> Amogh Desai >>>> >>>> >>>> On Tue, Jul 8, 2025 at 12:28 PM Jarek Potiuk <ja...@potiuk.com> wrote: >>>> >>>>> > @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean >>>>> there, I am understanding the direction >>>>> but could you elaborate a bit more please? >>>>> >>>>> Let me elaborate: >>>>> >>>>> As I understand (maybe I am wrong?), the proposal is that we have a >>>>> "core-and-task-sdk" folder which is a shared distribution that is >>>>> vendored-in into both "airflow-core" and "airflow-task-sdk". This >>>>> contains some shared code that we want to include in both distributions >>>>> (note that we never ever release "core-and-task-sdk" >>>>> distribution because it only contains code that is shared between the two >>>>> distributions we release. >>>>> >>>>> That's fine and cool. >>>>> >>>>> Imagine that this distribution contains "logging" (shared code >>>>> for logging) and "config" (shared code for configuration). - both needed >>>>> in >>>>> "airflow-core" and "airflow-task-sdk". So far so good. But what happen if >>>>> we want to use logging in the same fashion in say "airflow-ctl" (that also >>>>> applies for other distributions that we might come up with) ? Are we going >>>>> to vendor in the whole "core-and-task-sdk" distribution in "airflow-ctl" ? >>>>> It would be far better if we just vendor in "logging" and do not vendor-in >>>>> "config". >>>>> >>>>> And if we are going to have a mechanism to vendor-in "a distribution" >>>>> - there is nothing wrong with having the same way to vendor-in multiple >>>>> distributions - so we can easily do it this way (i added "orm" , >>>>> "serialization". and "fast_api" as an example thing that we might want to >>>>> share - not sure if that is really something we want to do but it will >>>>> allow to illustrate my idea better) >>>>> >>>>> / >>>>> airflow-ctl/ >>>>> task-sdk/... >>>>> airflow-core/... >>>>> .... >>>>> shared/ >>>>> kubernetes/ >>>>> pyproject.toml >>>>> src/ >>>>> airflow_shared_kubernetes/__init__.py >>>>> logging/ >>>>> pyproject.toml >>>>> src/ >>>>> airflow_shared_logging/__init__.py >>>>> config/ >>>>> pyproject.toml >>>>> src/ >>>>> airflow_shared_config/__init__.py >>>>> orm/ >>>>> pyproject.toml >>>>> src/ >>>>> airflow_shared_orm/__init__.py >>>>> serialization/ >>>>> pyproject.toml >>>>> src/ >>>>> airflow_shared_serialization/__init__.py >>>>> fast_api/ >>>>> pyproject.toml >>>>> src/ >>>>> airflow_shared_fast_api/__init__.py >>>>> ... >>>>> >>>>> This has multiple benefits (and I see no real drawbacks): >>>>> >>>>> * the code can be really well modularised. Those "things" we share >>>>> (and this also connects to the __init__.py discussion) - can be >>>>> independent >>>>> - and (it follow Jens comment) it allows to keep low cyclomatic complexity >>>>> https://en.wikipedia.org/wiki/Cyclomatic_complexity . It will be way >>>>> easier to implement logging in the way that it does not import or use >>>>> config. This means for example that configuration for logging will need to >>>>> be injected when logging is initialized - and that's exactly what we want, >>>>> we do not want logging code to use configuration code directly - they >>>>> should be independent from each other and you should be free to vendor-in >>>>> either logging or config independently if you also vendored-in the other. >>>>> >>>>> * it's much more logical. We split based on functionality we want to >>>>> share - not about the "distributions" we want to produce. That allows us - >>>>> in the future - to make different decisions on how we split our >>>>> distributions. For example (i do not tell we have to do it, or that we >>>>> will >>>>> do it but we will have such a possibility) - we can add more shared >>>>> utilities we find useful in the same way and decide that "scheduler", >>>>> "api_server" or "scheduler" or "triggerer" or "dag processor" are split to >>>>> separate distributions - because for example we want to keep a number of >>>>> dependencies down. And for example "api_server" might use "fast_api", >>>>> "config", "logging", "orm" and "fast_api" and "serialization" , where the >>>>> scheduler should not need "fast_api". The "dag_processor" eventually might >>>>> not need "orm" nor "fast_api" and only use the other >>>>> >>>>> This seems like a natural approach. If we have a mechanism to "share" >>>>> the code, it does not add complexity, but allows us to isolate independent >>>>> functionality into "isolated" boxes and use them >>>>> >>>>> Also for cyclomatic complexity that is a complex word (badly chosen as >>>>> it scares people away) and has some math behind, but it really boils down >>>>> to very simple "rules of thumb" (and yes I am a big proponent of having >>>>> low >>>>> cyclomatic complexity). >>>>> >>>>> a) when you are building the "final" product (i.e. distribution you >>>>> want to release) - make sure that you only "use" things - that nothing >>>>> else >>>>> is "using" you as a library. >>>>> b) when you are building a "shared" thing (a library) - make sure that >>>>> library is only "used" by others but it does not really "use" anything >>>>> else. For example in the case I explained above - we can achieve >>>>> low-cyclomatic complexity when: >>>>> >>>>> * airflow-core uses: logging, config, orm, serialization, fast_api >>>>> * none of the "logging, config, orm, serialization, fast_api" use each >>>>> other - they are at the bottom of the "user -> used" tree >>>>> >>>>> J. >>>>> >>>>> >>>>> >>>>> On Tue, Jul 8, 2025 at 8:16 AM Amogh Desai <amoghdesai....@gmail.com> >>>>> wrote: >>>>> >>>>>> I like the folder structure proposed by Ash and have no objections >>>>>> with it. >>>>>> >>>>>> "core_and_task_sdk" sounds good to me and justifies what it should do >>>>>> pretty well. >>>>>> >>>>>> @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean >>>>>> there, I am understanding the direction >>>>>> but could you elaborate a bit more please? >>>>>> >>>>>> Naming is REALLY hard! >>>>>> >>>>>> Thanks & Regards, >>>>>> Amogh Desai >>>>>> >>>>>> >>>>>> On Tue, Jul 8, 2025 at 2:52 AM Jarek Potiuk <ja...@potiuk.com> wrote: >>>>>> >>>>>>> How about splitting it even more and having each shared "thing" >>>>>>> named? >>>>>>> "logging", "config" and sharing them explicitly and separately with >>>>>>> the >>>>>>> right "user" ? >>>>>>> That sounds way more modular and we will be able to choose which of >>>>>>> the >>>>>>> shared "utils" we use where. >>>>>>> >>>>>>> J. >>>>>>> >>>>>>> >>>>>>> On Mon, Jul 7, 2025 at 11:13 PM Jens Scheffler >>>>>>> <j_scheff...@gmx.de.invalid> >>>>>>> wrote: >>>>>>> >>>>>>> > I like "core_and_task_sdk the same like core-and-task-sdk - I have >>>>>>> no >>>>>>> > problem and it is a path only. >>>>>>> > >>>>>>> > if we get to "dag-parser-scheduler-task-sdk-and-triggerer" which >>>>>>> is a >>>>>>> > bit bulky we then should name it "all-not-api-server" :-D >>>>>>> > >>>>>>> > On 07.07.25 22:57, Ash Berlin-Taylor wrote: >>>>>>> > > In case I did a bad job explaining it, the “core and task sdk” >>>>>>> is not in >>>>>>> > the module name/import name, just in the file path. >>>>>>> > > >>>>>>> > > Anyone have other ideas? >>>>>>> > > >>>>>>> > >> On 7 Jul 2025, at 21:37, Buğra Öztürk <ozturkbugr...@gmail.com> >>>>>>> wrote: >>>>>>> > >> >>>>>>> > >> Thanks Ash! Looks cool! I like the structure. This will enable >>>>>>> all the >>>>>>> > >> combinations and structure looks easy to grasp. No strong >>>>>>> stance on the >>>>>>> > >> naming other than maybe it is a bit long with `and`, `core_ctl` >>>>>>> could be >>>>>>> > >> shorter, since no import path is defined like that, we can give >>>>>>> any name >>>>>>> > >> for sure. >>>>>>> > >> >>>>>>> > >> Best regards, >>>>>>> > >> >>>>>>> > >>> On Mon, 7 Jul 2025, 21:51 Jarek Potiuk, <ja...@potiuk.com> >>>>>>> wrote: >>>>>>> > >>> >>>>>>> > >>> Looks good but I think we should find some better logical name >>>>>>> for >>>>>>> > >>> core_and_sdk :) >>>>>>> > >>> >>>>>>> > >>> pon., 7 lip 2025, 21:44 użytkownik Jens Scheffler >>>>>>> > >>> <j_scheff...@gmx.de.invalid> napisał: >>>>>>> > >>> >>>>>>> > >>>> Cool! Especially the "shared" folder with the ability to have >>>>>>> > >>>> N-combinations w/o exploding project repo root! >>>>>>> > >>>> >>>>>>> > >>>> On 07.07.25 14:43, Ash Berlin-Taylor wrote: >>>>>>> > >>>>> Oh, and all of this will be explain in shared/README.md >>>>>>> > >>>>> >>>>>>> > >>>>>> On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org> >>>>>>> wrote: >>>>>>> > >>>>>> >>>>>>> > >>>>>> Okay, so it seems we have agreement on the approach here, >>>>>>> so I’ll >>>>>>> > >>>> continue with this, and on the dev call it was mentioned that >>>>>>> > >>>> “airflow-common” wasn’t a great name, so here is my proposal >>>>>>> for the >>>>>>> > file >>>>>>> > >>>> structure; >>>>>>> > >>>>>> ``` >>>>>>> > >>>>>> / >>>>>>> > >>>>>> task-sdk/... >>>>>>> > >>>>>> airflow-core/... >>>>>>> > >>>>>> shared/ >>>>>>> > >>>>>> kuberenetes/ >>>>>>> > >>>>>> pyproject.toml >>>>>>> > >>>>>> src/ >>>>>>> > >>>>>> airflow_kube/__init__.py >>>>>>> > >>>>>> core-and-tasksdk/ >>>>>>> > >>>>>> pyproject.toml >>>>>>> > >>>>>> src/ >>>>>>> > >>>>>> airflow_shared/__init__.py >>>>>>> > >>>>>> ``` >>>>>>> > >>>>>> >>>>>>> > >>>>>> Things to note here: the “shared” folder has (the >>>>>>> possibility) of >>>>>>> > >>>> having multiple different shared “libraries” in it, in this >>>>>>> example I >>>>>>> > am >>>>>>> > >>>> supposing a hypothetical shared kuberenetes folder a world in >>>>>>> which we >>>>>>> > >>>> split the KubePodOperator and the KubeExecutor in to two >>>>>>> separate >>>>>>> > >>>> distributions (example only, not proposing we do that right >>>>>>> now, and >>>>>>> > that >>>>>>> > >>>> will be a separate discussion) >>>>>>> > >>>>>> The other things to note here: >>>>>>> > >>>>>> >>>>>>> > >>>>>> >>>>>>> > >>>>>> - the folder name in shared aims to be “self-documenting”, >>>>>>> hence the >>>>>>> > >>>> verbose “core-and-tasksdk” to say where the shared library is >>>>>>> > intended to >>>>>>> > >>>> be used. >>>>>>> > >>>>>> - the python module itself should almost always have an >>>>>>> `airflow_` >>>>>>> > (or >>>>>>> > >>>> maybe `_airflow_`?) prefix so that it does not conflict with >>>>>>> anything >>>>>>> > >>> else >>>>>>> > >>>> we might use. It won’t matter “in production” as those will be >>>>>>> > vendored >>>>>>> > >>> in >>>>>>> > >>>> to be imported as `airflow/_vendor/airflow_shared` etc, but >>>>>>> avoiding >>>>>>> > >>>> conflicts at dev time with the Finder approach is a good >>>>>>> safety >>>>>>> > measure. >>>>>>> > >>>>>> I will start making a real PR for this proposal now, but >>>>>>> I’m open to >>>>>>> > >>>> feedback (either here, or in the PR when I open it) >>>>>>> > >>>>>> -ash >>>>>>> > >>>>>> >>>>>>> > >>>>>>> On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com> >>>>>>> wrote: >>>>>>> > >>>>>>> >>>>>>> > >>>>>>> Yeah we have to try it and test - also building packages >>>>>>> happens >>>>>>> > semi >>>>>>> > >>>>>>> frequently when you run `uv sync` (they use some kind of >>>>>>> heuristics >>>>>>> > >>> to >>>>>>> > >>>>>>> decide when) and you can force it with `--reinstall` or >>>>>>> > `--refresh`. >>>>>>> > >>>>>>> Package build also happens every time when you run >>>>>>> "ci-image build` >>>>>>> > >>>> now in >>>>>>> > >>>>>>> breeze so it seems like it will nicely integrate in our >>>>>>> workflows. >>>>>>> > >>>>>>> >>>>>>> > >>>>>>> Looks really cool Ash. >>>>>>> > >>>>>>> >>>>>>> > >>>>>>> On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor < >>>>>>> a...@apache.org> >>>>>>> > >>>> wrote: >>>>>>> > >>>>>>>> It’s not just release time, but any time we build a >>>>>>> package which >>>>>>> > >>>> happens >>>>>>> > >>>>>>>> on “every” CI run. The normal unit tests will use code >>>>>>> from >>>>>>> > >>>>>>>> airflow-common/src/airflow_common; the kube tests which >>>>>>> build an >>>>>>> > >>>> image will >>>>>>> > >>>>>>>> build the dists and vendor in the code from that commit. >>>>>>> > >>>>>>>> >>>>>>> > >>>>>>>> There is only a single copy of the shared code committed >>>>>>> to the >>>>>>> > >>> repo, >>>>>>> > >>>> so >>>>>>> > >>>>>>>> there is never anything to synchronise. >>>>>>> > >>>>>>>> >>>>>>> > >>>>>>>>> On 4 Jul 2025, at 15:53, Amogh Desai < >>>>>>> amoghdesai....@gmail.com> >>>>>>> > >>>> wrote: >>>>>>> > >>>>>>>>> Thanks Ash. >>>>>>> > >>>>>>>>> >>>>>>> > >>>>>>>>> This is really cool and helpful that you were able to >>>>>>> test both >>>>>>> > >>>> scenarios >>>>>>> > >>>>>>>>> -- repo checkout >>>>>>> > >>>>>>>>> and also installing from the vendored package and the >>>>>>> resolution >>>>>>> > >>>> worked >>>>>>> > >>>>>>>>> fine too. >>>>>>> > >>>>>>>>> >>>>>>> > >>>>>>>>> I like this idea compared the to relative import one for >>>>>>> few >>>>>>> > >>> reasons: >>>>>>> > >>>>>>>>> - It feels like it will take some time to adjust to the >>>>>>> new >>>>>>> > coding >>>>>>> > >>>>>>>> standard >>>>>>> > >>>>>>>>> that we will lay >>>>>>> > >>>>>>>>> if we impose relative imports in the shared dist >>>>>>> > >>>>>>>>> - We can continue using repo wise absolute import >>>>>>> standards, it >>>>>>> > is >>>>>>> > >>>> also >>>>>>> > >>>>>>>>> much easier for situations >>>>>>> > >>>>>>>>> when we do global search in IDE to find + replace, this >>>>>>> could >>>>>>> > mean >>>>>>> > >>> a >>>>>>> > >>>>>>>> change >>>>>>> > >>>>>>>>> there >>>>>>> > >>>>>>>>> - The vendoring work is a proven and established >>>>>>> paradigm across >>>>>>> > >>>> projects >>>>>>> > >>>>>>>>> and would >>>>>>> > >>>>>>>>> out of box give us the build tooling we need also >>>>>>> > >>>>>>>>> >>>>>>> > >>>>>>>>> Nothing too against the relative import but with the >>>>>>> evidence >>>>>>> > >>>> provided >>>>>>> > >>>>>>>>> above, vendored approach >>>>>>> > >>>>>>>>> seems to only do us good. >>>>>>> > >>>>>>>>> >>>>>>> > >>>>>>>>> Regarding synchronizing it, release time should be fine >>>>>>> as long >>>>>>> > as >>>>>>> > >>> we >>>>>>> > >>>>>>>> have >>>>>>> > >>>>>>>>> a good CI workflow to probably >>>>>>> > >>>>>>>>> catch such issues per PR if changes are made in shared >>>>>>> dist? >>>>>>> > >>>> (precommit >>>>>>> > >>>>>>>>> would make it really slow i guess) >>>>>>> > >>>>>>>>> >>>>>>> > >>>>>>>>> If we can run our tests with vendored code we should be >>>>>>> mostly >>>>>>> > >>>> covered. >>>>>>> > >>>>>>>>> Good effort all! >>>>>>> > >>>>>>>>> >>>>>>> > >>>>>>>>> Thanks & Regards, >>>>>>> > >>>>>>>>> Amogh Desai >>>>>>> > >>>>>>>>> >>>>>>> > >>>>>>>>> >>>>>>> > >>>>>>>>>> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor < >>>>>>> > a...@apache.org> >>>>>>> > >>>>>>>> wrote: >>>>>>> > >>>>>>>>>> Okay, I think I’ve got something that works and I’m >>>>>>> happy with. >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> >>>>>>> > >>> >>>>>>> > >>>>>>> https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core >>>>>>> > >>>>>>>>>> This produces the following from `uv build task-sdk` >>>>>>> > >>>>>>>>>> - >>>>>>> > >>>>>>>>>> >>>>>>> > >>> >>>>>>> > >>>>>>> https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz >>>>>>> > >>>>>>>>>> - >>>>>>> > >>>>>>>>>> >>>>>>> > >>> >>>>>>> > >>>>>>> https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip >>>>>>> > >>>>>>>>>> (`.whl.zip` as GH won't allow .whl upload, but will >>>>>>> .zip) >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> ``` >>>>>>> > >>>>>>>>>> ❯ unzip -l >>>>>>> > >>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip | >>>>>>> > >>>>>>>> grep >>>>>>> > >>>>>>>>>> _vendor >>>>>>> > >>>>>>>>>> 50 02-02-2020 00:00 >>>>>>> airflow/sdk/_vendor/.gitignore >>>>>>> > >>>>>>>>>> 2082 02-02-2020 00:00 >>>>>>> airflow/sdk/_vendor/__init__.py >>>>>>> > >>>>>>>>>> 28 02-02-2020 00:00 >>>>>>> > airflow/sdk/_vendor/airflow_common.pyi >>>>>>> > >>>>>>>>>> 18 02-02-2020 00:00 >>>>>>> airflow/sdk/_vendor/vendor.txt >>>>>>> > >>>>>>>>>> 785 02-02-2020 00:00 >>>>>>> > >>>>>>>>>> airflow/sdk/_vendor/airflow_common/__init__.py >>>>>>> > >>>>>>>>>> 10628 02-02-2020 00:00 >>>>>>> > >>>>>>>>>> airflow/sdk/_vendor/airflow_common/timezone.py >>>>>>> > >>>>>>>>>> ``` >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> And similarly in the .tar.gz, so our “sdist” is >>>>>>> complete too: >>>>>>> > >>>>>>>>>> ``` >>>>>>> > >>>>>>>>>> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz >>>>>>> |grep >>>>>>> > _vendor >>>>>>> > >>>>>>>>>> >>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore >>>>>>> > >>>>>>>>>> >>>>>>> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py >>>>>>> > >>>>>>>>>> >>>>>>> > >>>> >>>>>>> > >>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi >>>>>>> > >>>>>>>>>> >>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> >>>>>>> > >>> >>>>>>> > >>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py >>>>>>> > >>> >>>>>>> > >>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py >>>>>>> > >>>>>>>>>> ``` >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> The plugin works at build time by including/copying the >>>>>>> libs >>>>>>> > >>>> specified >>>>>>> > >>>>>>>> in >>>>>>> > >>>>>>>>>> vendor.txt into place (and let `vendoring` take care of >>>>>>> import >>>>>>> > >>>>>>>> rewrites.) >>>>>>> > >>>>>>>>>> For the imports to continue to work at “dev” time/from >>>>>>> a repo >>>>>>> > >>>> checkout, >>>>>>> > >>>>>>>> I >>>>>>> > >>>>>>>>>> have added a import finder to `sys.meta_path`, and >>>>>>> since it’s at >>>>>>> > >>> the >>>>>>> > >>>>>>>> end of >>>>>>> > >>>>>>>>>> the list it will only be used if the normal import >>>>>>> can’t find >>>>>>> > >>>> things. >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> >>>>>>> > >>> >>>>>>> > >>>>>>> https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py >>>>>>> > >>>>>>>>>> This doesn’t quite give us the same runtime effect >>>>>>> “import >>>>>>> > >>>> rewriting” >>>>>>> > >>>>>>>>>> affect, as in this approach `airflow_common` is >>>>>>> directly loaded >>>>>>> > >>>> (i.e. >>>>>>> > >>>>>>>>>> airflow.sdk._vendor.airflow_common and airflow_common >>>>>>> exist in >>>>>>> > >>>>>>>>>> sys.modules), but it does work for everything that I >>>>>>> was able to >>>>>>> > >>>> test.. >>>>>>> > >>>>>>>>>> I tested it with the diff at the end of this message. >>>>>>> My test >>>>>>> > >>>> ipython >>>>>>> > >>>>>>>>>> shell: >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> ``` >>>>>>> > >>>>>>>>>> In [1]: from >>>>>>> airflow.sdk._vendor.airflow_common.timezone import >>>>>>> > >>> foo >>>>>>> > >>>>>>>>>> In [2]: foo >>>>>>> > >>>>>>>>>> Out[2]: 1 >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> In [3]: import airflow.sdk._vendor.airflow_common >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> In [4]: import >>>>>>> airflow.sdk._vendor.airflow_common.timezone >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> In [5]: airflow.sdk._vendor.airflow_common.__file__ >>>>>>> > >>>>>>>>>> Out[5]: >>>>>>> > >>>>>>>>>> >>>>>>> > >>> >>>>>>> > >>>>>>> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py' >>>>>>> > >>>>>>>>>> In [6]: >>>>>>> airflow.sdk._vendor.airflow_common.timezone.__file__ >>>>>>> > >>>>>>>>>> Out[6]: >>>>>>> > >>>>>>>>>> >>>>>>> > >>> >>>>>>> > >>>>>>> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py' >>>>>>> > >>>>>>>>>> ``` >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> And in an standalone environment with the SDK dist I >>>>>>> built (it >>>>>>> > >>>> needed >>>>>>> > >>>>>>>> the >>>>>>> > >>>>>>>>>> matching airflow-core right now, but that is nothing to >>>>>>> do with >>>>>>> > >>> this >>>>>>> > >>>>>>>>>> discussion): >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> ``` >>>>>>> > >>>>>>>>>> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with >>>>>>> > >>>>>>>>>> dist/apache_airflow_core-3.1.0-py3-none-any.whl --with >>>>>>> > >>>>>>>>>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl >>>>>>> ipython >>>>>>> > >>>>>>>>>> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang >>>>>>> 18.1.8 ] >>>>>>> > >>>>>>>>>> Type 'copyright', 'credits' or 'license' for more >>>>>>> information >>>>>>> > >>>>>>>>>> IPython 9.4.0 -- An enhanced Interactive Python. Type >>>>>>> '?' for >>>>>>> > >>> help. >>>>>>> > >>>>>>>>>> Tip: You can use `%hist` to view history, see the >>>>>>> options with >>>>>>> > >>>>>>>> `%history?` >>>>>>> > >>>>>>>>>> In [1]: import >>>>>>> airflow.sdk._vendor.airflow_common.timezone >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> In [2]: >>>>>>> airflow.sdk._vendor.airflow_common.timezone.__file__ >>>>>>> > >>>>>>>>>> Out[2]: >>>>>>> > >>>>>>>>>> >>>>>>> > >>> >>>>>>> > >>>>>>> '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py’ >>>>>>> > >>>>>>>>>> `` >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> ```diff >>>>>>> > >>>>>>>>>> diff --git >>>>>>> a/airflow-common/src/airflow_common/__init__.py >>>>>>> > >>>>>>>>>> b/airflow-common/src/airflow_common/__init__.py >>>>>>> > >>>>>>>>>> index 13a83393a9..927b7c6b61 100644 >>>>>>> > >>>>>>>>>> --- a/airflow-common/src/airflow_common/__init__.py >>>>>>> > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/__init__.py >>>>>>> > >>>>>>>>>> @@ -14,3 +14,5 @@ >>>>>>> > >>>>>>>>>> # KIND, either express or implied. See the License for >>>>>>> the >>>>>>> > >>>>>>>>>> # specific language governing permissions and >>>>>>> limitations >>>>>>> > >>>>>>>>>> # under the License. >>>>>>> > >>>>>>>>>> + >>>>>>> > >>>>>>>>>> +foo = 1 >>>>>>> > >>>>>>>>>> diff --git >>>>>>> a/airflow-common/src/airflow_common/timezone.py >>>>>>> > >>>>>>>>>> b/airflow-common/src/airflow_common/timezone.py >>>>>>> > >>>>>>>>>> index 340b924c66..58384ef20f 100644 >>>>>>> > >>>>>>>>>> --- a/airflow-common/src/airflow_common/timezone.py >>>>>>> > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/timezone.py >>>>>>> > >>>>>>>>>> @@ -36,6 +36,9 @@ _PENDULUM3 = >>>>>>> > >>>>>>>>>> version.parse(metadata.version("pendulum")).major == 3 >>>>>>> > >>>>>>>>>> # - FixedTimezone(0, "UTC") in pendulum 2 >>>>>>> > >>>>>>>>>> utc = pendulum.UTC >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> + >>>>>>> > >>>>>>>>>> +from airflow_common import foo >>>>>>> > >>>>>>>>>> + >>>>>>> > >>>>>>>>>> TIMEZONE: Timezone >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>> ``` >>>>>>> > >>>>>>>>>> >>>>>>> > >>>>>>>>>>>> On 3 Jul 2025, at 12:43, Jarek Potiuk < >>>>>>> ja...@potiuk.com> >>>>>>> > wrote: >>>>>>> > >>>>>>>>>>> I think both approaches are doable: >>>>>>> > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> 1) -> We can very easily prevent bad imports by >>>>>>> pre-commit when >>>>>>> > >>>>>>>> importing >>>>>>> > >>>>>>>>>>> from different distributions and make sure we are only >>>>>>> doing >>>>>>> > >>>> relative >>>>>>> > >>>>>>>>>>> imports in the shared modules. We are doing plenty of >>>>>>> this >>>>>>> > >>>> already. And >>>>>>> > >>>>>>>>>> yes >>>>>>> > >>>>>>>>>>> it would require relative links we currently do not >>>>>>> allow. >>>>>>> > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> 2) -> has one disadvantage that someone at some point >>>>>>> in time >>>>>>> > >>> will >>>>>>> > >>>> have >>>>>>> > >>>>>>>>>> to >>>>>>> > >>>>>>>>>>> decide to synchronize this and if it happens just >>>>>>> before >>>>>>> > release >>>>>>> > >>>> (I bet >>>>>>> > >>>>>>>>>>> this is going to happen) this will lead to solving >>>>>>> problems >>>>>>> > that >>>>>>> > >>>> would >>>>>>> > >>>>>>>>>>> normally be solved during PR when you make a change >>>>>>> (i.e. >>>>>>> > >>> symbolic >>>>>>> > >>>> link >>>>>>> > >>>>>>>>>> has >>>>>>> > >>>>>>>>>>> the advantage that whoever modifies shared code will be >>>>>>> > >>> immediately >>>>>>> > >>>>>>>>>>> notified in their PR - that they broke something >>>>>>> because either >>>>>>> > >>>> static >>>>>>> > >>>>>>>>>>> checks or mypy or tests fail. >>>>>>> > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> Ash, do you have an idea of a process (who and when) >>>>>>> does the >>>>>>> > >>>>>>>>>>> synchronisation in case of vendoring? Maybe we could >>>>>>> solve it >>>>>>> > if >>>>>>> > >>>> it is >>>>>>> > >>>>>>>>>> done >>>>>>> > >>>>>>>>>>> more frequently and with some regularity? We could >>>>>>> potentially >>>>>>> > >>>> force >>>>>>> > >>>>>>>>>>> re-vendoring at PR time as well any time shared code >>>>>>> changes >>>>>>> > (and >>>>>>> > >>>>>>>> prevent >>>>>>> > >>>>>>>>>>> it by pre-commit. And I can't think of some place >>>>>>> (other than >>>>>>> > >>>> releases) >>>>>>> > >>>>>>>>>> in >>>>>>> > >>>>>>>>>>> our development workflow and that seems to be a bit >>>>>>> too late as >>>>>>> > >>>> puts an >>>>>>> > >>>>>>>>>>> extra effort on fixing potential incompatibilities >>>>>>> introduced >>>>>>> > on >>>>>>> > >>>>>>>> release >>>>>>> > >>>>>>>>>>> manager and delays the release. WDYT? >>>>>>> > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> Re: relative links. I think for a shared library we >>>>>>> could >>>>>>> > >>>> potentially >>>>>>> > >>>>>>>>>> relax >>>>>>> > >>>>>>>>>>> this and allow them (and actually disallow absolute >>>>>>> links in >>>>>>> > the >>>>>>> > >>>> pieces >>>>>>> > >>>>>>>>>> of >>>>>>> > >>>>>>>>>>> code that are shared - again, by pre-commit). As I >>>>>>> recall, the >>>>>>> > >>> only >>>>>>> > >>>>>>>>>> reason >>>>>>> > >>>>>>>>>>> we forbade the relative links is because of how we are >>>>>>> (or >>>>>>> > maybe >>>>>>> > >>>> were) >>>>>>> > >>>>>>>>>>> doing DAG parsing and failures resulting from it. So >>>>>>> we decided >>>>>>> > >>> to >>>>>>> > >>>> just >>>>>>> > >>>>>>>>>> not >>>>>>> > >>>>>>>>>>> allow it to keep consistency. The way how Dag parsing >>>>>>> works is >>>>>>> > >>> that >>>>>>> > >>>>>>>> when >>>>>>> > >>>>>>>>>>> you are using importlib to read the Dag from a file, >>>>>>> the >>>>>>> > relative >>>>>>> > >>>>>>>> imports >>>>>>> > >>>>>>>>>>> do not work as it does not know what they should be >>>>>>> relative >>>>>>> > to. >>>>>>> > >>>> But if >>>>>>> > >>>>>>>>>>> relative import is done from an imported package, it >>>>>>> should be >>>>>>> > no >>>>>>> > >>>>>>>>>> problem, >>>>>>> > >>>>>>>>>>> I think - otherwise our Dags would not be able to >>>>>>> import any >>>>>>> > >>>> library >>>>>>> > >>>>>>>> that >>>>>>> > >>>>>>>>>>> uses relative imports. >>>>>>> > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> Of course consistency might be the reason why we do >>>>>>> not want to >>>>>>> > >>>>>>>> introduce >>>>>>> > >>>>>>>>>>> relative imports. I don't see it as an issue if it is >>>>>>> guarded >>>>>>> > by >>>>>>> > >>>>>>>>>> pre-commit >>>>>>> > >>>>>>>>>>> though. >>>>>>> > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> J. >>>>>>> > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> J. >>>>>>> > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> czw., 3 lip 2025, 12:11 użytkownik Ash Berlin-Taylor < >>>>>>> > >>>> a...@apache.org> >>>>>>> > >>>>>>>>>>> napisał: >>>>>>> > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>>> Oh yes, symlinks will work, with one big caveat: It >>>>>>> does mean >>>>>>> > >>> you >>>>>>> > >>>>>>>> can’t >>>>>>> > >>>>>>>>>>>> use absolute imports in one common module to another. >>>>>>> > >>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>> For example >>>>>>> > >>>>>>>>>>>> >>>>>>> > >>> >>>>>>> > >>>>>>> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41 >>>>>>> > >>>>>>>>>>>> where we have >>>>>>> > >>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>> ``` >>>>>>> > >>>>>>>>>>>> from airflow.utils.module_loading import import_string >>>>>>> > >>>>>>>>>>>> ``` >>>>>>> > >>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>> if we want to move serve_logs into this common lib >>>>>>> that is >>>>>>> > then >>>>>>> > >>>>>>>>>> symlinked >>>>>>> > >>>>>>>>>>>> then we wouldn’t be able to have `from >>>>>>> > >>>> airflow_common.module_loading >>>>>>> > >>>>>>>>>> import >>>>>>> > >>>>>>>>>>>> import_string`. >>>>>>> > >>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>> I can think of two possible solutions here. >>>>>>> > >>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>> 1) is to allow/require relative imports in this >>>>>>> shared lib, so >>>>>>> > >>>> `from >>>>>>> > >>>>>>>>>>>> .module_loading import import_string` >>>>>>> > >>>>>>>>>>>> 2) is to use `vendoring`[1] (from the pip >>>>>>> maintainers) which >>>>>>> > >>> will >>>>>>> > >>>>>>>> handle >>>>>>> > >>>>>>>>>>>> import-rewriting for us. >>>>>>> > >>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>> I’d entirely forgot that symlinks in repos was a >>>>>>> thing, so I >>>>>>> > >>>> prepared >>>>>>> > >>>>>>>> a >>>>>>> > >>>>>>>>>>>> minimal POC/demo of what vendoring approach could >>>>>>> look like >>>>>>> > here >>>>>>> > >>>>>>>>>>>> >>>>>>> > >>> >>>>>>> > >>>>>>> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3 >>>>>>> > >>>>>>>>>>>> Now personally I am more than happy with relative >>>>>>> imports, but >>>>>>> > >>>>>>>> generally >>>>>>> > >>>>>>>>>>>> as a project we have avoided them, so I think that >>>>>>> limits what >>>>>>> > >>> we >>>>>>> > >>>>>>>> could >>>>>>> > >>>>>>>>>> do >>>>>>> > >>>>>>>>>>>> with a symlink based approach. >>>>>>> > >>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>> -ash >>>>>>> > >>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>> [1] https://github.com/pradyunsg/vendoring >>>>>>> > >>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu < >>>>>>> > >>>>>>>> gopidesupa...@gmail.com> >>>>>>> > >>>>>>>>>>>> wrote: >>>>>>> > >>>>>>>>>>>>> Thanks Ash >>>>>>> > >>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>> Yes agree option 2 would be preferred for me. Making >>>>>>> sure we >>>>>>> > >>>> have all >>>>>>> > >>>>>>>>>> the >>>>>>> > >>>>>>>>>>>>> gaurdriles to protect any unwanted behaviour in code >>>>>>> sharing >>>>>>> > >>> and >>>>>>> > >>>>>>>>>>>> executing >>>>>>> > >>>>>>>>>>>>> right of tests between the packages. >>>>>>> > >>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>> Agree with others, option 2 would be >>>>>>> > >>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai < >>>>>>> > >>>>>>>> amoghdesai....@gmail.com> >>>>>>> > >>>>>>>>>>>>> wrote: >>>>>>> > >>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>> Thanks for starting this discussion, Ash. >>>>>>> > >>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>> I would prefer option 2 here with proper tooling to >>>>>>> handle >>>>>>> > the >>>>>>> > >>>> code >>>>>>> > >>>>>>>>>>>>>> duplication at *release* time. >>>>>>> > >>>>>>>>>>>>>> It is best to have a dist that has all it needs in >>>>>>> itself. >>>>>>> > >>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>> Option 1 could very quickly get out of hand and if >>>>>>> we decide >>>>>>> > >>> to >>>>>>> > >>>>>>>>>> separate >>>>>>> > >>>>>>>>>>>>>> triggerer / >>>>>>> > >>>>>>>>>>>>>> dag processor / config etc etc as separate >>>>>>> packages, back >>>>>>> > >>>> compat is >>>>>>> > >>>>>>>>>>>> going >>>>>>> > >>>>>>>>>>>>>> to be a nightmare >>>>>>> > >>>>>>>>>>>>>> and will bite us harder than we anticipate. >>>>>>> > >>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>> Thanks & Regards, >>>>>>> > >>>>>>>>>>>>>> Amogh Desai >>>>>>> > >>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik < >>>>>>> > >>> kaxiln...@gmail.com> >>>>>>> > >>>>>>>>>> wrote: >>>>>>> > >>>>>>>>>>>>>>> I prefer Option 2 as well to avoid matrix of >>>>>>> dependencies >>>>>>> > >>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler >>>>>>> > >>>>>>>>>> <j_scheff...@gmx.de.invalid >>>>>>> > >>>>>>>>>>>>>>> wrote: >>>>>>> > >>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>> I'd also rather prefer option 2 - reason here is >>>>>>> it is >>>>>>> > >>> rather >>>>>>> > >>>>>>>>>>>> pragmatic >>>>>>> > >>>>>>>>>>>>>>>> and we no not need to cut another package and >>>>>>> have less >>>>>>> > >>>> package >>>>>>> > >>>>>>>>>> counts >>>>>>> > >>>>>>>>>>>>>>>> and dependencies. >>>>>>> > >>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>> I remember some time ago I was checking (together >>>>>>> with >>>>>>> > >>> Jarek, >>>>>>> > >>>> I am >>>>>>> > >>>>>>>>>> not >>>>>>> > >>>>>>>>>>>>>>>> sure anymore...) if the usage of symlinks would be >>>>>>> > possible. >>>>>>> > >>>> To >>>>>>> > >>>>>>>> keep >>>>>>> > >>>>>>>>>>>>>> the >>>>>>> > >>>>>>>>>>>>>>>> source in one package but "symlink" it into >>>>>>> another. If >>>>>>> > then >>>>>>> > >>>> at >>>>>>> > >>>>>>>>>> point >>>>>>> > >>>>>>>>>>>>>> of >>>>>>> > >>>>>>>>>>>>>>>> packaging/release the files are materialized we >>>>>>> have 1 set >>>>>>> > >>> of >>>>>>> > >>>>>>>> code. >>>>>>> > >>>>>>>>>>>>>>>> Otherwise if not possible still the redundancy >>>>>>> could be >>>>>>> > >>>> solved by >>>>>>> > >>>>>>>> a >>>>>>> > >>>>>>>>>>>>>>>> pre-commit hook - and in Git the files are >>>>>>> de-duplicated >>>>>>> > >>>> anyway >>>>>>> > >>>>>>>>>> based >>>>>>> > >>>>>>>>>>>>>> on >>>>>>> > >>>>>>>>>>>>>>>> content hash, so this does not hurt. >>>>>>> > >>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote: >>>>>>> > >>>>>>>>>>>>>>>>> I support option 2 with proper automation & CI - >>>>>>> the >>>>>>> > >>>> reasonings >>>>>>> > >>>>>>>>>>>>>> you've >>>>>>> > >>>>>>>>>>>>>>>>> shown for that make sense to me. >>>>>>> > >>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>> Shahar >>>>>>> > >>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor >>>>>>> < >>>>>>> > >>>> a...@apache.org >>>>>>> > >>>>>>>>>>>>>>> wrote: >>>>>>> > >>>>>>>>>>>>>>>>>> Hello everyone, >>>>>>> > >>>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>>> As we work on finishing off the code-level >>>>>>> separation of >>>>>>> > >>>> Task >>>>>>> > >>>>>>>> SDK >>>>>>> > >>>>>>>>>>>>>> and >>>>>>> > >>>>>>>>>>>>>>>> Core >>>>>>> > >>>>>>>>>>>>>>>>>> (scheduler etc) we have come across some >>>>>>> situations >>>>>>> > where >>>>>>> > >>> we >>>>>>> > >>>>>>>> would >>>>>>> > >>>>>>>>>>>>>>> like >>>>>>> > >>>>>>>>>>>>>>>> to >>>>>>> > >>>>>>>>>>>>>>>>>> share code between these. >>>>>>> > >>>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>>> However it’s not as straight forward of “just >>>>>>> put it in >>>>>>> > a >>>>>>> > >>>> common >>>>>>> > >>>>>>>>>>>>>> dist >>>>>>> > >>>>>>>>>>>>>>>> they >>>>>>> > >>>>>>>>>>>>>>>>>> both depend upon” because one of the goals of >>>>>>> the Task >>>>>>> > SDK >>>>>>> > >>>>>>>>>>>>>> separation >>>>>>> > >>>>>>>>>>>>>>>> was >>>>>>> > >>>>>>>>>>>>>>>>>> to have 100% complete version independence >>>>>>> between the >>>>>>> > >>> two, >>>>>>> > >>>>>>>>>> ideally >>>>>>> > >>>>>>>>>>>>>>>> even if >>>>>>> > >>>>>>>>>>>>>>>>>> they are built into the same image and venv. >>>>>>> Most of the >>>>>>> > >>>> reason >>>>>>> > >>>>>>>>>> why >>>>>>> > >>>>>>>>>>>>>>> this >>>>>>> > >>>>>>>>>>>>>>>>>> isn’t straight forward comes down to backwards >>>>>>> > >>>> compatibility - >>>>>>> > >>>>>>>> if >>>>>>> > >>>>>>>>>> we >>>>>>> > >>>>>>>>>>>>>>>> make >>>>>>> > >>>>>>>>>>>>>>>>>> an change to the common/shared distribution >>>>>>> > >>>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>>> We’ve listed the options we have thought about >>>>>>> in >>>>>>> > >>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/51545 >>>>>>> (but >>>>>>> > that >>>>>>> > >>>> covers >>>>>>> > >>>>>>>>>>>>>> some >>>>>>> > >>>>>>>>>>>>>>>> more >>>>>>> > >>>>>>>>>>>>>>>>>> things that I don’t want to get in to in this >>>>>>> discussion >>>>>>> > >>>> such as >>>>>>> > >>>>>>>>>>>>>>>> possibly >>>>>>> > >>>>>>>>>>>>>>>>>> separating operators and executors out of a >>>>>>> single >>>>>>> > >>> provider >>>>>>> > >>>>>>>> dist.) >>>>>>> > >>>>>>>>>>>>>>>>>> To give a concrete example of some code I would >>>>>>> like to >>>>>>> > >>>> share >>>>>>> > >>> >>>>>>> > >>>>>>> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py >>>>>>> > >>>>>>>>>>>>>>>>>> — logging config. Another thing we will want to >>>>>>> share >>>>>>> > will >>>>>>> > >>>> be >>>>>>> > >>>>>>>> the >>>>>>> > >>>>>>>>>>>>>>>>>> AirflowConfigParser class from >>>>>>> airflow.configuration >>>>>>> > (but >>>>>>> > >>>>>>>> notably: >>>>>>> > >>>>>>>>>>>>>>> only >>>>>>> > >>>>>>>>>>>>>>>> the >>>>>>> > >>>>>>>>>>>>>>>>>> parser class, _not_ the default config values, >>>>>>> again, >>>>>>> > lets >>>>>>> > >>>> not >>>>>>> > >>>>>>>>>> dwell >>>>>>> > >>>>>>>>>>>>>>> on >>>>>>> > >>>>>>>>>>>>>>>> the >>>>>>> > >>>>>>>>>>>>>>>>>> specifics of that) >>>>>>> > >>>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>>> So to bring the options listed in the issue >>>>>>> here for >>>>>>> > >>>> discussion, >>>>>>> > >>>>>>>>>>>>>>> broadly >>>>>>> > >>>>>>>>>>>>>>>>>> speaking there are two high-level approaches: >>>>>>> > >>>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>>> 1. A single shared distribution >>>>>>> > >>>>>>>>>>>>>>>>>> 2. No shared package and copy/duplicate code >>>>>>> > >>>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>>> The advantage of Approach 1 is that we only >>>>>>> have the >>>>>>> > code >>>>>>> > >>>> in one >>>>>>> > >>>>>>>>>>>>>>> place. >>>>>>> > >>>>>>>>>>>>>>>>>> However for me, at least in this specific case >>>>>>> of >>>>>>> > Logging >>>>>>> > >>>> config >>>>>>> > >>>>>>>>>> or >>>>>>> > >>>>>>>>>>>>>>>>>> AirflowConfigParser class is that backwards >>>>>>> > compatibility >>>>>>> > >>> is >>>>>>> > >>>>>>>> much >>>>>>> > >>>>>>>>>>>>>> much >>>>>>> > >>>>>>>>>>>>>>>>>> harder. >>>>>>> > >>>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>>> The main advantage of Approach 2 is the the >>>>>>> code is >>>>>>> > >>> released >>>>>>> > >>>>>>>>>>>>>>>> with/embedded >>>>>>> > >>>>>>>>>>>>>>>>>> in the dist (i.e. apache-airflow-task-sdk would >>>>>>> contain >>>>>>> > >>> the >>>>>>> > >>>>>>>> right >>>>>>> > >>>>>>>>>>>>>>>> version >>>>>>> > >>>>>>>>>>>>>>>>>> of the logging config and ConfigParser etc). The >>>>>>> > downside >>>>>>> > >>> is >>>>>>> > >>>>>>>> that >>>>>>> > >>>>>>>>>>>>>>> either >>>>>>> > >>>>>>>>>>>>>>>>>> the code will need to be duplicated in the >>>>>>> repo, or >>>>>>> > better >>>>>>> > >>>> yet >>>>>>> > >>>>>>>> it >>>>>>> > >>>>>>>>>>>>>>> would >>>>>>> > >>>>>>>>>>>>>>>>>> live in a single place in the repo, but some >>>>>>> tooling >>>>>>> > (TBD) >>>>>>> > >>>> will >>>>>>> > >>>>>>>>>>>>>>>>>> automatically handle the duplication, either at >>>>>>> commit >>>>>>> > >>>> time, or >>>>>>> > >>>>>>>> my >>>>>>> > >>>>>>>>>>>>>>>>>> preference, at release time. >>>>>>> > >>>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>>> For this kind of shared “utility” code I am very >>>>>>> > strongly >>>>>>> > >>>>>>>> leaning >>>>>>> > >>>>>>>>>>>>>>>> towards >>>>>>> > >>>>>>>>>>>>>>>>>> option 2 with automation, as otherwise I think >>>>>>> the >>>>>>> > >>> backwards >>>>>>> > >>>>>>>>>>>>>>>> compatibility >>>>>>> > >>>>>>>>>>>>>>>>>> requirements would make it unworkable (very >>>>>>> quickly over >>>>>>> > >>>> time >>>>>>> > >>>>>>>> the >>>>>>> > >>>>>>>>>>>>>>>>>> combinations we would have to test would just be >>>>>>> > >>>> unreasonable) >>>>>>> > >>>>>>>>>> and I >>>>>>> > >>>>>>>>>>>>>>>> don’t >>>>>>> > >>>>>>>>>>>>>>>>>> feel confident we can have things as stable as >>>>>>> we need >>>>>>> > to >>>>>>> > >>>> really >>>>>>> > >>>>>>>>>>>>>>> deliver >>>>>>> > >>>>>>>>>>>>>>>>>> the version separation/independency I want to >>>>>>> delivery >>>>>>> > >>> with >>>>>>> > >>>>>>>>>> AIP-72. >>>>>>> > >>>>>>>>>>>>>>>>>> So unless someone feels very strongly about >>>>>>> this, I will >>>>>>> > >>>> come up >>>>>>> > >>>>>>>>>>>>>> with >>>>>>> > >>>>>>>>>>>>>>> a >>>>>>> > >>>>>>>>>>>>>>>>>> draft PR for further discussion that will >>>>>>> implement code >>>>>>> > >>>> sharing >>>>>>> > >>>>>>>>>> via >>>>>>> > >>>>>>>>>>>>>>>>>> “vendoring” it at build time. I have an idea of >>>>>>> how I >>>>>>> > can >>>>>>> > >>>>>>>> achieve >>>>>>> > >>>>>>>>>>>>>> this >>>>>>> > >>>>>>>>>>>>>>>> so >>>>>>> > >>>>>>>>>>>>>>>>>> we have a single version in the repo and it’ll >>>>>>> work >>>>>>> > there, >>>>>>> > >>>> but >>>>>>> > >>>>>>>> at >>>>>>> > >>>>>>>>>>>>>>>> runtime >>>>>>> > >>>>>>>>>>>>>>>>>> we vendor it in to the shipped dist so it lives >>>>>>> at >>>>>>> > >>> something >>>>>>> > >>>>>>>> like >>>>>>> > >>>>>>>>>>>>>>>>>> `airflow.sdk._vendor` etc. >>>>>>> > >>>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>>> In terms of repo layout, this likely means we >>>>>>> would end >>>>>>> > up >>>>>>> > >>>> with: >>>>>>> > >>>>>>>>>>>>>>>>>> airflow-core/pyproject.toml >>>>>>> > >>>>>>>>>>>>>>>>>> airflow-core/src/ >>>>>>> > >>>>>>>>>>>>>>>>>> airflow-core/tests/ >>>>>>> > >>>>>>>>>>>>>>>>>> task-sdk/pyproject.toml >>>>>>> > >>>>>>>>>>>>>>>>>> task-sdk/src/ >>>>>>> > >>>>>>>>>>>>>>>>>> task-sdk/tests/ >>>>>>> > >>>>>>>>>>>>>>>>>> airflow-common/src >>>>>>> > >>>>>>>>>>>>>>>>>> airflow-common/tests/ >>>>>>> > >>>>>>>>>>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as >>>>>>> deps >>>>>>> > would >>>>>>> > >>>> be >>>>>>> > >>>>>>>>>>>>>> included >>>>>>> > >>>>>>>>>>>>>>>> in >>>>>>> > >>>>>>>>>>>>>>>>>> the downstream projects. TBD. >>>>>>> > >>>>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>>>>>>>>>>>> Thoughts and feedback welcomed. >>>>>>> > >>>> >>>>>>> --------------------------------------------------------------------- >>>>>>> > >>>>>>>>>>>>>>>> To unsubscribe, e-mail: >>>>>>> > dev-unsubscr...@airflow.apache.org >>>>>>> > >>>>>>>>>>>>>>>> For additional commands, e-mail: >>>>>>> > >>> dev-h...@airflow.apache.org >>>>>>> > >>>>>>>>>>>>>>>> >>>>>>> > >>>>>>>> >>>>>>> > >>> >>>>>>> --------------------------------------------------------------------- >>>>>>> > >>>>>>>> To unsubscribe, e-mail: >>>>>>> dev-unsubscr...@airflow.apache.org >>>>>>> > >>>>>>>> For additional commands, e-mail: >>>>>>> dev-h...@airflow.apache.org >>>>>>> > >>>>>>>> >>>>>>> > >>>>>>>> >>>>>>> > >>>>>> >>>>>>> > >>>>>>> --------------------------------------------------------------------- >>>>>>> > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>>>> > >>>>>> For additional commands, e-mail: >>>>>>> dev-h...@airflow.apache.org >>>>>>> > >>>>>> >>>>>>> > >>>>> >>>>>>> --------------------------------------------------------------------- >>>>>>> > >>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>>>> > >>>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>>>>>> > >>>>> >>>>>>> > >>>> >>>>>>> --------------------------------------------------------------------- >>>>>>> > >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>>>> > >>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>>>>>> > >>>> >>>>>>> > >>>> >>>>>>> > > >>>>>>> > > >>>>>>> --------------------------------------------------------------------- >>>>>>> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>>>> > > For additional commands, e-mail: dev-h...@airflow.apache.org >>>>>>> > > >>>>>>> > >>>>>>> > >>>>>>> --------------------------------------------------------------------- >>>>>>> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>>>> > For additional commands, e-mail: dev-h...@airflow.apache.org >>>>>>> > >>>>>>> > >>>>>>> >>>>>>