Not quite final PR, but good enough that I want to see how it behaves on CI, and other IDEs etc https://github.com/apache/airflow/pull/53149 (We updated the `setup_idea.py`, so either re-run that or add the new src root manually)
Let the naming discussions start! -ash > On 10 Jul 2025, at 12:51, Jarek Potiuk <ja...@potiuk.com> wrote: > > Yeah. when we get the final PR I will also want to to test more scenarios - > with IDE/mypy integration switching branches, uv syncing etc. and will be > happy to help and document the contributor's doc to explain what and how to > work with it. This would be a super cool thing if we get it to > work seamlessly for everyone :) > > On Thu, Jul 10, 2025 at 7:44 AM Amogh Desai <amoghdesai....@gmail.com> > wrote: > >> Agreed! >> >> Once the PR is up, we can have these implementation level discussions >> over there. Good chat however! >> >> Thanks & Regards, >> Amogh Desai >> >> >> On Wed, Jul 9, 2025 at 3:56 PM Jarek Potiuk <ja...@potiuk.com> wrote: >> >>> Yeah. I think extracting one-by-one, feature-by-feature that we want to >>> share to a separate distribution is the best approach - it will actually >>> also help with the "__init__.py" cleanup - because almost by definition - >>> those distributions will not be able to "reach" outside - i.e. they only >>> can be "used" not "use" something else. Which means that - for example as >>> it is now - configuration using logging which is using configuration >>> (leading to circular dependencies and partially initialized modules) will >>> not happen and we will have to figure out rather how to inject >>> configuration into logging (from task-sdk, airflow-ctl, airflow-core) and >>> to get the right sequence of initialization - rather than have the >>> inter-feature-dependencies. >>> >>> And that is precisely what low-cyclomatic complexity is about as well - >>> it generally leads to easier-to-maintain software that has well defined >>> functionality and does not have those weird circular dependencies we have >>> now. That's a kind-of side-effect of such a "per feature" split, but a very >>> desirable one. >>> >>> J. >>> >>> On Wed, Jul 9, 2025 at 10:17 AM Amogh Desai <amoghdesai....@gmail.com> >>> wrote: >>> >>>> Probably, you make a valid point. >>>> >>>> Maybe this is an implementation detail, so we could figure it out as we >>>> start on a POC and factor in these things >>>> as we move along? >>>> >>>> But from an initial guess, I would think that execution time related >>>> items (if we manage to enumerate them) would be something >>>> that would be better off in that "core_and_task_sdk" bundle. >>>> >>>> Thanks & Regards, >>>> Amogh Desai >>>> >>>> >>>> On Tue, Jul 8, 2025 at 3:46 PM Jarek Potiuk <ja...@potiuk.com> wrote: >>>> >>>>>> Not that I am against your idea and we can surely expand as we need >>>>> but we would not need to expand the >>>>> "core_and_task_sdk" if we put only the relevant items into it. >>>>> >>>>> So if we move logging and config out, my question is what is really >>>>> relevant to "stay" in "core_and_task_sdk" ? And what we know will not be >>>>> needed by other distributions in the future ? >>>>> >>>>> What would be the content of "core_and_task_sdk" ? Can we enumerate >>>>> what should go there now ? >>>>> >>>>> My bet is that if we enumerate them, it will turn out that they are >>>>> good candidates to make separate "shared features" that can be logically >>>>> named and modularised, and that there is no real need to have shared >>>>> "core_and_task_sdk" which sounds like "bag of everything else". >>>>> >>>>> >>>>> >>>>> On Tue, Jul 8, 2025 at 11:48 AM Amogh Desai <amoghdesai....@gmail.com> >>>>> wrote: >>>>> >>>>>> Yeah, I think what you are showcasing here is a step ahead of the >>>>>> initial proposal from Ash. >>>>>> >>>>>> From the original proposal, the `core_and_task_sdk` *can* have the >>>>>> things relevant to just those two >>>>>> distros. Logging, Config are modules that might be needed by >>>>>> airflow-ctl for example, so ideally, those >>>>>> would not be good candidates to be put in there, ideally speaking. >>>>>> >>>>>> The example of Kubernetes Utils sounds to be a good example, it will >>>>>> be used by KubeExecutor (lets say this is a >>>>>> module called "executors") and by KPO (providers), and the >>>>>> "shared/kubernetes" would probably be a good >>>>>> candidate for that. >>>>>> >>>>>> Not that I am against your idea and we can surely expand as we need >>>>>> but we would not need to expand the >>>>>> "core_and_task_sdk" if we put only the relevant items into it. >>>>>> >>>>>> Thanks & Regards, >>>>>> Amogh Desai >>>>>> >>>>>> >>>>>> On Tue, Jul 8, 2025 at 12:28 PM Jarek Potiuk <ja...@potiuk.com> wrote: >>>>>> >>>>>>>> @Jarek Potiuk <ja...@potiuk.com> a little confused on what you >>>>>>> mean there, I am understanding the direction >>>>>>> but could you elaborate a bit more please? >>>>>>> >>>>>>> Let me elaborate: >>>>>>> >>>>>>> As I understand (maybe I am wrong?), the proposal is that we have a >>>>>>> "core-and-task-sdk" folder which is a shared distribution that is >>>>>>> vendored-in into both "airflow-core" and "airflow-task-sdk". This >>>>>>> contains some shared code that we want to include in both distributions >>>>>>> (note that we never ever release "core-and-task-sdk" >>>>>>> distribution because it only contains code that is shared between the >>>>>>> two >>>>>>> distributions we release. >>>>>>> >>>>>>> That's fine and cool. >>>>>>> >>>>>>> Imagine that this distribution contains "logging" (shared code >>>>>>> for logging) and "config" (shared code for configuration). - both >>>>>>> needed in >>>>>>> "airflow-core" and "airflow-task-sdk". So far so good. But what happen >>>>>>> if >>>>>>> we want to use logging in the same fashion in say "airflow-ctl" (that >>>>>>> also >>>>>>> applies for other distributions that we might come up with) ? Are we >>>>>>> going >>>>>>> to vendor in the whole "core-and-task-sdk" distribution in >>>>>>> "airflow-ctl" ? >>>>>>> It would be far better if we just vendor in "logging" and do not >>>>>>> vendor-in >>>>>>> "config". >>>>>>> >>>>>>> And if we are going to have a mechanism to vendor-in "a distribution" >>>>>>> - there is nothing wrong with having the same way to vendor-in multiple >>>>>>> distributions - so we can easily do it this way (i added "orm" , >>>>>>> "serialization". and "fast_api" as an example thing that we might want >>>>>>> to >>>>>>> share - not sure if that is really something we want to do but it will >>>>>>> allow to illustrate my idea better) >>>>>>> >>>>>>> / >>>>>>> airflow-ctl/ >>>>>>> task-sdk/... >>>>>>> airflow-core/... >>>>>>> .... >>>>>>> shared/ >>>>>>> kubernetes/ >>>>>>> pyproject.toml >>>>>>> src/ >>>>>>> airflow_shared_kubernetes/__init__.py >>>>>>> logging/ >>>>>>> pyproject.toml >>>>>>> src/ >>>>>>> airflow_shared_logging/__init__.py >>>>>>> config/ >>>>>>> pyproject.toml >>>>>>> src/ >>>>>>> airflow_shared_config/__init__.py >>>>>>> orm/ >>>>>>> pyproject.toml >>>>>>> src/ >>>>>>> airflow_shared_orm/__init__.py >>>>>>> serialization/ >>>>>>> pyproject.toml >>>>>>> src/ >>>>>>> airflow_shared_serialization/__init__.py >>>>>>> fast_api/ >>>>>>> pyproject.toml >>>>>>> src/ >>>>>>> airflow_shared_fast_api/__init__.py >>>>>>> ... >>>>>>> >>>>>>> This has multiple benefits (and I see no real drawbacks): >>>>>>> >>>>>>> * the code can be really well modularised. Those "things" we share >>>>>>> (and this also connects to the __init__.py discussion) - can be >>>>>>> independent >>>>>>> - and (it follow Jens comment) it allows to keep low cyclomatic >>>>>>> complexity >>>>>>> https://en.wikipedia.org/wiki/Cyclomatic_complexity . It will be way >>>>>>> easier to implement logging in the way that it does not import or use >>>>>>> config. This means for example that configuration for logging will need >>>>>>> to >>>>>>> be injected when logging is initialized - and that's exactly what we >>>>>>> want, >>>>>>> we do not want logging code to use configuration code directly - they >>>>>>> should be independent from each other and you should be free to >>>>>>> vendor-in >>>>>>> either logging or config independently if you also vendored-in the >>>>>>> other. >>>>>>> >>>>>>> * it's much more logical. We split based on functionality we want to >>>>>>> share - not about the "distributions" we want to produce. That allows >>>>>>> us - >>>>>>> in the future - to make different decisions on how we split our >>>>>>> distributions. For example (i do not tell we have to do it, or that we >>>>>>> will >>>>>>> do it but we will have such a possibility) - we can add more shared >>>>>>> utilities we find useful in the same way and decide that "scheduler", >>>>>>> "api_server" or "scheduler" or "triggerer" or "dag processor" are split >>>>>>> to >>>>>>> separate distributions - because for example we want to keep a number of >>>>>>> dependencies down. And for example "api_server" might use "fast_api", >>>>>>> "config", "logging", "orm" and "fast_api" and "serialization" , where >>>>>>> the >>>>>>> scheduler should not need "fast_api". The "dag_processor" eventually >>>>>>> might >>>>>>> not need "orm" nor "fast_api" and only use the other >>>>>>> >>>>>>> This seems like a natural approach. If we have a mechanism to "share" >>>>>>> the code, it does not add complexity, but allows us to isolate >>>>>>> independent >>>>>>> functionality into "isolated" boxes and use them >>>>>>> >>>>>>> Also for cyclomatic complexity that is a complex word (badly chosen >>>>>>> as it scares people away) and has some math behind, but it really boils >>>>>>> down to very simple "rules of thumb" (and yes I am a big proponent of >>>>>>> having low cyclomatic complexity). >>>>>>> >>>>>>> a) when you are building the "final" product (i.e. distribution you >>>>>>> want to release) - make sure that you only "use" things - that nothing >>>>>>> else >>>>>>> is "using" you as a library. >>>>>>> b) when you are building a "shared" thing (a library) - make sure >>>>>>> that library is only "used" by others but it does not really "use" >>>>>>> anything >>>>>>> else. For example in the case I explained above - we can achieve >>>>>>> low-cyclomatic complexity when: >>>>>>> >>>>>>> * airflow-core uses: logging, config, orm, serialization, fast_api >>>>>>> * none of the "logging, config, orm, serialization, fast_api" use >>>>>>> each other - they are at the bottom of the "user -> used" tree >>>>>>> >>>>>>> J. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Jul 8, 2025 at 8:16 AM Amogh Desai <amoghdesai....@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> I like the folder structure proposed by Ash and have no objections >>>>>>>> with it. >>>>>>>> >>>>>>>> "core_and_task_sdk" sounds good to me and justifies what it should >>>>>>>> do pretty well. >>>>>>>> >>>>>>>> @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean >>>>>>>> there, I am understanding the direction >>>>>>>> but could you elaborate a bit more please? >>>>>>>> >>>>>>>> Naming is REALLY hard! >>>>>>>> >>>>>>>> Thanks & Regards, >>>>>>>> Amogh Desai >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jul 8, 2025 at 2:52 AM Jarek Potiuk <ja...@potiuk.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> How about splitting it even more and having each shared "thing" >>>>>>>>> named? >>>>>>>>> "logging", "config" and sharing them explicitly and separately with >>>>>>>>> the >>>>>>>>> right "user" ? >>>>>>>>> That sounds way more modular and we will be able to choose which >>>>>>>>> of the >>>>>>>>> shared "utils" we use where. >>>>>>>>> >>>>>>>>> J. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Jul 7, 2025 at 11:13 PM Jens Scheffler >>>>>>>>> <j_scheff...@gmx.de.invalid> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I like "core_and_task_sdk the same like core-and-task-sdk - I >>>>>>>>> have no >>>>>>>>>> problem and it is a path only. >>>>>>>>>> >>>>>>>>>> if we get to "dag-parser-scheduler-task-sdk-and-triggerer" which >>>>>>>>> is a >>>>>>>>>> bit bulky we then should name it "all-not-api-server" :-D >>>>>>>>>> >>>>>>>>>> On 07.07.25 22:57, Ash Berlin-Taylor wrote: >>>>>>>>>>> In case I did a bad job explaining it, the “core and task sdk” >>>>>>>>> is not in >>>>>>>>>> the module name/import name, just in the file path. >>>>>>>>>>> >>>>>>>>>>> Anyone have other ideas? >>>>>>>>>>> >>>>>>>>>>>> On 7 Jul 2025, at 21:37, Buğra Öztürk <ozturkbugr...@gmail.com> >>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Thanks Ash! Looks cool! I like the structure. This will >>>>>>>>> enable all the >>>>>>>>>>>> combinations and structure looks easy to grasp. No strong >>>>>>>>> stance on the >>>>>>>>>>>> naming other than maybe it is a bit long with `and`, >>>>>>>>> `core_ctl` could be >>>>>>>>>>>> shorter, since no import path is defined like that, we can >>>>>>>>> give any name >>>>>>>>>>>> for sure. >>>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> >>>>>>>>>>>>> On Mon, 7 Jul 2025, 21:51 Jarek Potiuk, <ja...@potiuk.com> >>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Looks good but I think we should find some better logical >>>>>>>>> name for >>>>>>>>>>>>> core_and_sdk :) >>>>>>>>>>>>> >>>>>>>>>>>>> pon., 7 lip 2025, 21:44 użytkownik Jens Scheffler >>>>>>>>>>>>> <j_scheff...@gmx.de.invalid> napisał: >>>>>>>>>>>>> >>>>>>>>>>>>>> Cool! Especially the "shared" folder with the ability to have >>>>>>>>>>>>>> N-combinations w/o exploding project repo root! >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 07.07.25 14:43, Ash Berlin-Taylor wrote: >>>>>>>>>>>>>>> Oh, and all of this will be explain in shared/README.md >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org> >>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Okay, so it seems we have agreement on the approach here, >>>>>>>>> so I’ll >>>>>>>>>>>>>> continue with this, and on the dev call it was mentioned that >>>>>>>>>>>>>> “airflow-common” wasn’t a great name, so here is my proposal >>>>>>>>> for the >>>>>>>>>> file >>>>>>>>>>>>>> structure; >>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>> / >>>>>>>>>>>>>>>> task-sdk/... >>>>>>>>>>>>>>>> airflow-core/... >>>>>>>>>>>>>>>> shared/ >>>>>>>>>>>>>>>> kuberenetes/ >>>>>>>>>>>>>>>> pyproject.toml >>>>>>>>>>>>>>>> src/ >>>>>>>>>>>>>>>> airflow_kube/__init__.py >>>>>>>>>>>>>>>> core-and-tasksdk/ >>>>>>>>>>>>>>>> pyproject.toml >>>>>>>>>>>>>>>> src/ >>>>>>>>>>>>>>>> airflow_shared/__init__.py >>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Things to note here: the “shared” folder has (the >>>>>>>>> possibility) of >>>>>>>>>>>>>> having multiple different shared “libraries” in it, in this >>>>>>>>> example I >>>>>>>>>> am >>>>>>>>>>>>>> supposing a hypothetical shared kuberenetes folder a world >>>>>>>>> in which we >>>>>>>>>>>>>> split the KubePodOperator and the KubeExecutor in to two >>>>>>>>> separate >>>>>>>>>>>>>> distributions (example only, not proposing we do that right >>>>>>>>> now, and >>>>>>>>>> that >>>>>>>>>>>>>> will be a separate discussion) >>>>>>>>>>>>>>>> The other things to note here: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - the folder name in shared aims to be “self-documenting”, >>>>>>>>> hence the >>>>>>>>>>>>>> verbose “core-and-tasksdk” to say where the shared library is >>>>>>>>>> intended to >>>>>>>>>>>>>> be used. >>>>>>>>>>>>>>>> - the python module itself should almost always have an >>>>>>>>> `airflow_` >>>>>>>>>> (or >>>>>>>>>>>>>> maybe `_airflow_`?) prefix so that it does not conflict with >>>>>>>>> anything >>>>>>>>>>>>> else >>>>>>>>>>>>>> we might use. It won’t matter “in production” as those will >>>>>>>>> be >>>>>>>>>> vendored >>>>>>>>>>>>> in >>>>>>>>>>>>>> to be imported as `airflow/_vendor/airflow_shared` etc, but >>>>>>>>> avoiding >>>>>>>>>>>>>> conflicts at dev time with the Finder approach is a good >>>>>>>>> safety >>>>>>>>>> measure. >>>>>>>>>>>>>>>> I will start making a real PR for this proposal now, but >>>>>>>>> I’m open to >>>>>>>>>>>>>> feedback (either here, or in the PR when I open it) >>>>>>>>>>>>>>>> -ash >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com> >>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yeah we have to try it and test - also building packages >>>>>>>>> happens >>>>>>>>>> semi >>>>>>>>>>>>>>>>> frequently when you run `uv sync` (they use some kind of >>>>>>>>> heuristics >>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> decide when) and you can force it with `--reinstall` or >>>>>>>>>> `--refresh`. >>>>>>>>>>>>>>>>> Package build also happens every time when you run >>>>>>>>> "ci-image build` >>>>>>>>>>>>>> now in >>>>>>>>>>>>>>>>> breeze so it seems like it will nicely integrate in our >>>>>>>>> workflows. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Looks really cool Ash. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor < >>>>>>>>> a...@apache.org> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> It’s not just release time, but any time we build a >>>>>>>>> package which >>>>>>>>>>>>>> happens >>>>>>>>>>>>>>>>>> on “every” CI run. The normal unit tests will use code >>>>>>>>> from >>>>>>>>>>>>>>>>>> airflow-common/src/airflow_common; the kube tests which >>>>>>>>> build an >>>>>>>>>>>>>> image will >>>>>>>>>>>>>>>>>> build the dists and vendor in the code from that commit. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> There is only a single copy of the shared code committed >>>>>>>>> to the >>>>>>>>>>>>> repo, >>>>>>>>>>>>>> so >>>>>>>>>>>>>>>>>> there is never anything to synchronise. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 4 Jul 2025, at 15:53, Amogh Desai < >>>>>>>>> amoghdesai....@gmail.com> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> Thanks Ash. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> This is really cool and helpful that you were able to >>>>>>>>> test both >>>>>>>>>>>>>> scenarios >>>>>>>>>>>>>>>>>>> -- repo checkout >>>>>>>>>>>>>>>>>>> and also installing from the vendored package and the >>>>>>>>> resolution >>>>>>>>>>>>>> worked >>>>>>>>>>>>>>>>>>> fine too. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I like this idea compared the to relative import one >>>>>>>>> for few >>>>>>>>>>>>> reasons: >>>>>>>>>>>>>>>>>>> - It feels like it will take some time to adjust to the >>>>>>>>> new >>>>>>>>>> coding >>>>>>>>>>>>>>>>>> standard >>>>>>>>>>>>>>>>>>> that we will lay >>>>>>>>>>>>>>>>>>> if we impose relative imports in the shared dist >>>>>>>>>>>>>>>>>>> - We can continue using repo wise absolute import >>>>>>>>> standards, it >>>>>>>>>> is >>>>>>>>>>>>>> also >>>>>>>>>>>>>>>>>>> much easier for situations >>>>>>>>>>>>>>>>>>> when we do global search in IDE to find + replace, this >>>>>>>>> could >>>>>>>>>> mean >>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>> change >>>>>>>>>>>>>>>>>>> there >>>>>>>>>>>>>>>>>>> - The vendoring work is a proven and established >>>>>>>>> paradigm across >>>>>>>>>>>>>> projects >>>>>>>>>>>>>>>>>>> and would >>>>>>>>>>>>>>>>>>> out of box give us the build tooling we need also >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Nothing too against the relative import but with the >>>>>>>>> evidence >>>>>>>>>>>>>> provided >>>>>>>>>>>>>>>>>>> above, vendored approach >>>>>>>>>>>>>>>>>>> seems to only do us good. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Regarding synchronizing it, release time should be fine >>>>>>>>> as long >>>>>>>>>> as >>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>> have >>>>>>>>>>>>>>>>>>> a good CI workflow to probably >>>>>>>>>>>>>>>>>>> catch such issues per PR if changes are made in shared >>>>>>>>> dist? >>>>>>>>>>>>>> (precommit >>>>>>>>>>>>>>>>>>> would make it really slow i guess) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> If we can run our tests with vendored code we should be >>>>>>>>> mostly >>>>>>>>>>>>>> covered. >>>>>>>>>>>>>>>>>>> Good effort all! >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks & Regards, >>>>>>>>>>>>>>>>>>> Amogh Desai >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor < >>>>>>>>>> a...@apache.org> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> Okay, I think I’ve got something that works and I’m >>>>>>>>> happy with. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core >>>>>>>>>>>>>>>>>>>> This produces the following from `uv build task-sdk` >>>>>>>>>>>>>>>>>>>> - >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz >>>>>>>>>>>>>>>>>>>> - >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip >>>>>>>>>>>>>>>>>>>> (`.whl.zip` as GH won't allow .whl upload, but will >>>>>>>>> .zip) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>> ❯ unzip -l >>>>>>>>>>>>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip | >>>>>>>>>>>>>>>>>> grep >>>>>>>>>>>>>>>>>>>> _vendor >>>>>>>>>>>>>>>>>>>> 50 02-02-2020 00:00 >>>>>>>>> airflow/sdk/_vendor/.gitignore >>>>>>>>>>>>>>>>>>>> 2082 02-02-2020 00:00 >>>>>>>>> airflow/sdk/_vendor/__init__.py >>>>>>>>>>>>>>>>>>>> 28 02-02-2020 00:00 >>>>>>>>>> airflow/sdk/_vendor/airflow_common.pyi >>>>>>>>>>>>>>>>>>>> 18 02-02-2020 00:00 >>>>>>>>> airflow/sdk/_vendor/vendor.txt >>>>>>>>>>>>>>>>>>>> 785 02-02-2020 00:00 >>>>>>>>>>>>>>>>>>>> airflow/sdk/_vendor/airflow_common/__init__.py >>>>>>>>>>>>>>>>>>>> 10628 02-02-2020 00:00 >>>>>>>>>>>>>>>>>>>> airflow/sdk/_vendor/airflow_common/timezone.py >>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> And similarly in the .tar.gz, so our “sdist” is >>>>>>>>> complete too: >>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz >>>>>>>>> |grep >>>>>>>>>> _vendor >>>>>>>>>>>>>>>>>>>> >>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore >>>>>>>>>>>>>>>>>>>> >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi >>>>>>>>>>>>>>>>>>>> >>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py >>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The plugin works at build time by including/copying >>>>>>>>> the libs >>>>>>>>>>>>>> specified >>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>> vendor.txt into place (and let `vendoring` take care >>>>>>>>> of import >>>>>>>>>>>>>>>>>> rewrites.) >>>>>>>>>>>>>>>>>>>> For the imports to continue to work at “dev” time/from >>>>>>>>> a repo >>>>>>>>>>>>>> checkout, >>>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>>>>> have added a import finder to `sys.meta_path`, and >>>>>>>>> since it’s at >>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> end of >>>>>>>>>>>>>>>>>>>> the list it will only be used if the normal import >>>>>>>>> can’t find >>>>>>>>>>>>>> things. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py >>>>>>>>>>>>>>>>>>>> This doesn’t quite give us the same runtime effect >>>>>>>>> “import >>>>>>>>>>>>>> rewriting” >>>>>>>>>>>>>>>>>>>> affect, as in this approach `airflow_common` is >>>>>>>>> directly loaded >>>>>>>>>>>>>> (i.e. >>>>>>>>>>>>>>>>>>>> airflow.sdk._vendor.airflow_common and airflow_common >>>>>>>>> exist in >>>>>>>>>>>>>>>>>>>> sys.modules), but it does work for everything that I >>>>>>>>> was able to >>>>>>>>>>>>>> test.. >>>>>>>>>>>>>>>>>>>> I tested it with the diff at the end of this message. >>>>>>>>> My test >>>>>>>>>>>>>> ipython >>>>>>>>>>>>>>>>>>>> shell: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>> In [1]: from >>>>>>>>> airflow.sdk._vendor.airflow_common.timezone import >>>>>>>>>>>>> foo >>>>>>>>>>>>>>>>>>>> In [2]: foo >>>>>>>>>>>>>>>>>>>> Out[2]: 1 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> In [3]: import airflow.sdk._vendor.airflow_common >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> In [4]: import >>>>>>>>> airflow.sdk._vendor.airflow_common.timezone >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> In [5]: airflow.sdk._vendor.airflow_common.__file__ >>>>>>>>>>>>>>>>>>>> Out[5]: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py' >>>>>>>>>>>>>>>>>>>> In [6]: >>>>>>>>> airflow.sdk._vendor.airflow_common.timezone.__file__ >>>>>>>>>>>>>>>>>>>> Out[6]: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py' >>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> And in an standalone environment with the SDK dist I >>>>>>>>> built (it >>>>>>>>>>>>>> needed >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>> matching airflow-core right now, but that is nothing >>>>>>>>> to do with >>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>> discussion): >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with >>>>>>>>>>>>>>>>>>>> dist/apache_airflow_core-3.1.0-py3-none-any.whl --with >>>>>>>>>>>>>>>>>>>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl >>>>>>>>> ipython >>>>>>>>>>>>>>>>>>>> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang >>>>>>>>> 18.1.8 ] >>>>>>>>>>>>>>>>>>>> Type 'copyright', 'credits' or 'license' for more >>>>>>>>> information >>>>>>>>>>>>>>>>>>>> IPython 9.4.0 -- An enhanced Interactive Python. Type >>>>>>>>> '?' for >>>>>>>>>>>>> help. >>>>>>>>>>>>>>>>>>>> Tip: You can use `%hist` to view history, see the >>>>>>>>> options with >>>>>>>>>>>>>>>>>> `%history?` >>>>>>>>>>>>>>>>>>>> In [1]: import >>>>>>>>> airflow.sdk._vendor.airflow_common.timezone >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> In [2]: >>>>>>>>> airflow.sdk._vendor.airflow_common.timezone.__file__ >>>>>>>>>>>>>>>>>>>> Out[2]: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py’ >>>>>>>>>>>>>>>>>>>> `` >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ```diff >>>>>>>>>>>>>>>>>>>> diff --git >>>>>>>>> a/airflow-common/src/airflow_common/__init__.py >>>>>>>>>>>>>>>>>>>> b/airflow-common/src/airflow_common/__init__.py >>>>>>>>>>>>>>>>>>>> index 13a83393a9..927b7c6b61 100644 >>>>>>>>>>>>>>>>>>>> --- a/airflow-common/src/airflow_common/__init__.py >>>>>>>>>>>>>>>>>>>> +++ b/airflow-common/src/airflow_common/__init__.py >>>>>>>>>>>>>>>>>>>> @@ -14,3 +14,5 @@ >>>>>>>>>>>>>>>>>>>> # KIND, either express or implied. See the License >>>>>>>>> for the >>>>>>>>>>>>>>>>>>>> # specific language governing permissions and >>>>>>>>> limitations >>>>>>>>>>>>>>>>>>>> # under the License. >>>>>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>>>>> +foo = 1 >>>>>>>>>>>>>>>>>>>> diff --git >>>>>>>>> a/airflow-common/src/airflow_common/timezone.py >>>>>>>>>>>>>>>>>>>> b/airflow-common/src/airflow_common/timezone.py >>>>>>>>>>>>>>>>>>>> index 340b924c66..58384ef20f 100644 >>>>>>>>>>>>>>>>>>>> --- a/airflow-common/src/airflow_common/timezone.py >>>>>>>>>>>>>>>>>>>> +++ b/airflow-common/src/airflow_common/timezone.py >>>>>>>>>>>>>>>>>>>> @@ -36,6 +36,9 @@ _PENDULUM3 = >>>>>>>>>>>>>>>>>>>> version.parse(metadata.version("pendulum")).major == 3 >>>>>>>>>>>>>>>>>>>> # - FixedTimezone(0, "UTC") in pendulum 2 >>>>>>>>>>>>>>>>>>>> utc = pendulum.UTC >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>>>>> +from airflow_common import foo >>>>>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>>>>> TIMEZONE: Timezone >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On 3 Jul 2025, at 12:43, Jarek Potiuk < >>>>>>>>> ja...@potiuk.com> >>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> I think both approaches are doable: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 1) -> We can very easily prevent bad imports by >>>>>>>>> pre-commit when >>>>>>>>>>>>>>>>>> importing >>>>>>>>>>>>>>>>>>>>> from different distributions and make sure we are >>>>>>>>> only doing >>>>>>>>>>>>>> relative >>>>>>>>>>>>>>>>>>>>> imports in the shared modules. We are doing plenty of >>>>>>>>> this >>>>>>>>>>>>>> already. And >>>>>>>>>>>>>>>>>>>> yes >>>>>>>>>>>>>>>>>>>>> it would require relative links we currently do not >>>>>>>>> allow. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2) -> has one disadvantage that someone at some point >>>>>>>>> in time >>>>>>>>>>>>> will >>>>>>>>>>>>>> have >>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>> decide to synchronize this and if it happens just >>>>>>>>> before >>>>>>>>>> release >>>>>>>>>>>>>> (I bet >>>>>>>>>>>>>>>>>>>>> this is going to happen) this will lead to solving >>>>>>>>> problems >>>>>>>>>> that >>>>>>>>>>>>>> would >>>>>>>>>>>>>>>>>>>>> normally be solved during PR when you make a change >>>>>>>>> (i.e. >>>>>>>>>>>>> symbolic >>>>>>>>>>>>>> link >>>>>>>>>>>>>>>>>>>> has >>>>>>>>>>>>>>>>>>>>> the advantage that whoever modifies shared code will >>>>>>>>> be >>>>>>>>>>>>> immediately >>>>>>>>>>>>>>>>>>>>> notified in their PR - that they broke something >>>>>>>>> because either >>>>>>>>>>>>>> static >>>>>>>>>>>>>>>>>>>>> checks or mypy or tests fail. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Ash, do you have an idea of a process (who and when) >>>>>>>>> does the >>>>>>>>>>>>>>>>>>>>> synchronisation in case of vendoring? Maybe we could >>>>>>>>> solve it >>>>>>>>>> if >>>>>>>>>>>>>> it is >>>>>>>>>>>>>>>>>>>> done >>>>>>>>>>>>>>>>>>>>> more frequently and with some regularity? We could >>>>>>>>> potentially >>>>>>>>>>>>>> force >>>>>>>>>>>>>>>>>>>>> re-vendoring at PR time as well any time shared code >>>>>>>>> changes >>>>>>>>>> (and >>>>>>>>>>>>>>>>>> prevent >>>>>>>>>>>>>>>>>>>>> it by pre-commit. And I can't think of some place >>>>>>>>> (other than >>>>>>>>>>>>>> releases) >>>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>> our development workflow and that seems to be a bit >>>>>>>>> too late as >>>>>>>>>>>>>> puts an >>>>>>>>>>>>>>>>>>>>> extra effort on fixing potential incompatibilities >>>>>>>>> introduced >>>>>>>>>> on >>>>>>>>>>>>>>>>>> release >>>>>>>>>>>>>>>>>>>>> manager and delays the release. WDYT? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Re: relative links. I think for a shared library we >>>>>>>>> could >>>>>>>>>>>>>> potentially >>>>>>>>>>>>>>>>>>>> relax >>>>>>>>>>>>>>>>>>>>> this and allow them (and actually disallow absolute >>>>>>>>> links in >>>>>>>>>> the >>>>>>>>>>>>>> pieces >>>>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>>> code that are shared - again, by pre-commit). As I >>>>>>>>> recall, the >>>>>>>>>>>>> only >>>>>>>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>>>>>>>> we forbade the relative links is because of how we >>>>>>>>> are (or >>>>>>>>>> maybe >>>>>>>>>>>>>> were) >>>>>>>>>>>>>>>>>>>>> doing DAG parsing and failures resulting from it. So >>>>>>>>> we decided >>>>>>>>>>>>> to >>>>>>>>>>>>>> just >>>>>>>>>>>>>>>>>>>> not >>>>>>>>>>>>>>>>>>>>> allow it to keep consistency. The way how Dag parsing >>>>>>>>> works is >>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>> when >>>>>>>>>>>>>>>>>>>>> you are using importlib to read the Dag from a file, >>>>>>>>> the >>>>>>>>>> relative >>>>>>>>>>>>>>>>>> imports >>>>>>>>>>>>>>>>>>>>> do not work as it does not know what they should be >>>>>>>>> relative >>>>>>>>>> to. >>>>>>>>>>>>>> But if >>>>>>>>>>>>>>>>>>>>> relative import is done from an imported package, it >>>>>>>>> should be >>>>>>>>>> no >>>>>>>>>>>>>>>>>>>> problem, >>>>>>>>>>>>>>>>>>>>> I think - otherwise our Dags would not be able to >>>>>>>>> import any >>>>>>>>>>>>>> library >>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>> uses relative imports. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Of course consistency might be the reason why we do >>>>>>>>> not want to >>>>>>>>>>>>>>>>>> introduce >>>>>>>>>>>>>>>>>>>>> relative imports. I don't see it as an issue if it is >>>>>>>>> guarded >>>>>>>>>> by >>>>>>>>>>>>>>>>>>>> pre-commit >>>>>>>>>>>>>>>>>>>>> though. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> J. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> J. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> czw., 3 lip 2025, 12:11 użytkownik Ash Berlin-Taylor < >>>>>>>>>>>>>> a...@apache.org> >>>>>>>>>>>>>>>>>>>>> napisał: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Oh yes, symlinks will work, with one big caveat: It >>>>>>>>> does mean >>>>>>>>>>>>> you >>>>>>>>>>>>>>>>>> can’t >>>>>>>>>>>>>>>>>>>>>> use absolute imports in one common module to another. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> For example >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41 >>>>>>>>>>>>>>>>>>>>>> where we have >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>>> from airflow.utils.module_loading import >>>>>>>>> import_string >>>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> if we want to move serve_logs into this common lib >>>>>>>>> that is >>>>>>>>>> then >>>>>>>>>>>>>>>>>>>> symlinked >>>>>>>>>>>>>>>>>>>>>> then we wouldn’t be able to have `from >>>>>>>>>>>>>> airflow_common.module_loading >>>>>>>>>>>>>>>>>>>> import >>>>>>>>>>>>>>>>>>>>>> import_string`. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I can think of two possible solutions here. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 1) is to allow/require relative imports in this >>>>>>>>> shared lib, so >>>>>>>>>>>>>> `from >>>>>>>>>>>>>>>>>>>>>> .module_loading import import_string` >>>>>>>>>>>>>>>>>>>>>> 2) is to use `vendoring`[1] (from the pip >>>>>>>>> maintainers) which >>>>>>>>>>>>> will >>>>>>>>>>>>>>>>>> handle >>>>>>>>>>>>>>>>>>>>>> import-rewriting for us. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I’d entirely forgot that symlinks in repos was a >>>>>>>>> thing, so I >>>>>>>>>>>>>> prepared >>>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>>>>> minimal POC/demo of what vendoring approach could >>>>>>>>> look like >>>>>>>>>> here >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3 >>>>>>>>>>>>>>>>>>>>>> Now personally I am more than happy with relative >>>>>>>>> imports, but >>>>>>>>>>>>>>>>>> generally >>>>>>>>>>>>>>>>>>>>>> as a project we have avoided them, so I think that >>>>>>>>> limits what >>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>> could >>>>>>>>>>>>>>>>>>>> do >>>>>>>>>>>>>>>>>>>>>> with a symlink based approach. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -ash >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> [1] https://github.com/pradyunsg/vendoring >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu < >>>>>>>>>>>>>>>>>> gopidesupa...@gmail.com> >>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> Thanks Ash >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Yes agree option 2 would be preferred for me. >>>>>>>>> Making sure we >>>>>>>>>>>>>> have all >>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>> gaurdriles to protect any unwanted behaviour in >>>>>>>>> code sharing >>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>> executing >>>>>>>>>>>>>>>>>>>>>>> right of tests between the packages. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Agree with others, option 2 would be >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai < >>>>>>>>>>>>>>>>>> amoghdesai....@gmail.com> >>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks for starting this discussion, Ash. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I would prefer option 2 here with proper tooling >>>>>>>>> to handle >>>>>>>>>> the >>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>>>>>>>>> duplication at *release* time. >>>>>>>>>>>>>>>>>>>>>>>> It is best to have a dist that has all it needs in >>>>>>>>> itself. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Option 1 could very quickly get out of hand and if >>>>>>>>> we decide >>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>> separate >>>>>>>>>>>>>>>>>>>>>>>> triggerer / >>>>>>>>>>>>>>>>>>>>>>>> dag processor / config etc etc as separate >>>>>>>>> packages, back >>>>>>>>>>>>>> compat is >>>>>>>>>>>>>>>>>>>>>> going >>>>>>>>>>>>>>>>>>>>>>>> to be a nightmare >>>>>>>>>>>>>>>>>>>>>>>> and will bite us harder than we anticipate. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks & Regards, >>>>>>>>>>>>>>>>>>>>>>>> Amogh Desai >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik < >>>>>>>>>>>>> kaxiln...@gmail.com> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> I prefer Option 2 as well to avoid matrix of >>>>>>>>> dependencies >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler >>>>>>>>>>>>>>>>>>>> <j_scheff...@gmx.de.invalid >>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I'd also rather prefer option 2 - reason here is >>>>>>>>> it is >>>>>>>>>>>>> rather >>>>>>>>>>>>>>>>>>>>>> pragmatic >>>>>>>>>>>>>>>>>>>>>>>>>> and we no not need to cut another package and >>>>>>>>> have less >>>>>>>>>>>>>> package >>>>>>>>>>>>>>>>>>>> counts >>>>>>>>>>>>>>>>>>>>>>>>>> and dependencies. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I remember some time ago I was checking >>>>>>>>> (together with >>>>>>>>>>>>> Jarek, >>>>>>>>>>>>>> I am >>>>>>>>>>>>>>>>>>>> not >>>>>>>>>>>>>>>>>>>>>>>>>> sure anymore...) if the usage of symlinks would >>>>>>>>> be >>>>>>>>>> possible. >>>>>>>>>>>>>> To >>>>>>>>>>>>>>>>>> keep >>>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>> source in one package but "symlink" it into >>>>>>>>> another. If >>>>>>>>>> then >>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> point >>>>>>>>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>>>>>>>> packaging/release the files are materialized we >>>>>>>>> have 1 set >>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>> code. >>>>>>>>>>>>>>>>>>>>>>>>>> Otherwise if not possible still the redundancy >>>>>>>>> could be >>>>>>>>>>>>>> solved by >>>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>>>>>>>>> pre-commit hook - and in Git the files are >>>>>>>>> de-duplicated >>>>>>>>>>>>>> anyway >>>>>>>>>>>>>>>>>>>> based >>>>>>>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>>>>>>>>> content hash, so this does not hurt. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> I support option 2 with proper automation & CI >>>>>>>>> - the >>>>>>>>>>>>>> reasonings >>>>>>>>>>>>>>>>>>>>>>>> you've >>>>>>>>>>>>>>>>>>>>>>>>>>> shown for that make sense to me. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Shahar >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash >>>>>>>>> Berlin-Taylor < >>>>>>>>>>>>>> a...@apache.org >>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone, >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> As we work on finishing off the code-level >>>>>>>>> separation of >>>>>>>>>>>>>> Task >>>>>>>>>>>>>>>>>> SDK >>>>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>>>>> Core >>>>>>>>>>>>>>>>>>>>>>>>>>>> (scheduler etc) we have come across some >>>>>>>>> situations >>>>>>>>>> where >>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>> would >>>>>>>>>>>>>>>>>>>>>>>>> like >>>>>>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>>>> share code between these. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> However it’s not as straight forward of “just >>>>>>>>> put it in >>>>>>>>>> a >>>>>>>>>>>>>> common >>>>>>>>>>>>>>>>>>>>>>>> dist >>>>>>>>>>>>>>>>>>>>>>>>>> they >>>>>>>>>>>>>>>>>>>>>>>>>>>> both depend upon” because one of the goals of >>>>>>>>> the Task >>>>>>>>>> SDK >>>>>>>>>>>>>>>>>>>>>>>> separation >>>>>>>>>>>>>>>>>>>>>>>>>> was >>>>>>>>>>>>>>>>>>>>>>>>>>>> to have 100% complete version independence >>>>>>>>> between the >>>>>>>>>>>>> two, >>>>>>>>>>>>>>>>>>>> ideally >>>>>>>>>>>>>>>>>>>>>>>>>> even if >>>>>>>>>>>>>>>>>>>>>>>>>>>> they are built into the same image and venv. >>>>>>>>> Most of the >>>>>>>>>>>>>> reason >>>>>>>>>>>>>>>>>>>> why >>>>>>>>>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>>>>>>>>>> isn’t straight forward comes down to backwards >>>>>>>>>>>>>> compatibility - >>>>>>>>>>>>>>>>>> if >>>>>>>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>>>>>>>>>> make >>>>>>>>>>>>>>>>>>>>>>>>>>>> an change to the common/shared distribution >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> We’ve listed the options we have thought about >>>>>>>>> in >>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/51545 >>>>>>>>> (but >>>>>>>>>> that >>>>>>>>>>>>>> covers >>>>>>>>>>>>>>>>>>>>>>>> some >>>>>>>>>>>>>>>>>>>>>>>>>> more >>>>>>>>>>>>>>>>>>>>>>>>>>>> things that I don’t want to get in to in this >>>>>>>>> discussion >>>>>>>>>>>>>> such as >>>>>>>>>>>>>>>>>>>>>>>>>> possibly >>>>>>>>>>>>>>>>>>>>>>>>>>>> separating operators and executors out of a >>>>>>>>> single >>>>>>>>>>>>> provider >>>>>>>>>>>>>>>>>> dist.) >>>>>>>>>>>>>>>>>>>>>>>>>>>> To give a concrete example of some code I >>>>>>>>> would like to >>>>>>>>>>>>>> share >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py >>>>>>>>>>>>>>>>>>>>>>>>>>>> — logging config. Another thing we will want >>>>>>>>> to share >>>>>>>>>> will >>>>>>>>>>>>>> be >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>>>> AirflowConfigParser class from >>>>>>>>> airflow.configuration >>>>>>>>>> (but >>>>>>>>>>>>>>>>>> notably: >>>>>>>>>>>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>>>> parser class, _not_ the default config values, >>>>>>>>> again, >>>>>>>>>> lets >>>>>>>>>>>>>> not >>>>>>>>>>>>>>>>>>>> dwell >>>>>>>>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>>>> specifics of that) >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> So to bring the options listed in the issue >>>>>>>>> here for >>>>>>>>>>>>>> discussion, >>>>>>>>>>>>>>>>>>>>>>>>> broadly >>>>>>>>>>>>>>>>>>>>>>>>>>>> speaking there are two high-level approaches: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. A single shared distribution >>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. No shared package and copy/duplicate code >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> The advantage of Approach 1 is that we only >>>>>>>>> have the >>>>>>>>>> code >>>>>>>>>>>>>> in one >>>>>>>>>>>>>>>>>>>>>>>>> place. >>>>>>>>>>>>>>>>>>>>>>>>>>>> However for me, at least in this specific case >>>>>>>>> of >>>>>>>>>> Logging >>>>>>>>>>>>>> config >>>>>>>>>>>>>>>>>>>> or >>>>>>>>>>>>>>>>>>>>>>>>>>>> AirflowConfigParser class is that backwards >>>>>>>>>> compatibility >>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>> much >>>>>>>>>>>>>>>>>>>>>>>> much >>>>>>>>>>>>>>>>>>>>>>>>>>>> harder. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> The main advantage of Approach 2 is the the >>>>>>>>> code is >>>>>>>>>>>>> released >>>>>>>>>>>>>>>>>>>>>>>>>> with/embedded >>>>>>>>>>>>>>>>>>>>>>>>>>>> in the dist (i.e. apache-airflow-task-sdk >>>>>>>>> would contain >>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> right >>>>>>>>>>>>>>>>>>>>>>>>>> version >>>>>>>>>>>>>>>>>>>>>>>>>>>> of the logging config and ConfigParser etc). >>>>>>>>> The >>>>>>>>>> downside >>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>>>>> either >>>>>>>>>>>>>>>>>>>>>>>>>>>> the code will need to be duplicated in the >>>>>>>>> repo, or >>>>>>>>>> better >>>>>>>>>>>>>> yet >>>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>>>>>>>>> would >>>>>>>>>>>>>>>>>>>>>>>>>>>> live in a single place in the repo, but some >>>>>>>>> tooling >>>>>>>>>> (TBD) >>>>>>>>>>>>>> will >>>>>>>>>>>>>>>>>>>>>>>>>>>> automatically handle the duplication, either >>>>>>>>> at commit >>>>>>>>>>>>>> time, or >>>>>>>>>>>>>>>>>> my >>>>>>>>>>>>>>>>>>>>>>>>>>>> preference, at release time. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> For this kind of shared “utility” code I am >>>>>>>>> very >>>>>>>>>> strongly >>>>>>>>>>>>>>>>>> leaning >>>>>>>>>>>>>>>>>>>>>>>>>> towards >>>>>>>>>>>>>>>>>>>>>>>>>>>> option 2 with automation, as otherwise I think >>>>>>>>> the >>>>>>>>>>>>> backwards >>>>>>>>>>>>>>>>>>>>>>>>>> compatibility >>>>>>>>>>>>>>>>>>>>>>>>>>>> requirements would make it unworkable (very >>>>>>>>> quickly over >>>>>>>>>>>>>> time >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>>>> combinations we would have to test would just >>>>>>>>> be >>>>>>>>>>>>>> unreasonable) >>>>>>>>>>>>>>>>>>>> and I >>>>>>>>>>>>>>>>>>>>>>>>>> don’t >>>>>>>>>>>>>>>>>>>>>>>>>>>> feel confident we can have things as stable as >>>>>>>>> we need >>>>>>>>>> to >>>>>>>>>>>>>> really >>>>>>>>>>>>>>>>>>>>>>>>> deliver >>>>>>>>>>>>>>>>>>>>>>>>>>>> the version separation/independency I want to >>>>>>>>> delivery >>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>> AIP-72. >>>>>>>>>>>>>>>>>>>>>>>>>>>> So unless someone feels very strongly about >>>>>>>>> this, I will >>>>>>>>>>>>>> come up >>>>>>>>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>>>>>>>>>>> draft PR for further discussion that will >>>>>>>>> implement code >>>>>>>>>>>>>> sharing >>>>>>>>>>>>>>>>>>>> via >>>>>>>>>>>>>>>>>>>>>>>>>>>> “vendoring” it at build time. I have an idea >>>>>>>>> of how I >>>>>>>>>> can >>>>>>>>>>>>>>>>>> achieve >>>>>>>>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>>>>>>>> so >>>>>>>>>>>>>>>>>>>>>>>>>>>> we have a single version in the repo and it’ll >>>>>>>>> work >>>>>>>>>> there, >>>>>>>>>>>>>> but >>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>> runtime >>>>>>>>>>>>>>>>>>>>>>>>>>>> we vendor it in to the shipped dist so it >>>>>>>>> lives at >>>>>>>>>>>>> something >>>>>>>>>>>>>>>>>> like >>>>>>>>>>>>>>>>>>>>>>>>>>>> `airflow.sdk._vendor` etc. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> In terms of repo layout, this likely means we >>>>>>>>> would end >>>>>>>>>> up >>>>>>>>>>>>>> with: >>>>>>>>>>>>>>>>>>>>>>>>>>>> airflow-core/pyproject.toml >>>>>>>>>>>>>>>>>>>>>>>>>>>> airflow-core/src/ >>>>>>>>>>>>>>>>>>>>>>>>>>>> airflow-core/tests/ >>>>>>>>>>>>>>>>>>>>>>>>>>>> task-sdk/pyproject.toml >>>>>>>>>>>>>>>>>>>>>>>>>>>> task-sdk/src/ >>>>>>>>>>>>>>>>>>>>>>>>>>>> task-sdk/tests/ >>>>>>>>>>>>>>>>>>>>>>>>>>>> airflow-common/src >>>>>>>>>>>>>>>>>>>>>>>>>>>> airflow-common/tests/ >>>>>>>>>>>>>>>>>>>>>>>>>>>> # Possibly no airflow-common/pyproject.toml, >>>>>>>>> as deps >>>>>>>>>> would >>>>>>>>>>>>>> be >>>>>>>>>>>>>>>>>>>>>>>> included >>>>>>>>>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>>>>>>>>> the downstream projects. TBD. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thoughts and feedback welcomed. >>>>>>>>>>>>>> >>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>>>>>>>>>>>>>>>>>> To unsubscribe, e-mail: >>>>>>>>>> dev-unsubscr...@airflow.apache.org >>>>>>>>>>>>>>>>>>>>>>>>>> For additional commands, e-mail: >>>>>>>>>>>>> dev-h...@airflow.apache.org >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>>>>>>>>>> To unsubscribe, e-mail: >>>>>>>>> dev-unsubscr...@airflow.apache.org >>>>>>>>>>>>>>>>>> For additional commands, e-mail: >>>>>>>>> dev-h...@airflow.apache.org >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>>>>>>>>>>>>> For additional commands, e-mail: >>>>>>>>> dev-h...@airflow.apache.org >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>>>>>>>>>>>> For additional commands, e-mail: >>>>>>>>> dev-h...@airflow.apache.org >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>>>>>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>