Agreed!

Once the PR is up, we can have these implementation-level discussions
over there. Good chat, though!

Thanks & Regards,
Amogh Desai


On Wed, Jul 9, 2025 at 3:56 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Yeah. I think extracting the features we want to share one by one, each
> into a separate distribution, is the best approach. It will also help
> with the "__init__.py" cleanup, because almost by definition those
> distributions will not be able to "reach" outside: they can only be
> "used", they cannot "use" something else. That means the situation we
> have now, where configuration uses logging which in turn uses
> configuration (leading to circular dependencies and partially
> initialized modules), cannot happen. Instead we will have to figure out
> how to inject configuration into logging (from task-sdk, airflow-ctl,
> airflow-core) and how to get the right sequence of initialization,
> rather than having inter-feature dependencies.
>
> And that is precisely what low cyclomatic complexity is about as well: it
> generally leads to easier-to-maintain software that has well-defined
> functionality and does not have the weird circular dependencies we have
> now. That's a kind of side effect of such a "per feature" split, but a very
> desirable one.
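
To make the "inject configuration into logging" idea concrete, here is a
minimal sketch. It is only an illustration under assumptions: the names
`LoggingSettings`, `init_logging` and `load_settings_from_mapping` are
hypothetical and do not exist in Airflow. The point it shows is that the
logging side takes plain values and never imports a config module, while the
"final" distribution decides the initialization order.

```python
import logging
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggingSettings:
    """Plain values handed to the logging library by whoever uses it (hypothetical)."""

    level: str = "INFO"
    fmt: str = "%(levelname)s %(name)s %(message)s"


def init_logging(settings: LoggingSettings, logger_name: str = "airflow_shared_demo") -> logging.Logger:
    """Set up a logger from injected settings; nothing here imports a config module."""
    logger = logging.getLogger(logger_name)
    logger.handlers.clear()
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(settings.fmt))
    logger.addHandler(handler)
    logger.setLevel(settings.level)
    logger.propagate = False
    return logger


def load_settings_from_mapping(raw: dict[str, str]) -> LoggingSettings:
    """Stand-in for the caller's own config layer (hypothetical)."""
    return LoggingSettings(level=raw.get("logging_level", "INFO").upper())


# The "final" distribution does the wiring in a fixed order: config first, logging second.
log = init_logging(load_settings_from_mapping({"logging_level": "debug"}))
```

With this shape, task-sdk, airflow-ctl and airflow-core could each build
their own settings object and pass it in, and neither shared piece would need
to know about the other.
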
>
> J.
>
> On Wed, Jul 9, 2025 at 10:17 AM Amogh Desai <amoghdesai....@gmail.com>
> wrote:
>
>> Probably, you make a valid point.
>>
>> Maybe this is an implementation detail, so we could figure it out as we
>> start on a POC and factor in these things
>> as we move along?
>>
>> But from an initial guess, I would think that execution time related
>> items (if we manage to enumerate them) would be something
>> that would be better off in that "core_and_task_sdk" bundle.
>>
>> Thanks & Regards,
>> Amogh Desai
>>
>>
>> On Tue, Jul 8, 2025 at 3:46 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> > Not that I am against your idea and we can surely expand as we need
>>> > but we would not need to expand the
>>> > "core_and_task_sdk" if we put only the relevant items into it.
>>>
>>> So if we move logging and config out, my question is: what is really
>>> relevant to "stay" in "core_and_task_sdk"? And what do we know will not be
>>> needed by other distributions in the future?
>>>
>>> What would be the content of "core_and_task_sdk"? Can we enumerate what
>>> should go there now?
>>>
>>> My bet is that if we enumerate them, it will turn out that they are good
>>> candidates to make separate "shared features" that can be logically named
>>> and modularised, and that there is no real need to have shared
>>> "core_and_task_sdk" which sounds like "bag of everything else".
>>>
>>>
>>>
>>> On Tue, Jul 8, 2025 at 11:48 AM Amogh Desai <amoghdesai....@gmail.com>
>>> wrote:
>>>
>>>> Yeah, I think what you are showcasing here is a step ahead of the
>>>> initial proposal from Ash.
>>>>
>>>> From the original proposal, the `core_and_task_sdk` *can* have the
>>>> things relevant to just those two
>>>> distros. Logging and Config are modules that might be needed by
>>>> airflow-ctl, for example, so
>>>> those would not be good candidates to be put in there.
>>>>
>>>> Kubernetes Utils sounds like a good example: it will be
>>>> used by KubeExecutor (let's say this is a
>>>> module called "executors") and by KPO (providers), and
>>>> "shared/kubernetes" would probably be a good
>>>> candidate for that.
>>>>
>>>> Not that I am against your idea and we can surely expand as we need but
>>>> we would not need to expand the
>>>> "core_and_task_sdk" if we put only the relevant items into it.
>>>>
>>>> Thanks & Regards,
>>>> Amogh Desai
>>>>
>>>>
>>>> On Tue, Jul 8, 2025 at 12:28 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> > @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean
>>>>> > there, I am understanding the direction
>>>>> > but could you elaborate a bit more please?
>>>>>
>>>>> Let me elaborate:
>>>>>
>>>>> As I understand it (maybe I am wrong?), the proposal is that we have a
>>>>> "core-and-task-sdk" folder which is a shared distribution that is
>>>>> vendored into both "airflow-core" and "airflow-task-sdk". This
>>>>> contains some shared code that we want to include in both distributions
>>>>> (note that we never release the "core-and-task-sdk"
>>>>> distribution itself, because it only contains code that is shared between
>>>>> the two distributions we do release).
>>>>>
>>>>> That's fine and cool.
>>>>>
>>>>> Imagine that this distribution contains "logging" (shared code
>>>>> for logging) and "config" (shared code for configuration), both needed
>>>>> in "airflow-core" and "airflow-task-sdk". So far so good. But what happens
>>>>> if we want to use logging in the same fashion in, say, "airflow-ctl" (this
>>>>> also applies to other distributions that we might come up with)? Are we
>>>>> going to vendor in the whole "core-and-task-sdk" distribution in
>>>>> "airflow-ctl"? It would be far better if we just vendored in "logging" and
>>>>> did not vendor in "config".
>>>>>
>>>>> And if we are going to have a mechanism to vendor in "a distribution",
>>>>> there is nothing wrong with using the same mechanism to vendor in multiple
>>>>> distributions, so we can easily do it this way (I added "orm",
>>>>> "serialization" and "fast_api" as examples of things that we might want to
>>>>> share; I am not sure if that is really something we want to do, but it
>>>>> helps to illustrate my idea better):
>>>>>
>>>>> /
>>>>>   airflow-ctl/
>>>>>   task-sdk/...
>>>>>   airflow-core/...
>>>>>   ....
>>>>>   shared/
>>>>>     kubernetes/
>>>>>       pyproject.toml
>>>>>       src/
>>>>>         airflow_shared_kubernetes/__init__.py
>>>>>     logging/
>>>>>       pyproject.toml
>>>>>       src/
>>>>>         airflow_shared_logging/__init__.py
>>>>>     config/
>>>>>       pyproject.toml
>>>>>       src/
>>>>>         airflow_shared_config/__init__.py
>>>>>     orm/
>>>>>       pyproject.toml
>>>>>       src/
>>>>>         airflow_shared_orm/__init__.py
>>>>>     serialization/
>>>>>       pyproject.toml
>>>>>       src/
>>>>>         airflow_shared_serialization/__init__.py
>>>>>     fast_api/
>>>>>       pyproject.toml
>>>>>       src/
>>>>>         airflow_shared_fast_api/__init__.py
>>>>>     ...
>>>>>
>>>>> This has multiple benefits (and I see no real drawbacks):
>>>>>
>>>>> * the code can be really well modularised. Those "things" we share
>>>>> (and this also connects to the __init__.py discussion) can be
>>>>> independent, and (following Jens' comment) this allows us to keep
>>>>> cyclomatic complexity low:
>>>>> https://en.wikipedia.org/wiki/Cyclomatic_complexity . It will be way
>>>>> easier to implement logging in a way that does not import or use
>>>>> config. This means, for example, that configuration for logging will need
>>>>> to be injected when logging is initialized, and that's exactly what we
>>>>> want. We do not want logging code to use configuration code directly; they
>>>>> should be independent of each other, and you should be free to vendor in
>>>>> either logging or config regardless of whether you also vendored in the
>>>>> other.
>>>>>
>>>>> * it's much more logical. We split based on the functionality we want to
>>>>> share, not on the "distributions" we want to produce. That allows us,
>>>>> in the future, to make different decisions on how we split our
>>>>> distributions. For example (I am not saying we have to do it, or that we
>>>>> will do it, but we will have the possibility), we can add more shared
>>>>> utilities we find useful in the same way and decide that "scheduler",
>>>>> "api_server", "triggerer" or "dag_processor" are split into
>>>>> separate distributions, because for example we want to keep the number of
>>>>> dependencies down. And for example "api_server" might use "fast_api",
>>>>> "config", "logging", "orm" and "serialization", whereas the
>>>>> scheduler should not need "fast_api". The "dag_processor" eventually might
>>>>> need neither "orm" nor "fast_api" and only use the others.
>>>>>
>>>>> This seems like a natural approach. If we have a mechanism to "share"
>>>>> the code, it does not add complexity, but allows us to isolate independent
>>>>> functionality into "isolated" boxes and use them.
>>>>>
>>>>> As for cyclomatic complexity: it is a complex term (badly chosen, as
>>>>> it scares people away) and has some math behind it, but it really boils
>>>>> down to very simple "rules of thumb" (and yes, I am a big proponent of
>>>>> having low cyclomatic complexity).
>>>>>
>>>>> a) when you are building the "final" product (i.e. a distribution you
>>>>> want to release), make sure that you only "use" things and that nothing
>>>>> else is "using" you as a library.
>>>>> b) when you are building a "shared" thing (a library), make sure that
>>>>> the library is only "used" by others and does not really "use" anything
>>>>> else. For example, in the case I explained above, we achieve
>>>>> low cyclomatic complexity when:
>>>>>
>>>>> * airflow-core uses: logging, config, orm, serialization, fast_api
>>>>> * none of the "logging, config, orm, serialization, fast_api" use each
>>>>> other - they are at the bottom of the "user -> used" tree
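
The "used, never using" rule for shared libraries could be checked
mechanically, for instance from a pre-commit hook. The following is a minimal
sketch under assumptions: it relies on the `airflow_shared_` naming from the
tree above, and `forbidden_imports` is a hypothetical helper, not an existing
check in the repo. It only looks at absolute imports in a piece of source text.

```python
import ast

SHARED_PREFIX = "airflow_shared_"  # naming assumed from the example tree in this thread


def imported_top_level_modules(source: str) -> set[str]:
    """Return the top-level module names imported (absolutely) by the given source."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            found.add(node.module.split(".")[0])
    return found


def forbidden_imports(own_package: str, source: str) -> set[str]:
    """Shared packages other than `own_package` that the source reaches into."""
    return {
        name
        for name in imported_top_level_modules(source)
        if name.startswith(SHARED_PREFIX) and name != own_package
    }


# Logging importing config breaks the rule; logging importing itself does not.
bad = forbidden_imports(
    "airflow_shared_logging",
    "import logging\nfrom airflow_shared_config import conf\n",
)
ok = forbidden_imports(
    "airflow_shared_logging",
    "import logging\nfrom airflow_shared_logging.handlers import make_handler\n",
)
```

A real check would walk every file under `shared/<name>/src/` and fail when
the returned set is non-empty.
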
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jul 8, 2025 at 8:16 AM Amogh Desai <amoghdesai....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I like the folder structure proposed by Ash and have no objections
>>>>>> with it.
>>>>>>
>>>>>> "core_and_task_sdk" sounds good to me and justifies what it should do
>>>>>> pretty well.
>>>>>>
>>>>>> @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean
>>>>>> there, I am understanding the direction
>>>>>> but could you elaborate a bit more please?
>>>>>>
>>>>>> Naming is REALLY hard!
>>>>>>
>>>>>> Thanks & Regards,
>>>>>> Amogh Desai
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 8, 2025 at 2:52 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>>
>>>>>>> How about splitting it even more and having each shared "thing"
>>>>>>> named? "logging", "config", and sharing them explicitly and
>>>>>>> separately with the right "user"?
>>>>>>> That sounds way more modular, and we will be able to choose which of
>>>>>>> the shared "utils" we use where.
>>>>>>>
>>>>>>> J.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jul 7, 2025 at 11:13 PM Jens Scheffler
>>>>>>> <j_scheff...@gmx.de.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > I like "core_and_task_sdk" the same as "core-and-task-sdk" - I have
>>>>>>> > no problem with it, and it is a path only.
>>>>>>> >
>>>>>>> > If we get to "dag-parser-scheduler-task-sdk-and-triggerer", which
>>>>>>> > is a bit bulky, we should then name it "all-not-api-server" :-D
>>>>>>> >
>>>>>>> > On 07.07.25 22:57, Ash Berlin-Taylor wrote:
>>>>>>> > > In case I did a bad job explaining it, the “core and task sdk”
>>>>>>> > > is not in the module name/import name, just in the file path.
>>>>>>> > >
>>>>>>> > > Anyone have other ideas?
>>>>>>> > >
>>>>>>> > >> On 7 Jul 2025, at 21:37, Buğra Öztürk <ozturkbugr...@gmail.com>
>>>>>>> wrote:
>>>>>>> > >>
>>>>>>> > >> Thanks Ash! Looks cool! I like the structure. This will enable
>>>>>>> > >> all the combinations, and the structure looks easy to grasp. No
>>>>>>> > >> strong stance on the naming, other than that it is maybe a bit
>>>>>>> > >> long with `and`; `core_ctl` could be shorter. Since no import
>>>>>>> > >> path is defined like that, we can give it any name for sure.
>>>>>>> > >>
>>>>>>> > >> Best regards,
>>>>>>> > >>
>>>>>>> > >>> On Mon, 7 Jul 2025, 21:51 Jarek Potiuk, <ja...@potiuk.com>
>>>>>>> wrote:
>>>>>>> > >>>
>>>>>>> > >>> Looks good but I think we should find some better logical name
>>>>>>> > >>> for core_and_sdk :)
>>>>>>> > >>>
>>>>>>> > >>> On Mon, 7 Jul 2025, 21:44 Jens Scheffler
>>>>>>> > >>> <j_scheff...@gmx.de.invalid> wrote:
>>>>>>> > >>>
>>>>>>> > >>>> Cool! Especially the "shared" folder with the ability to have
>>>>>>> > >>>> N-combinations w/o exploding project repo root!
>>>>>>> > >>>>
>>>>>>> > >>>> On 07.07.25 14:43, Ash Berlin-Taylor wrote:
>>>>>>> > >>>>> Oh, and all of this will be explained in shared/README.md
>>>>>>> > >>>>>
>>>>>>> > >>>>>> On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org>
>>>>>>> wrote:
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> Okay, so it seems we have agreement on the approach here,
>>>>>>> > >>>>>> so I’ll continue with this, and on the dev call it was
>>>>>>> > >>>>>> mentioned that “airflow-common” wasn’t a great name, so here
>>>>>>> > >>>>>> is my proposal for the file structure:
>>>>>>> > >>>>>> ```
>>>>>>> > >>>>>> /
>>>>>>> > >>>>>>   task-sdk/...
>>>>>>> > >>>>>>   airflow-core/...
>>>>>>> > >>>>>>   shared/
>>>>>>> > >>>>>>     kubernetes/
>>>>>>> > >>>>>>       pyproject.toml
>>>>>>> > >>>>>>       src/
>>>>>>> > >>>>>>         airflow_kube/__init__.py
>>>>>>> > >>>>>>     core-and-tasksdk/
>>>>>>> > >>>>>>       pyproject.toml
>>>>>>> > >>>>>>       src/
>>>>>>> > >>>>>>         airflow_shared/__init__.py
>>>>>>> > >>>>>> ```
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> Things to note here: the “shared” folder has (the
>>>>>>> > >>>>>> possibility of) having multiple different shared “libraries”
>>>>>>> > >>>>>> in it. In this example I am supposing a hypothetical shared
>>>>>>> > >>>>>> kubernetes folder in a world in which we split the
>>>>>>> > >>>>>> KubePodOperator and the KubeExecutor into two separate
>>>>>>> > >>>>>> distributions (example only, not proposing we do that right
>>>>>>> > >>>>>> now, and that will be a separate discussion).
>>>>>>> > >>>>>> The other things to note here:
>>>>>>> > >>>>>>
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> - the folder name in shared aims to be “self-documenting”,
>>>>>>> > >>>>>> hence the verbose “core-and-tasksdk” to say where the shared
>>>>>>> > >>>>>> library is intended to be used.
>>>>>>> > >>>>>> - the python module itself should almost always have an
>>>>>>> > >>>>>> `airflow_` (or maybe `_airflow_`?) prefix so that it does not
>>>>>>> > >>>>>> conflict with anything else we might use. It won’t matter “in
>>>>>>> > >>>>>> production” as those will be vendored in to be imported as
>>>>>>> > >>>>>> `airflow/_vendor/airflow_shared` etc, but avoiding conflicts
>>>>>>> > >>>>>> at dev time with the Finder approach is a good safety
>>>>>>> > >>>>>> measure.
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> I will start making a real PR for this proposal now, but
>>>>>>> > >>>>>> I’m open to feedback (either here, or in the PR when I open
>>>>>>> > >>>>>> it)
>>>>>>> > >>>>>> -ash
>>>>>>> > >>>>>>
>>>>>>> > >>>>>>> On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com>
>>>>>>> wrote:
>>>>>>> > >>>>>>>
>>>>>>> > >>>>>>> Yeah we have to try it and test - also building packages
>>>>>>> > >>>>>>> happens semi-frequently when you run `uv sync` (they use some
>>>>>>> > >>>>>>> kind of heuristics to decide when) and you can force it with
>>>>>>> > >>>>>>> `--reinstall` or `--refresh`. Package build also happens
>>>>>>> > >>>>>>> every time you run `ci-image build` in breeze now, so it
>>>>>>> > >>>>>>> seems like it will integrate nicely into our workflows.
>>>>>>> > >>>>>>>
>>>>>>> > >>>>>>> Looks really cool Ash.
>>>>>>> > >>>>>>>
>>>>>>> > >>>>>>> On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor
>>>>>>> > >>>>>>> <a...@apache.org> wrote:
>>>>>>> > >>>>>>>> It’s not just release time, but any time we build a
>>>>>>> > >>>>>>>> package, which happens on “every” CI run. The normal unit
>>>>>>> > >>>>>>>> tests will use code from airflow-common/src/airflow_common;
>>>>>>> > >>>>>>>> the kube tests which build an image will build the dists
>>>>>>> > >>>>>>>> and vendor in the code from that commit.
>>>>>>> > >>>>>>>>
>>>>>>> > >>>>>>>> There is only a single copy of the shared code committed
>>>>>>> > >>>>>>>> to the repo, so there is never anything to synchronise.
>>>>>>> > >>>>>>>>
>>>>>>> > >>>>>>>>> On 4 Jul 2025, at 15:53, Amogh Desai <
>>>>>>> amoghdesai....@gmail.com>
>>>>>>> > >>>> wrote:
>>>>>>> > >>>>>>>>> Thanks Ash.
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>> This is really cool and helpful that you were able to
>>>>>>> > >>>>>>>>> test both scenarios -- repo checkout and also installing
>>>>>>> > >>>>>>>>> from the vendored package -- and the resolution worked
>>>>>>> > >>>>>>>>> fine too.
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>> I like this idea compared to the relative import one for a
>>>>>>> > >>>>>>>>> few reasons:
>>>>>>> > >>>>>>>>> - It feels like it will take some time to adjust to the
>>>>>>> > >>>>>>>>> new coding standard that we will lay down if we impose
>>>>>>> > >>>>>>>>> relative imports in the shared dist
>>>>>>> > >>>>>>>>> - We can continue using repo-wide absolute import
>>>>>>> > >>>>>>>>> standards. It is also much easier for situations when we
>>>>>>> > >>>>>>>>> do a global search in the IDE to find + replace; this
>>>>>>> > >>>>>>>>> could mean a change there
>>>>>>> > >>>>>>>>> - Vendoring is a proven and established paradigm across
>>>>>>> > >>>>>>>>> projects and would also give us the build tooling we need
>>>>>>> > >>>>>>>>> out of the box
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>> Nothing much against the relative import, but with the
>>>>>>> > >>>>>>>>> evidence provided above, the vendored approach seems to
>>>>>>> > >>>>>>>>> only do us good.
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>> Regarding synchronizing it, release time should be fine
>>>>>>> > >>>>>>>>> as long as we have a good CI workflow to catch such issues
>>>>>>> > >>>>>>>>> per PR if changes are made in the shared dist? (pre-commit
>>>>>>> > >>>>>>>>> would make it really slow, I guess)
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>> If we can run our tests with vendored code we should be
>>>>>>> > >>>>>>>>> mostly covered.
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>> Good effort all!
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>> Thanks & Regards,
>>>>>>> > >>>>>>>>> Amogh Desai
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>>> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor <
>>>>>>> > a...@apache.org>
>>>>>>> > >>>>>>>> wrote:
>>>>>>> > >>>>>>>>>> Okay, I think I’ve got something that works and I’m
>>>>>>> > >>>>>>>>>> happy with.
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> This produces the following from `uv build task-sdk`
>>>>>>> > >>>>>>>>>> - https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz
>>>>>>> > >>>>>>>>>> - https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> (`.whl.zip` as GH won't allow .whl upload, but will
>>>>>>> > >>>>>>>>>> .zip)
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> ```
>>>>>>> > >>>>>>>>>> ❯ unzip -l dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip | grep _vendor
>>>>>>> > >>>>>>>>>>      50  02-02-2020 00:00   airflow/sdk/_vendor/.gitignore
>>>>>>> > >>>>>>>>>>    2082  02-02-2020 00:00   airflow/sdk/_vendor/__init__.py
>>>>>>> > >>>>>>>>>>      28  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common.pyi
>>>>>>> > >>>>>>>>>>      18  02-02-2020 00:00   airflow/sdk/_vendor/vendor.txt
>>>>>>> > >>>>>>>>>>     785  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common/__init__.py
>>>>>>> > >>>>>>>>>>   10628  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common/timezone.py
>>>>>>> > >>>>>>>>>> ```
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> And similarly in the .tar.gz, so our “sdist” is
>>>>>>> > >>>>>>>>>> complete too:
>>>>>>> > >>>>>>>>>> ```
>>>>>>> > >>>>>>>>>> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz | grep _vendor
>>>>>>> > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore
>>>>>>> > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py
>>>>>>> > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi
>>>>>>> > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt
>>>>>>> > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py
>>>>>>> > >>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py
>>>>>>> > >>>>>>>>>> ```
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> The plugin works at build time by including/copying the
>>>>>>> > >>>>>>>>>> libs specified in vendor.txt into place (and letting
>>>>>>> > >>>>>>>>>> `vendoring` take care of import rewrites).
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> For the imports to continue to work at “dev” time/from
>>>>>>> > >>>>>>>>>> a repo checkout, I have added an import finder to
>>>>>>> > >>>>>>>>>> `sys.meta_path`, and since it’s at the end of the list it
>>>>>>> > >>>>>>>>>> will only be used if the normal import can’t find
>>>>>>> > >>>>>>>>>> things.
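
For readers who have not written one, a fallback finder on `sys.meta_path`
can look roughly like this. This is a minimal sketch under assumptions, not
the code from the linked commit: `DevVendorFinder`, `demo_pkg` and
`demo_common` are hypothetical names, and the demo builds a throwaway layout
in a temp directory. It shows a vendored import path being served from a
separate source directory only when the regular finders find nothing.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import tempfile
from pathlib import Path


class DevVendorFinder(importlib.abc.MetaPathFinder):
    """Resolve `<vendor_prefix>.<lib>[.sub]` from a source checkout (hypothetical)."""

    def __init__(self, vendor_prefix: str, source_roots: dict[str, Path]):
        self.vendor_prefix = vendor_prefix + "."
        self.source_roots = source_roots  # lib name -> directory containing the package

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(self.vendor_prefix):
            return None
        lib, _, rest = fullname[len(self.vendor_prefix):].partition(".")
        root = self.source_roots.get(lib)
        if root is None:
            return None
        base = root.joinpath(lib, *rest.split(".")) if rest else root / lib
        if (base / "__init__.py").is_file():
            # A package: point the spec at its __init__.py and its directory.
            return importlib.util.spec_from_file_location(
                fullname, str(base / "__init__.py"), submodule_search_locations=[str(base)]
            )
        if base.with_suffix(".py").is_file():
            return importlib.util.spec_from_file_location(fullname, str(base.with_suffix(".py")))
        return None


# Demo layout: an installed-looking package with an empty _vendor dir,
# plus a separate "shared" source tree holding the real code.
tmp = Path(tempfile.mkdtemp())
vendor_dir = tmp / "site" / "demo_pkg" / "_vendor"
vendor_dir.mkdir(parents=True)
(vendor_dir.parent / "__init__.py").write_text("")
(vendor_dir / "__init__.py").write_text("")
shared_src = tmp / "shared_src"
(shared_src / "demo_common").mkdir(parents=True)
(shared_src / "demo_common" / "__init__.py").write_text("")
(shared_src / "demo_common" / "timezone.py").write_text("foo = 1\n")

sys.path.insert(0, str(tmp / "site"))
# Appended last, so it is only consulted after the regular finders give up.
sys.meta_path.append(DevVendorFinder("demo_pkg._vendor", {"demo_common": shared_src}))
importlib.invalidate_caches()

mod = importlib.import_module("demo_pkg._vendor.demo_common.timezone")
```

In a built wheel the vendored files would physically sit under `_vendor/`, so
the regular path-based finder would resolve them first and a fallback like
this would never be asked.
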
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> This doesn’t quite give us the same runtime effect as
>>>>>>> > >>>>>>>>>> “import rewriting”, as in this approach `airflow_common`
>>>>>>> > >>>>>>>>>> is directly loaded (i.e. both
>>>>>>> > >>>>>>>>>> airflow.sdk._vendor.airflow_common and airflow_common
>>>>>>> > >>>>>>>>>> exist in sys.modules), but it does work for everything
>>>>>>> > >>>>>>>>>> that I was able to test.
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> I tested it with the diff at the end of this message.
>>>>>>> > >>>>>>>>>> My test ipython shell:
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> ```
>>>>>>> > >>>>>>>>>> In [1]: from airflow.sdk._vendor.airflow_common.timezone import foo
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> In [2]: foo
>>>>>>> > >>>>>>>>>> Out[2]: 1
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> In [3]: import airflow.sdk._vendor.airflow_common
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> In [4]: import airflow.sdk._vendor.airflow_common.timezone
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> In [5]: airflow.sdk._vendor.airflow_common.__file__
>>>>>>> > >>>>>>>>>> Out[5]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py'
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__
>>>>>>> > >>>>>>>>>> Out[6]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py'
>>>>>>> > >>>>>>>>>> ```
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> And in a standalone environment with the SDK dist I
>>>>>>> > >>>>>>>>>> built (it needed the matching airflow-core right now, but
>>>>>>> > >>>>>>>>>> that is nothing to do with this discussion):
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> ```
>>>>>>> > >>>>>>>>>> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with dist/apache_airflow_core-3.1.0-py3-none-any.whl --with dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython
>>>>>>> > >>>>>>>>>> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ]
>>>>>>> > >>>>>>>>>> Type 'copyright', 'credits' or 'license' for more information
>>>>>>> > >>>>>>>>>> IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for help.
>>>>>>> > >>>>>>>>>> Tip: You can use `%hist` to view history, see the options with `%history?`
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> In [1]: import airflow.sdk._vendor.airflow_common.timezone
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__
>>>>>>> > >>>>>>>>>> Out[2]: '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py'
>>>>>>> > >>>>>>>>>> ```
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> ```diff
>>>>>>> > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/__init__.py b/airflow-common/src/airflow_common/__init__.py
>>>>>>> > >>>>>>>>>> index 13a83393a9..927b7c6b61 100644
>>>>>>> > >>>>>>>>>> --- a/airflow-common/src/airflow_common/__init__.py
>>>>>>> > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/__init__.py
>>>>>>> > >>>>>>>>>> @@ -14,3 +14,5 @@
>>>>>>> > >>>>>>>>>>  # KIND, either express or implied.  See the License for the
>>>>>>> > >>>>>>>>>>  # specific language governing permissions and limitations
>>>>>>> > >>>>>>>>>>  # under the License.
>>>>>>> > >>>>>>>>>> +
>>>>>>> > >>>>>>>>>> +foo = 1
>>>>>>> > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/timezone.py b/airflow-common/src/airflow_common/timezone.py
>>>>>>> > >>>>>>>>>> index 340b924c66..58384ef20f 100644
>>>>>>> > >>>>>>>>>> --- a/airflow-common/src/airflow_common/timezone.py
>>>>>>> > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/timezone.py
>>>>>>> > >>>>>>>>>> @@ -36,6 +36,9 @@ _PENDULUM3 = version.parse(metadata.version("pendulum")).major == 3
>>>>>>> > >>>>>>>>>>  # - FixedTimezone(0, "UTC") in pendulum 2
>>>>>>> > >>>>>>>>>>  utc = pendulum.UTC
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> +
>>>>>>> > >>>>>>>>>> +from airflow_common import foo
>>>>>>> > >>>>>>>>>> +
>>>>>>> > >>>>>>>>>>  TIMEZONE: Timezone
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> ```
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>>> > >>>>>>>>>>> I think both approaches are doable:
>>>>>>> > >>>>>>>>>>>
>>>>>>> > >>>>>>>>>>> 1) -> We can very easily prevent bad imports from
>>>>>>> > >>>>>>>>>>> different distributions with pre-commit, and make sure we
>>>>>>> > >>>>>>>>>>> are only doing relative imports in the shared modules. We
>>>>>>> > >>>>>>>>>>> are doing plenty of this already. And yes, it would require
>>>>>>> > >>>>>>>>>>> relative imports, which we currently do not allow.
>>>>>>> > >>>>>>>>>>>
>>>>>>> > >>>>>>>>>>> 2) -> has one disadvantage: someone at some point in time
>>>>>>> > >>>>>>>>>>> will have to decide to synchronize this, and if it happens
>>>>>>> > >>>>>>>>>>> just before a release (I bet this is going to happen) this
>>>>>>> > >>>>>>>>>>> will lead to solving problems that would normally be solved
>>>>>>> > >>>>>>>>>>> during the PR in which you make a change (i.e. the symbolic
>>>>>>> > >>>>>>>>>>> link has the advantage that whoever modifies shared code
>>>>>>> > >>>>>>>>>>> will be immediately notified in their PR that they broke
>>>>>>> > >>>>>>>>>>> something, because either static checks or mypy or tests
>>>>>>> > >>>>>>>>>>> fail).
>>>>>>> > >>>>>>>>>>>
>>>>>>> > >>>>>>>>>>> Ash, do you have an idea of the process (who does the
>>>>>>> > >>>>>>>>>>> synchronisation, and when) in the case of vendoring? Maybe
>>>>>>> > >>>>>>>>>>> we could solve it if it is done more frequently and with
>>>>>>> > >>>>>>>>>>> some regularity? We could potentially force re-vendoring at
>>>>>>> > >>>>>>>>>>> PR time as well, any time shared code changes (and check it
>>>>>>> > >>>>>>>>>>> by pre-commit). I can't think of another place (other than
>>>>>>> > >>>>>>>>>>> releases) in our development workflow, and that seems to be
>>>>>>> > >>>>>>>>>>> a bit too late, as it puts the extra effort of fixing
>>>>>>> > >>>>>>>>>>> potential incompatibilities on the release manager and
>>>>>>> > >>>>>>>>>>> delays the release. WDYT?
>>>>>>> > >>>>>>>>>>>
>>>>>>> > >>>>>>>>>>> Re: relative imports. I think for a shared library we could
>>>>>>> > >>>>>>>>>>> potentially relax this and allow them (and actually
>>>>>>> > >>>>>>>>>>> disallow absolute imports in the pieces of code that are
>>>>>>> > >>>>>>>>>>> shared - again, by pre-commit). As I recall, the only
>>>>>>> > >>>>>>>>>>> reason we forbade relative imports is because of how we are
>>>>>>> > >>>>>>>>>>> (or maybe were) doing DAG parsing and the failures
>>>>>>> > >>>>>>>>>>> resulting from it. So we decided to just not allow them, to
>>>>>>> > >>>>>>>>>>> keep consistency. The way Dag parsing works is that when
>>>>>>> > >>>>>>>>>>> you are using importlib to read the Dag from a file,
>>>>>>> > >>>>>>>>>>> relative imports do not work, as it does not know what they
>>>>>>> > >>>>>>>>>>> should be relative to. But if a relative import is done
>>>>>>> > >>>>>>>>>>> from an imported package, it should be no problem, I think
>>>>>>> > >>>>>>>>>>> - otherwise our Dags would not be able to import any
>>>>>>> > >>>>>>>>>>> library that uses relative imports.
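
The difference described above can be reproduced with the standard library
alone. This is a simplified sketch under assumptions: `load_from_file` is a
hypothetical stand-in for reading a single file via importlib, not Airflow's
actual Dag loader, and the file and package names are made up. It shows a
relative import failing in a file loaded without a parent package, and
working inside a regular imported package.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())

# A standalone file loaded by path, with a relative import inside it.
(tmp / "helper.py").write_text("VALUE = 1\n")
(tmp / "standalone_dag.py").write_text("from .helper import VALUE\n")

# A regular package that uses the same kind of relative import internally.
pkg = tmp / "demo_lib"
pkg.mkdir()
(pkg / "helper.py").write_text("VALUE = 2\n")
(pkg / "__init__.py").write_text("from .helper import VALUE\n")


def load_from_file(name: str, path: Path):
    """Execute a single file as a top-level module, with no parent package."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


try:
    load_from_file("standalone_dag", tmp / "standalone_dag.py")
    standalone_error = None
except ImportError as exc:
    # There is no parent package, so `.helper` has nothing to be relative to.
    standalone_error = exc

sys.path.insert(0, str(tmp))
importlib.invalidate_caches()
demo_lib = importlib.import_module("demo_lib")  # imported as a package: `.helper` resolves
```
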
>>>>>>> > >>>>>>>>>>>
>>>>>>> > >>>>>>>>>>> Of course consistency might be the reason why we do not
>>>>>>> > >>>>>>>>>>> want to introduce relative imports. I don't see it as an
>>>>>>> > >>>>>>>>>>> issue if it is guarded by pre-commit though.
>>>>>>> > >>>>>>>>>>>
>>>>>>> > >>>>>>>>>>> J.
>>>>>>> > >>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>
>>>>>>> > >>>>>>>>>>> On Thu, 3 Jul 2025, 12:11 Ash Berlin-Taylor
>>>>>>> > >>>>>>>>>>> <a...@apache.org> wrote:
>>>>>>> > >>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> Oh yes, symlinks will work, with one big caveat: it does
>>>>>>> > >>>>>>>>>>>> mean you can’t use absolute imports from one common module
>>>>>>> > >>>>>>>>>>>> to another.
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> For example
>>>>>>> > >>>>>>>>>>>> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41
>>>>>>> > >>>>>>>>>>>> where we have
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> ```
>>>>>>> > >>>>>>>>>>>> from airflow.utils.module_loading import import_string
>>>>>>> > >>>>>>>>>>>> ```
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> if we want to move serve_logs into this common lib that is
>>>>>>> > >>>>>>>>>>>> then symlinked, then we wouldn’t be able to have
>>>>>>> > >>>>>>>>>>>> `from airflow_common.module_loading import import_string`.
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> I can think of two possible solutions here.
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> 1) is to allow/require relative imports in this
>>>>>>> shared lib, so
>>>>>>> > >>>> `from
>>>>>>> > >>>>>>>>>>>> .module_loading import import_string`
>>>>>>> > >>>>>>>>>>>> 2) is to use `vendoring`[1] (from the pip
>>>>>>> maintainers) which
>>>>>>> > >>> will
>>>>>>> > >>>>>>>> handle
>>>>>>> > >>>>>>>>>>>> import-rewriting for us.
>>>>>>> > >>>>>>>>>>>>
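To make the symlink caveat concrete, here is a minimal sketch of option 1. It assumes a POSIX filesystem where symlinks are allowed, and every name in it (`core_pkg`, `sdk_pkg`, `_shared`, `relative_user`, `absolute_user`) is hypothetical; only `airflow_common` and `module_loading` come from the message above. The shared source is linked into two packages and is not on `sys.path` itself, so a relative import resolves under either parent while the absolute one does not.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

DISTS = ("core_pkg", "sdk_pkg")  # hypothetical stand-ins for the two dists


def build_demo(root: Path) -> None:
    # Single copy of the shared source, deliberately NOT placed on sys.path.
    shared = root / "shared" / "airflow_common"
    shared.mkdir(parents=True)
    (shared / "__init__.py").write_text("")
    (shared / "module_loading.py").write_text(
        "def import_string(path):\n    return path\n"
    )
    # Same import written two ways.
    (shared / "relative_user.py").write_text(
        "from .module_loading import import_string\n"
    )
    (shared / "absolute_user.py").write_text(
        "from airflow_common.module_loading import import_string\n"
    )
    # Each dist sees the shared code only through a symlink.
    for dist in DISTS:
        pkg = root / "dists" / dist
        pkg.mkdir(parents=True)
        (pkg / "__init__.py").write_text("")
        os.symlink(shared, pkg / "_shared", target_is_directory=True)


def try_import(name: str) -> bool:
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False


def run_demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo(root)
        sys.path.insert(0, str(root / "dists"))
        importlib.invalidate_caches()
        try:
            names = [
                f"{dist}._shared.{mod}"
                for dist in DISTS
                for mod in ("relative_user", "absolute_user")
            ]
            return {name: try_import(name) for name in names}
        finally:
            sys.path.remove(str(root / "dists"))
            # Drop the demo modules so repeated runs start clean.
            for mod in [m for m in sys.modules if m.split(".")[0] in DISTS]:
                del sys.modules[mod]


if __name__ == "__main__":
    for name, ok in run_demo().items():
        print(name, "imports" if ok else "fails")
```

The absolute form only fails here because nothing named `airflow_common` is importable at the top level; that is the situation the symlinked layout would create inside each dist.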
>>>>>>> > >>>>>>>>>>>> I’d entirely forgotten that symlinks in repos were a
>>>>>>> thing, so I
>>>>>>> > >>>> prepared
>>>>>>> > >>>>>>>> a
>>>>>>> > >>>>>>>>>>>> minimal POC/demo of what vendoring approach could
>>>>>>> look like
>>>>>>> > here
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>
>>>>>>> >
>>>>>>> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3
>>>>>>> > >>>>>>>>>>>> Now personally I am more than happy with relative
>>>>>>> imports, but
>>>>>>> > >>>>>>>> generally
>>>>>>> > >>>>>>>>>>>> as a project we have avoided them, so I think that
>>>>>>> limits what
>>>>>>> > >>> we
>>>>>>> > >>>>>>>> could
>>>>>>> > >>>>>>>>>> do
>>>>>>> > >>>>>>>>>>>> with a symlink based approach.
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> -ash
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>> [1] https://github.com/pradyunsg/vendoring
>>>>>>> > >>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <
>>>>>>> > >>>>>>>> gopidesupa...@gmail.com>
>>>>>>> > >>>>>>>>>>>> wrote:
>>>>>>> > >>>>>>>>>>>>> Thanks Ash
>>>>>>> > >>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>> Yes agree option 2 would be preferred for me. Making
>>>>>>> sure we
>>>>>>> > >>>> have all
>>>>>>> > >>>>>>>>>> the
>>>>>>> > >>>>>>>>>>>>> guardrails to protect any unwanted behaviour in code
>>>>>>> sharing
>>>>>>> > >>> and
>>>>>>> > >>>>>>>>>>>> executing
>>>>>>> > >>>>>>>>>>>>> the right set of tests between the packages.
>>>>>>> > >>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>> Agree with others, option 2 would be
>>>>>>> > >>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <
>>>>>>> > >>>>>>>> amoghdesai....@gmail.com>
>>>>>>> > >>>>>>>>>>>>> wrote:
>>>>>>> > >>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>> Thanks for starting this discussion, Ash.
>>>>>>> > >>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>> I would prefer option 2 here with proper tooling to
>>>>>>> handle
>>>>>>> > the
>>>>>>> > >>>> code
>>>>>>> > >>>>>>>>>>>>>> duplication at *release* time.
>>>>>>> > >>>>>>>>>>>>>> It is best to have a dist that has all it needs in
>>>>>>> itself.
>>>>>>> > >>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>> Option 1 could very quickly get out of hand and if
>>>>>>> we decide
>>>>>>> > >>> to
>>>>>>> > >>>>>>>>>> separate
>>>>>>> > >>>>>>>>>>>>>> triggerer /
>>>>>>> > >>>>>>>>>>>>>> dag processor / config etc etc as separate
>>>>>>> packages, back
>>>>>>> > >>>> compat is
>>>>>>> > >>>>>>>>>>>> going
>>>>>>> > >>>>>>>>>>>>>> to be a nightmare
>>>>>>> > >>>>>>>>>>>>>> and will bite us harder than we anticipate.
>>>>>>> > >>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>> Thanks & Regards,
>>>>>>> > >>>>>>>>>>>>>> Amogh Desai
>>>>>>> > >>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <
>>>>>>> > >>> kaxiln...@gmail.com>
>>>>>>> > >>>>>>>>>> wrote:
>>>>>>> > >>>>>>>>>>>>>>> I prefer Option 2 as well to avoid matrix of
>>>>>>> dependencies
>>>>>>> > >>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler
>>>>>>> > >>>>>>>>>> <j_scheff...@gmx.de.invalid
>>>>>>> > >>>>>>>>>>>>>>> wrote:
>>>>>>> > >>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>> I'd also rather prefer option 2 - reason here is
>>>>>>> it is
>>>>>>> > >>> rather
>>>>>>> > >>>>>>>>>>>> pragmatic
>>>>>>> > >>>>>>>>>>>>>>>> and we do not need to cut another package and
>>>>>>> have less
>>>>>>> > >>>> package
>>>>>>> > >>>>>>>>>> counts
>>>>>>> > >>>>>>>>>>>>>>>> and dependencies.
>>>>>>> > >>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>> I remember some time ago I was checking (together
>>>>>>> with
>>>>>>> > >>> Jarek,
>>>>>>> > >>>> I am
>>>>>>> > >>>>>>>>>> not
>>>>>>> > >>>>>>>>>>>>>>>> sure anymore...) if the usage of symlinks would be
>>>>>>> > possible.
>>>>>>> > >>>> To
>>>>>>> > >>>>>>>> keep
>>>>>>> > >>>>>>>>>>>>>> the
>>>>>>> > >>>>>>>>>>>>>>>> source in one package but "symlink" it into
>>>>>>> another. If
>>>>>>> > then
>>>>>>> > >>>> at
>>>>>>> > >>>>>>>>>> point
>>>>>>> > >>>>>>>>>>>>>> of
>>>>>>> > >>>>>>>>>>>>>>>> packaging/release the files are materialized we
>>>>>>> have 1 set
>>>>>>> > >>> of
>>>>>>> > >>>>>>>> code.
>>>>>>> > >>>>>>>>>>>>>>>> Otherwise if not possible still the redundancy
>>>>>>> could be
>>>>>>> > >>>> solved by
>>>>>>> > >>>>>>>> a
>>>>>>> > >>>>>>>>>>>>>>>> pre-commit hook - and in Git the files are
>>>>>>> de-duplicated
>>>>>>> > >>>> anyway
>>>>>>> > >>>>>>>>>> based
>>>>>>> > >>>>>>>>>>>>>> on
>>>>>>> > >>>>>>>>>>>>>>>> content hash, so this does not hurt.
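The de-duplication point can be checked with a small sketch. It assumes a repository using Git's default SHA-1 object format, where a file's blob ID is the SHA-1 of a `blob <size>\0` header followed by the raw bytes. The helper name `git_blob_id` is made up for this example. Two byte-identical copies of a shared module therefore map to one stored object, whatever their paths.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    # Git hashes a short header plus the file content; the path is not included.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    source = b"def import_string(path):\n    return path\n"
    # The same bytes committed under two different paths give the same ID.
    print(git_blob_id(source) == git_blob_id(bytes(source)))
```

This only covers storage inside the repository; the working tree would still hold two copies that have to be kept in sync, which is what the pre-commit hook would be for.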
>>>>>>> > >>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote:
>>>>>>> > >>>>>>>>>>>>>>>>> I support option 2 with proper automation & CI -
>>>>>>> the
>>>>>>> > >>>> reasonings
>>>>>>> > >>>>>>>>>>>>>> you've
>>>>>>> > >>>>>>>>>>>>>>>>> shown for that make sense to me.
>>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>> Shahar
>>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor
>>>>>>> <
>>>>>>> > >>>> a...@apache.org
>>>>>>> > >>>>>>>>>>>>>>> wrote:
>>>>>>> > >>>>>>>>>>>>>>>>>> Hello everyone,
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> As we work on finishing off the code-level
>>>>>>> separation of
>>>>>>> > >>>> Task
>>>>>>> > >>>>>>>> SDK
>>>>>>> > >>>>>>>>>>>>>> and
>>>>>>> > >>>>>>>>>>>>>>>> Core
>>>>>>> > >>>>>>>>>>>>>>>>>> (scheduler etc) we have come across some
>>>>>>> situations
>>>>>>> > where
>>>>>>> > >>> we
>>>>>>> > >>>>>>>> would
>>>>>>> > >>>>>>>>>>>>>>> like
>>>>>>> > >>>>>>>>>>>>>>>> to
>>>>>>> > >>>>>>>>>>>>>>>>>> share code between these.
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> However, it’s not as straightforward as “just
>>>>>>> put it in
>>>>>>> > a
>>>>>>> > >>>> common
>>>>>>> > >>>>>>>>>>>>>> dist
>>>>>>> > >>>>>>>>>>>>>>>> they
>>>>>>> > >>>>>>>>>>>>>>>>>> both depend upon” because one of the goals of
>>>>>>> the Task
>>>>>>> > SDK
>>>>>>> > >>>>>>>>>>>>>> separation
>>>>>>> > >>>>>>>>>>>>>>>> was
>>>>>>> > >>>>>>>>>>>>>>>>>> to have 100% complete version independence
>>>>>>> between the
>>>>>>> > >>> two,
>>>>>>> > >>>>>>>>>> ideally
>>>>>>> > >>>>>>>>>>>>>>>> even if
>>>>>>> > >>>>>>>>>>>>>>>>>> they are built into the same image and venv.
>>>>>>> Most of the
>>>>>>> > >>>> reason
>>>>>>> > >>>>>>>>>> why
>>>>>>> > >>>>>>>>>>>>>>> this
>>>>>>> > >>>>>>>>>>>>>>>>>> isn’t straightforward comes down to backwards
>>>>>>> > >>>> compatibility -
>>>>>>> > >>>>>>>> if
>>>>>>> > >>>>>>>>>> we
>>>>>>> > >>>>>>>>>>>>>>>> make
>>>>>>> > >>>>>>>>>>>>>>>>>> a change to the common/shared distribution
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> We’ve listed the options we have thought about
>>>>>>> in
>>>>>>> > >>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/51545
>>>>>>> (but
>>>>>>> > that
>>>>>>> > >>>> covers
>>>>>>> > >>>>>>>>>>>>>> some
>>>>>>> > >>>>>>>>>>>>>>>> more
>>>>>>> > >>>>>>>>>>>>>>>>>> things that I don’t want to get in to in this
>>>>>>> discussion
>>>>>>> > >>>> such as
>>>>>>> > >>>>>>>>>>>>>>>> possibly
>>>>>>> > >>>>>>>>>>>>>>>>>> separating operators and executors out of a
>>>>>>> single
>>>>>>> > >>> provider
>>>>>>> > >>>>>>>> dist.)
>>>>>>> > >>>>>>>>>>>>>>>>>> To give a concrete example of some code I would
>>>>>>> like to
>>>>>>> > >>>> share
>>>>>>> > >>>
>>>>>>> >
>>>>>>> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
>>>>>>> > >>>>>>>>>>>>>>>>>> — logging config. Another thing we will want to
>>>>>>> share
>>>>>>> > will
>>>>>>> > >>>> be
>>>>>>> > >>>>>>>> the
>>>>>>> > >>>>>>>>>>>>>>>>>> AirflowConfigParser class from
>>>>>>> airflow.configuration
>>>>>>> > (but
>>>>>>> > >>>>>>>> notably:
>>>>>>> > >>>>>>>>>>>>>>> only
>>>>>>> > >>>>>>>>>>>>>>>> the
>>>>>>> > >>>>>>>>>>>>>>>>>> parser class, _not_ the default config values,
>>>>>>> again,
>>>>>>> > let’s
>>>>>>> > >>>> not
>>>>>>> > >>>>>>>>>> dwell
>>>>>>> > >>>>>>>>>>>>>>> on
>>>>>>> > >>>>>>>>>>>>>>>> the
>>>>>>> > >>>>>>>>>>>>>>>>>> specifics of that)
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> So to bring the options listed in the issue
>>>>>>> here for
>>>>>>> > >>>> discussion,
>>>>>>> > >>>>>>>>>>>>>>> broadly
>>>>>>> > >>>>>>>>>>>>>>>>>> speaking there are two high-level approaches:
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> 1. A single shared distribution
>>>>>>> > >>>>>>>>>>>>>>>>>> 2. No shared package and copy/duplicate code
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> The advantage of Approach 1 is that we only
>>>>>>> have the
>>>>>>> > code
>>>>>>> > >>>> in one
>>>>>>> > >>>>>>>>>>>>>>> place.
>>>>>>> > >>>>>>>>>>>>>>>>>> However, the downside for me, at least in this specific case
>>>>>>> of
>>>>>>> > Logging
>>>>>>> > >>>> config
>>>>>>> > >>>>>>>>>> or
>>>>>>> > >>>>>>>>>>>>>>>>>> AirflowConfigParser class is that backwards
>>>>>>> > compatibility
>>>>>>> > >>> is
>>>>>>> > >>>>>>>> much
>>>>>>> > >>>>>>>>>>>>>> much
>>>>>>> > >>>>>>>>>>>>>>>>>> harder.
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> The main advantage of Approach 2 is that the
>>>>>>> code is
>>>>>>> > >>> released
>>>>>>> > >>>>>>>>>>>>>>>> with/embedded
>>>>>>> > >>>>>>>>>>>>>>>>>> in the dist (i.e. apache-airflow-task-sdk would
>>>>>>> contain
>>>>>>> > >>> the
>>>>>>> > >>>>>>>> right
>>>>>>> > >>>>>>>>>>>>>>>> version
>>>>>>> > >>>>>>>>>>>>>>>>>> of the logging config and ConfigParser etc). The
>>>>>>> > downside
>>>>>>> > >>> is
>>>>>>> > >>>>>>>> that
>>>>>>> > >>>>>>>>>>>>>>> either
>>>>>>> > >>>>>>>>>>>>>>>>>> the code will need to be duplicated in the
>>>>>>> repo, or
>>>>>>> > better
>>>>>>> > >>>> yet
>>>>>>> > >>>>>>>> it
>>>>>>> > >>>>>>>>>>>>>>> would
>>>>>>> > >>>>>>>>>>>>>>>>>> live in a single place in the repo, but some
>>>>>>> tooling
>>>>>>> > (TBD)
>>>>>>> > >>>> will
>>>>>>> > >>>>>>>>>>>>>>>>>> automatically handle the duplication, either at
>>>>>>> commit
>>>>>>> > >>>> time, or
>>>>>>> > >>>>>>>> my
>>>>>>> > >>>>>>>>>>>>>>>>>> preference, at release time.
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> For this kind of shared “utility” code I am very
>>>>>>> > strongly
>>>>>>> > >>>>>>>> leaning
>>>>>>> > >>>>>>>>>>>>>>>> towards
>>>>>>> > >>>>>>>>>>>>>>>>>> option 2 with automation, as otherwise I think
>>>>>>> the
>>>>>>> > >>> backwards
>>>>>>> > >>>>>>>>>>>>>>>> compatibility
>>>>>>> > >>>>>>>>>>>>>>>>>> requirements would make it unworkable (very
>>>>>>> quickly over
>>>>>>> > >>>> time
>>>>>>> > >>>>>>>> the
>>>>>>> > >>>>>>>>>>>>>>>>>> combinations we would have to test would just be
>>>>>>> > >>>> unreasonable)
>>>>>>> > >>>>>>>>>> and I
>>>>>>> > >>>>>>>>>>>>>>>> don’t
>>>>>>> > >>>>>>>>>>>>>>>>>> feel confident we can have things as stable as
>>>>>>> we need
>>>>>>> > to
>>>>>>> > >>>> really
>>>>>>> > >>>>>>>>>>>>>>> deliver
>>>>>>> > >>>>>>>>>>>>>>>>>> the version separation/independence I want to
>>>>>>> deliver
>>>>>>> > >>> with
>>>>>>> > >>>>>>>>>> AIP-72.
>>>>>>> > >>>>>>>>>>>>>>>>>> So unless someone feels very strongly about
>>>>>>> this, I will
>>>>>>> > >>>> come up
>>>>>>> > >>>>>>>>>>>>>> with
>>>>>>> > >>>>>>>>>>>>>>> a
>>>>>>> > >>>>>>>>>>>>>>>>>> draft PR for further discussion that will
>>>>>>> implement code
>>>>>>> > >>>> sharing
>>>>>>> > >>>>>>>>>> via
>>>>>>> > >>>>>>>>>>>>>>>>>> “vendoring” it at build time. I have an idea of
>>>>>>> how I
>>>>>>> > can
>>>>>>> > >>>>>>>> achieve
>>>>>>> > >>>>>>>>>>>>>> this
>>>>>>> > >>>>>>>>>>>>>>>> so
>>>>>>> > >>>>>>>>>>>>>>>>>> we have a single version in the repo and it’ll
>>>>>>> work
>>>>>>> > there,
>>>>>>> > >>>> but
>>>>>>> > >>>>>>>> at
>>>>>>> > >>>>>>>>>>>>>>>> runtime
>>>>>>> > >>>>>>>>>>>>>>>>>> we vendor it in to the shipped dist so it lives
>>>>>>> at
>>>>>>> > >>> something
>>>>>>> > >>>>>>>> like
>>>>>>> > >>>>>>>>>>>>>>>>>> `airflow.sdk._vendor` etc.
>>>>>>> > >>>>>>>>>>>>>>>>>>
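As a rough illustration of what "vendoring at build time" could involve, the sketch below copies a shared source tree into a dist and rewrites its absolute imports to the vendored location. This is not how the `vendoring` tool or any eventual Airflow build step works. The function names and the regex are assumptions made up for the example, and it only handles `from airflow_common... import ...` lines.

```python
import re
import shutil
from pathlib import Path

SHARED_NAME = "airflow_common"  # name used in this thread; layout is still TBD

# Matches "from airflow_common" at the start of a line, optionally indented,
# but not longer names such as "airflow_common_extra".
_FROM_IMPORT = re.compile(
    rf"^([ \t]*from[ \t]+){SHARED_NAME}(?=[.\s])", re.MULTILINE
)


def rewrite_imports(source: str, vendor_prefix: str) -> str:
    # Prefix the shared package name with the vendored parent package.
    return _FROM_IMPORT.sub(
        lambda m: f"{m.group(1)}{vendor_prefix}.{SHARED_NAME}", source
    )


def vendor_shared(shared_dir: Path, dest_dir: Path, vendor_prefix: str) -> Path:
    # Copy the single in-repo copy into the dist, then fix up its imports.
    target = dest_dir / SHARED_NAME
    shutil.copytree(shared_dir, target)
    for py_file in target.rglob("*.py"):
        py_file.write_text(rewrite_imports(py_file.read_text(), vendor_prefix))
    return target


if __name__ == "__main__":
    line = "from airflow_common.module_loading import import_string\n"
    print(rewrite_imports(line, "airflow.sdk._vendor"), end="")
```

A real implementation would also need to cover plain `import airflow_common` statements and string references, which is a reason to reuse an existing tool rather than hand-rolled regexes.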
>>>>>>> > >>>>>>>>>>>>>>>>>> In terms of repo layout, this likely means we
>>>>>>> would end
>>>>>>> > up
>>>>>>> > >>>> with:
>>>>>>> > >>>>>>>>>>>>>>>>>> airflow-core/pyproject.toml
>>>>>>> > >>>>>>>>>>>>>>>>>> airflow-core/src/
>>>>>>> > >>>>>>>>>>>>>>>>>> airflow-core/tests/
>>>>>>> > >>>>>>>>>>>>>>>>>> task-sdk/pyproject.toml
>>>>>>> > >>>>>>>>>>>>>>>>>> task-sdk/src/
>>>>>>> > >>>>>>>>>>>>>>>>>> task-sdk/tests/
>>>>>>> > >>>>>>>>>>>>>>>>>> airflow-common/src
>>>>>>> > >>>>>>>>>>>>>>>>>> airflow-common/tests/
>>>>>>> > >>>>>>>>>>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as
>>>>>>> deps
>>>>>>> > would
>>>>>>> > >>>> be
>>>>>>> > >>>>>>>>>>>>>> included
>>>>>>> > >>>>>>>>>>>>>>>> in
>>>>>>> > >>>>>>>>>>>>>>>>>> the downstream projects. TBD.
>>>>>>> > >>>>>>>>>>>>>>>>>>
>>>>>>> > >>>>>>>>>>>>>>>>>> Thoughts and feedback welcomed.
>>>>>>> > >>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> > >>>>>>>>>>>>>>>> To unsubscribe, e-mail:
>>>>>>> > dev-unsubscr...@airflow.apache.org
>>>>>>> > >>>>>>>>>>>>>>>> For additional commands, e-mail:
>>>>>>> > >>> dev-h...@airflow.apache.org
