In case I did a bad job explaining it, the “core and task sdk” is not in the 
module name/import name, just in the file path. 

Anyone have other ideas?

> On 7 Jul 2025, at 21:37, Buğra Öztürk <ozturkbugr...@gmail.com> wrote:
> 
> Thanks Ash! Looks cool! I like the structure. This will enable all the
> combinations, and the structure looks easy to grasp. No strong stance on
> the naming, other than that it is a bit long with `and`; `core_ctl` could
> be shorter. Since no import path is defined from the folder name, we can
> give it any name.
> 
> Best regards,
> 
>> On Mon, 7 Jul 2025, 21:51 Jarek Potiuk, <ja...@potiuk.com> wrote:
>> 
>> Looks good but I think we should find some better logical name for
>> core_and_sdk :)
>> 
>> pon., 7 lip 2025, 21:44 użytkownik Jens Scheffler
>> <j_scheff...@gmx.de.invalid> napisał:
>> 
>>> Cool! Especially the "shared" folder with the ability to have
>>> N-combinations w/o exploding project repo root!
>>> 
>>> On 07.07.25 14:43, Ash Berlin-Taylor wrote:
>>>> Oh, and all of this will be explained in shared/README.md
>>>> 
>>>>> On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>> 
>>>>> Okay, so it seems we have agreement on the approach here, so I’ll
>>>>> continue with this. On the dev call it was mentioned that
>>>>> “airflow-common” wasn’t a great name, so here is my proposal for the
>>>>> file structure:
>>>>> 
>>>>> ```
>>>>> /
>>>>>  task-sdk/...
>>>>>  airflow-core/...
>>>>>  shared/
>>>>>    kubernetes/
>>>>>      pyproject.toml
>>>>>      src/
>>>>>        airflow_kube/__init__.py
>>>>>    core-and-tasksdk/
>>>>>      pyproject.toml
>>>>>      src/
>>>>>        airflow_shared/__init__.py
>>>>> ```
>>>>> 
>>>>> Things to note here: the “shared” folder has the possibility of
>>>>> holding multiple different shared “libraries” in it. In this example I
>>>>> am supposing a hypothetical shared kubernetes folder, for a world in
>>>>> which we split the KubePodOperator and the KubeExecutor into two
>>>>> separate distributions (example only, not proposing we do that right
>>>>> now; that would be a separate discussion)
>>>>> 
>>>>> The other things to note here:
>>>>> 
>>>>> 
>>>>> - the folder name in shared aims to be “self-documenting”, hence the
>>> verbose “core-and-tasksdk” to say where the shared library is intended to
>>> be used.
>>>>> - the python module itself should almost always have an `airflow_` (or
>>> maybe `_airflow_`?) prefix so that it does not conflict with anything
>> else
>>> we might use. It won’t matter “in production” as those will be vendored
>> in
>>> to be imported as `airflow/_vendor/airflow_shared` etc, but avoiding
>>> conflicts at dev time with the Finder approach is a good safety measure.
>>>>> 
>>>>> I will start making a real PR for this proposal now, but I’m open to
>>> feedback (either here, or in the PR when I open it)
>>>>> 
>>>>> -ash
>>>>> 
>>>>>> On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>> 
>>>>>> Yeah we have to try it and test - also building packages happens semi
>>>>>> frequently when you run `uv sync` (they use some kind of heuristics
>> to
>>>>>> decide when) and you can force it with `--reinstall` or `--refresh`.
>>>>>> Package build also happens every time when you run "ci-image build`
>>> now in
>>>>>> breeze so it seems like it will nicely integrate in our workflows.
>>>>>> 
>>>>>> Looks really cool Ash.
>>>>>> 
>>>>>> On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor <a...@apache.org>
>>> wrote:
>>>>>> 
>>>>>>> It’s not just release time, but any time we build a package which
>>> happens
>>>>>>> on “every” CI run. The normal unit tests will use code from
>>>>>>> airflow-common/src/airflow_common; the kube tests which build an
>>> image will
>>>>>>> build the dists and vendor in the code from that commit.
>>>>>>> 
>>>>>>> There is only a single copy of the shared code committed to the
>> repo,
>>> so
>>>>>>> there is never anything to synchronise.
>>>>>>> 
>>>>>>>> On 4 Jul 2025, at 15:53, Amogh Desai <amoghdesai....@gmail.com>
>>> wrote:
>>>>>>>> 
>>>>>>>> Thanks Ash.
>>>>>>>> 
>>>>>>>> This is really cool and helpful that you were able to test both
>>> scenarios
>>>>>>>> -- repo checkout
>>>>>>>> and also installing from the vendored package and the resolution
>>> worked
>>>>>>>> fine too.
>>>>>>>> 
>>>>>>>> I like this idea compared to the relative import one, for a few
>>>>>>>> reasons:
>>>>>>>> - It will take some time to adjust to the new coding standard we
>>>>>>>> would have to lay down if we impose relative imports in the shared
>>>>>>>> dist
>>>>>>>> - We can continue using repo-wide absolute import standards; it is
>>>>>>>> also much easier for situations when we do a global search in the
>>>>>>>> IDE to find + replace
>>>>>>>> - Vendoring is a proven and established paradigm across projects,
>>>>>>>> and would give us the build tooling we need out of the box
>>>>>>>> 
>>>>>>>> Nothing much against the relative imports, but with the evidence
>>>>>>>> provided above, the vendored approach seems to only do us good.
>>>>>>>> 
>>>>>>>> Regarding synchronizing it, release time should be fine as long as
>>>>>>>> we have a good CI workflow to catch such issues per PR when changes
>>>>>>>> are made in the shared dist (pre-commit would make it really slow, I
>>>>>>>> guess).
>>>>>>>> 
>>>>>>>> If we can run our tests with vendored code we should be mostly
>>> covered.
>>>>>>>> 
>>>>>>>> Good effort all!
>>>>>>>> 
>>>>>>>> Thanks & Regards,
>>>>>>>> Amogh Desai
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor <a...@apache.org>
>>>>>>> wrote:
>>>>>>>>> Okay, I think I’ve got something that works and I’m happy with.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>> 
>> https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core
>>>>>>>>> This produces the following from `uv build task-sdk`
>>>>>>>>> -
>>>>>>>>> 
>>>>>>> 
>>> 
>> https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz
>>>>>>>>> -
>>>>>>>>> 
>>>>>>> 
>>> 
>> https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip
>>>>>>>>> (`.whl.zip` as GH won't allow .whl upload, but will .zip)
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>> ❯ unzip -l
>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip |
>>>>>>> grep
>>>>>>>>> _vendor
>>>>>>>>>     50  02-02-2020 00:00   airflow/sdk/_vendor/.gitignore
>>>>>>>>>   2082  02-02-2020 00:00   airflow/sdk/_vendor/__init__.py
>>>>>>>>>     28  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common.pyi
>>>>>>>>>     18  02-02-2020 00:00   airflow/sdk/_vendor/vendor.txt
>>>>>>>>>    785  02-02-2020 00:00
>>>>>>>>> airflow/sdk/_vendor/airflow_common/__init__.py
>>>>>>>>>  10628  02-02-2020 00:00
>>>>>>>>> airflow/sdk/_vendor/airflow_common/timezone.py
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> And similarly in the .tar.gz, so our “sdist” is complete too:
>>>>>>>>> ```
>>>>>>>>> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz |grep _vendor
>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore
>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py
>>>>>>>>> 
>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi
>>>>>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>> 
>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py
>>>>>>>>> 
>>>>>>> 
>>> 
>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> The plugin works at build time by including/copying the libs
>>>>>>>>> specified in vendor.txt into place (and letting `vendoring` take
>>>>>>>>> care of import rewrites.)
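To illustrate the kind of import rewriting referred to here, a much-simplified sketch (the regex and helper function are illustrative assumptions, not the actual `vendoring` implementation, which handles many more cases):

```python
import re

# Illustrative sketch only: absolute imports of the shared lib are
# rewritten to point at the vendored copy inside the built dist.
VENDORED_PREFIX = "airflow.sdk._vendor"

def rewrite_imports(source: str, lib: str = "airflow_common") -> str:
    """Rewrite `import <lib>...` / `from <lib>... import ...` statements."""
    pattern = re.compile(rf"^(\s*)(from|import)\s+{lib}\b", flags=re.MULTILINE)
    return pattern.sub(rf"\1\2 {VENDORED_PREFIX}.{lib}", source)

print(rewrite_imports("from airflow_common.timezone import utc"))
```

With this, `from airflow_common.timezone import utc` becomes `from airflow.sdk._vendor.airflow_common.timezone import utc` in the vendored copy.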
>>>>>>>>> For the imports to continue to work at “dev” time/from a repo
>>>>>>>>> checkout, I have added an import finder to `sys.meta_path`; since
>>>>>>>>> it’s at the end of the list it will only be used if the normal
>>>>>>>>> import can’t find things.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>> 
>> https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py
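For anyone who hasn't used `sys.meta_path` finders before, the mechanism is roughly the following. This is a minimal standalone sketch, not the actual `_vendor/__init__.py` code: `demo_vendor` is a made-up alias, and it simply maps onto stdlib modules here rather than a repo checkout of the shared library.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import sys


class VendorAliasFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Sketch: satisfy imports of `<alias>.<mod>` by loading the real
    `<mod>` from wherever it normally lives, re-registered under the
    alias name."""

    def __init__(self, alias: str):
        self.alias = alias

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.alias:
            # Synthesize an empty package for the alias root itself
            return importlib.machinery.ModuleSpec(fullname, self, is_package=True)
        if fullname.startswith(self.alias + "."):
            real_spec = importlib.util.find_spec(fullname[len(self.alias) + 1 :])
            if real_spec and real_spec.origin and real_spec.origin.endswith(".py"):
                # Load the real module's source file under the alias name
                return importlib.util.spec_from_file_location(fullname, real_spec.origin)
        return None

    # Loader protocol for the synthetic alias root package
    def create_module(self, spec):
        return None  # use default module creation

    def exec_module(self, module):
        pass  # the alias root package has no body of its own


# Appended at the END of sys.meta_path: only consulted when the normal
# import machinery finds nothing, as described in the message above.
sys.meta_path.append(VendorAliasFinder("demo_vendor"))

import demo_vendor.textwrap  # executes stdlib textwrap under the alias name

print(demo_vendor.textwrap.shorten("hello world example", width=11))
```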
>>>>>>>>> This doesn’t quite give us the same runtime effect as the “import
>>>>>>>>> rewriting”, as in this approach `airflow_common` is directly loaded
>>>>>>>>> (i.e. both airflow.sdk._vendor.airflow_common and airflow_common
>>>>>>>>> exist in sys.modules), but it does work for everything that I was
>>>>>>>>> able to test.
>>>>>>>>> 
>>>>>>>>> I tested it with the diff at the end of this message. My test
>>> ipython
>>>>>>>>> shell:
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>> In [1]: from airflow.sdk._vendor.airflow_common.timezone import
>> foo
>>>>>>>>> 
>>>>>>>>> In [2]: foo
>>>>>>>>> Out[2]: 1
>>>>>>>>> 
>>>>>>>>> In [3]: import airflow.sdk._vendor.airflow_common
>>>>>>>>> 
>>>>>>>>> In [4]: import airflow.sdk._vendor.airflow_common.timezone
>>>>>>>>> 
>>>>>>>>> In [5]: airflow.sdk._vendor.airflow_common.__file__
>>>>>>>>> Out[5]:
>>>>>>>>> 
>>>>>>> 
>>> 
>> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py'
>>>>>>>>> In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__
>>>>>>>>> Out[6]:
>>>>>>>>> 
>>>>>>> 
>>> 
>> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py'
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> And in a standalone environment with the SDK dist I built (it
>>>>>>>>> needed the matching airflow-core right now, but that has nothing to
>>>>>>>>> do with this discussion):
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with
>>>>>>>>> dist/apache_airflow_core-3.1.0-py3-none-any.whl --with
>>>>>>>>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython
>>>>>>>>> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ]
>>>>>>>>> Type 'copyright', 'credits' or 'license' for more information
>>>>>>>>> IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for
>> help.
>>>>>>>>> Tip: You can use `%hist` to view history, see the options with
>>>>>>> `%history?`
>>>>>>>>> In [1]: import airflow.sdk._vendor.airflow_common.timezone
>>>>>>>>> 
>>>>>>>>> In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__
>>>>>>>>> Out[2]:
>>>>>>>>> 
>>>>>>> 
>>> 
>> '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py’
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ```diff
>>>>>>>>> diff --git a/airflow-common/src/airflow_common/__init__.py
>>>>>>>>> b/airflow-common/src/airflow_common/__init__.py
>>>>>>>>> index 13a83393a9..927b7c6b61 100644
>>>>>>>>> --- a/airflow-common/src/airflow_common/__init__.py
>>>>>>>>> +++ b/airflow-common/src/airflow_common/__init__.py
>>>>>>>>> @@ -14,3 +14,5 @@
>>>>>>>>> # KIND, either express or implied.  See the License for the
>>>>>>>>> # specific language governing permissions and limitations
>>>>>>>>> # under the License.
>>>>>>>>> +
>>>>>>>>> +foo = 1
>>>>>>>>> diff --git a/airflow-common/src/airflow_common/timezone.py
>>>>>>>>> b/airflow-common/src/airflow_common/timezone.py
>>>>>>>>> index 340b924c66..58384ef20f 100644
>>>>>>>>> --- a/airflow-common/src/airflow_common/timezone.py
>>>>>>>>> +++ b/airflow-common/src/airflow_common/timezone.py
>>>>>>>>> @@ -36,6 +36,9 @@ _PENDULUM3 =
>>>>>>>>> version.parse(metadata.version("pendulum")).major == 3
>>>>>>>>> # - FixedTimezone(0, "UTC") in pendulum 2
>>>>>>>>> utc = pendulum.UTC
>>>>>>>>> 
>>>>>>>>> +
>>>>>>>>> +from airflow_common import foo
>>>>>>>>> +
>>>>>>>>> TIMEZONE: Timezone
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>>>> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>>>>>> I think both approaches are doable:
>>>>>>>>>> 
>>>>>>>>>> 1) -> We can very easily prevent bad imports by pre-commit when
>>>>>>>>>> importing from different distributions, and make sure we are only
>>>>>>>>>> doing relative imports in the shared modules. We are doing plenty
>>>>>>>>>> of this already. And yes, it would require relative imports, which
>>>>>>>>>> we currently do not allow.
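A pre-commit guard of this kind could be sketched roughly like this (a hypothetical helper, not an existing hook; a real check would also walk the shared source tree and be wired into the pre-commit config):

```python
import ast

# Hypothetical sketch of a pre-commit style guard: inside shared modules,
# flag absolute imports of the shared package itself (they would have to
# be relative for the code to work wherever it is symlinked/vendored).
def find_bad_imports(source: str, shared_pkg: str = "airflow_common") -> list[str]:
    """Return a description of each absolute import of `shared_pkg`."""
    bad = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            # level == 0 means absolute; relative imports have level >= 1
            if node.level == 0 and node.module and node.module.split(".")[0] == shared_pkg:
                bad.append(f"line {node.lineno}: from {node.module} import ...")
        elif isinstance(node, ast.Import):
            bad.extend(
                f"line {node.lineno}: import {alias.name}"
                for alias in node.names
                if alias.name.split(".")[0] == shared_pkg
            )
    return bad

src = "from airflow_common.module_loading import import_string\nfrom .timezone import utc\n"
print(find_bad_imports(src))  # flags line 1 only; the relative import passes
```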
>>>>>>>>>> 
>>>>>>>>>> 2) -> has one disadvantage: someone at some point in time will
>>>>>>>>>> have to decide to synchronize this, and if it happens just before
>>>>>>>>>> a release (I bet this is going to happen) it will lead to solving
>>>>>>>>>> problems that would normally be solved during the PR when you make
>>>>>>>>>> a change (i.e. the symbolic link has the advantage that whoever
>>>>>>>>>> modifies shared code will be immediately notified in their PR that
>>>>>>>>>> they broke something, because either static checks or mypy or
>>>>>>>>>> tests fail).
>>>>>>>>>> 
>>>>>>>>>> Ash, do you have an idea of a process (who and when) for the
>>>>>>>>>> synchronisation in case of vendoring? Maybe we could solve it if
>>>>>>>>>> it is done more frequently and with some regularity? We could
>>>>>>>>>> potentially force re-vendoring at PR time any time shared code
>>>>>>>>>> changes (and enforce it by pre-commit). I can't think of another
>>>>>>>>>> place in our development workflow other than releases, and that
>>>>>>>>>> seems a bit too late, as it puts the extra effort of fixing
>>>>>>>>>> potential incompatibilities on the release manager and delays the
>>>>>>>>>> release. WDYT?
>>>>>>>>>> 
>>>>>>>>>> Re: relative imports. I think for a shared library we could
>>>>>>>>>> potentially relax this and allow them (and actually disallow
>>>>>>>>>> absolute imports in the pieces of code that are shared - again,
>>>>>>>>>> guarded by pre-commit). As I recall, the only reason we forbade
>>>>>>>>>> relative imports is because of how we are (or maybe were) doing
>>>>>>>>>> Dag parsing and the failures resulting from it, so we decided to
>>>>>>>>>> just not allow them, to keep consistency. The way Dag parsing
>>>>>>>>>> works is that when you are using importlib to read the Dag from a
>>>>>>>>>> file, relative imports do not work, as Python does not know what
>>>>>>>>>> they should be relative to. But if a relative import is done from
>>>>>>>>>> an imported package, it should be no problem, I think - otherwise
>>>>>>>>>> our Dags would not be able to import any library that uses
>>>>>>>>>> relative imports.
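The Dag-parsing behaviour described here is easy to demonstrate: a module executed straight from a file path has no parent package, so Python cannot resolve what "." refers to (the file and module names below are made up for illustration):

```python
import importlib.util
import tempfile
from pathlib import Path

# Demonstrates why relative imports fail in files loaded directly via
# importlib (as the Dag processor does): the module has no parent
# package, so "." cannot be resolved.
failure = None
with tempfile.TemporaryDirectory() as tmp:
    dag_file = Path(tmp) / "my_dag.py"
    dag_file.write_text("from . import helpers\n")  # relative import at top level

    spec = importlib.util.spec_from_file_location("my_dag", dag_file)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except ImportError as err:
        failure = str(err)

print(failure)  # e.g. "attempted relative import with no known parent package"
```

The same `from . import helpers` line works fine inside a normally imported package, because there `__package__` is set and "." has something to be relative to.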
>>>>>>>>>> 
>>>>>>>>>> Of course consistency might be the reason why we do not want to
>>>>>>> introduce
>>>>>>>>>> relative imports. I don't see it as an issue if it is guarded by
>>>>>>>>> pre-commit
>>>>>>>>>> though.
>>>>>>>>>> 
>>>>>>>>>> J.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> czw., 3 lip 2025, 12:11 użytkownik Ash Berlin-Taylor <
>>> a...@apache.org>
>>>>>>>>>> napisał:
>>>>>>>>>> 
>>>>>>>>>>> Oh yes, symlinks will work, with one big caveat: it does mean
>>>>>>>>>>> you can’t use absolute imports from one common module to another.
>>>>>>>>>>> 
>>>>>>>>>>> For example
>>>>>>>>>>> 
>>>>>>> 
>>> 
>> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41
>>>>>>>>>>> where we have
>>>>>>>>>>> 
>>>>>>>>>>> ```
>>>>>>>>>>> from airflow.utils.module_loading import import_string
>>>>>>>>>>> ```
>>>>>>>>>>> 
>>>>>>>>>>> if we want to move serve_logs into this common lib that is then
>>>>>>>>> symlinked
>>>>>>>>>>> then we wouldn’t be able to have `from
>>> airflow_common.module_loading
>>>>>>>>> import
>>>>>>>>>>> import_string`.
>>>>>>>>>>> 
>>>>>>>>>>> I can think of two possible solutions here.
>>>>>>>>>>> 
>>>>>>>>>>> 1) is to allow/require relative imports in this shared lib, so
>>> `from
>>>>>>>>>>> .module_loading import import_string`
>>>>>>>>>>> 2) is to use `vendoring`[1] (from the pip maintainers) which
>> will
>>>>>>> handle
>>>>>>>>>>> import-rewriting for us.
>>>>>>>>>>> 
>>>>>>>>>>> I’d entirely forgotten that symlinks in repos were a thing, so I
>>>>>>>>>>> prepared a minimal POC/demo of what the vendoring approach could
>>>>>>>>>>> look like here
>>>>>>>>>>> 
>>>>>>> 
>>> 
>> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3
>>>>>>>>>>> Now personally I am more than happy with relative imports, but
>>>>>>> generally
>>>>>>>>>>> as a project we have avoided them, so I think that limits what
>> we
>>>>>>> could
>>>>>>>>> do
>>>>>>>>>>> with a symlink based approach.
>>>>>>>>>>> 
>>>>>>>>>>> -ash
>>>>>>>>>>> 
>>>>>>>>>>> [1] https://github.com/pradyunsg/vendoring
>>>>>>>>>>> 
>>>>>>>>>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <
>>>>>>> gopidesupa...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Thanks Ash
>>>>>>>>>>>> 
>>>>>>>>>>>> Yes, agree - option 2 would be preferred for me, making sure we
>>>>>>>>>>>> have all the guardrails to protect against any unwanted
>>>>>>>>>>>> behaviour in code sharing, and that we execute the right set of
>>>>>>>>>>>> tests between the packages.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <
>>>>>>> amoghdesai....@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for starting this discussion, Ash.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I would prefer option 2 here with proper tooling to handle the
>>> code
>>>>>>>>>>>>> duplication at *release* time.
>>>>>>>>>>>>> It is best to have a dist that has all it needs in itself.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Option 1 could very quickly get out of hand and if we decide
>> to
>>>>>>>>> separate
>>>>>>>>>>>>> triggerer /
>>>>>>>>>>>>> dag processor / config etc etc as separate packages, back
>>> compat is
>>>>>>>>>>> going
>>>>>>>>>>>>> to be a nightmare
>>>>>>>>>>>>> and will bite us harder than we anticipate.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks & Regards,
>>>>>>>>>>>>> Amogh Desai
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <
>> kaxiln...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>>>>> I prefer Option 2 as well to avoid matrix of dependencies
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler
>>>>>>>>> <j_scheff...@gmx.de.invalid
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'd also rather prefer option 2 - the reason is that it is
>>>>>>>>>>>>>>> pragmatic: we do not need to cut another package, and we end
>>>>>>>>>>>>>>> up with fewer packages and dependencies.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I remember some time ago I was checking (together with
>> Jarek,
>>> I am
>>>>>>>>> not
>>>>>>>>>>>>>>> sure anymore...) if the usage of symlinks would be possible.
>>> To
>>>>>>> keep
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> source in one package but "symlink" it into another. If then
>>> at
>>>>>>>>> point
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> packaging/release the files are materialized we have 1 set
>> of
>>>>>>> code.
>>>>>>>>>>>>>>> Otherwise if not possible still the redundancy could be
>>> solved by
>>>>>>> a
>>>>>>>>>>>>>>> pre-commit hook - and in Git the files are de-duplicated
>>> anyway
>>>>>>>>> based
>>>>>>>>>>>>> on
>>>>>>>>>>>>>>> content hash, so this does not hurt.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote:
>>>>>>>>>>>>>>>> I support option 2 with proper automation & CI - the
>>> reasonings
>>>>>>>>>>>>> you've
>>>>>>>>>>>>>>>> shown for that make sense to me.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Shahar
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <
>>> a...@apache.org
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> As we work on finishing off the code-level separation of
>>> Task
>>>>>>> SDK
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> Core
>>>>>>>>>>>>>>>>> (scheduler etc) we have come across some situations where
>> we
>>>>>>> would
>>>>>>>>>>>>>> like
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> share code between these.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> However it’s not as straightforward as “just put it in a
>>>>>>>>>>>>>>>>> common dist they both depend upon”, because one of the
>>>>>>>>>>>>>>>>> goals of the Task SDK separation was to have 100% complete
>>>>>>>>>>>>>>>>> version independence between the two, ideally even if they
>>>>>>>>>>>>>>>>> are built into the same image and venv. Most of the reason
>>>>>>>>>>>>>>>>> this isn’t straightforward comes down to backwards
>>>>>>>>>>>>>>>>> compatibility if we make a change to the common/shared
>>>>>>>>>>>>>>>>> distribution.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> We’ve listed the options we have thought about in
>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/51545 (but that
>>> covers
>>>>>>>>>>>>> some
>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>> things that I don’t want to get in to in this discussion
>>> such as
>>>>>>>>>>>>>>> possibly
>>>>>>>>>>>>>>>>> separating operators and executors out of a single
>> provider
>>>>>>> dist.)
>>>>>>>>>>>>>>>>> To give a concrete example of some code I would like to
>>> share
>>>>>>>>>>>>>>>>> 
>>>>>>> 
>>> 
>> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
>>>>>>>>>>>>>>>>> — logging config. Another thing we will want to share will
>>> be
>>>>>>> the
>>>>>>>>>>>>>>>>> AirflowConfigParser class from airflow.configuration (but
>>>>>>> notably:
>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> parser class, _not_ the default config values, again, lets
>>> not
>>>>>>>>> dwell
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> specifics of that)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> So to bring the options listed in the issue here for
>>> discussion,
>>>>>>>>>>>>>> broadly
>>>>>>>>>>>>>>>>> speaking there are two high-level approaches:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 1. A single shared distribution
>>>>>>>>>>>>>>>>> 2. No shared package and copy/duplicate code
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The advantage of Approach 1 is that we only have the code
>>>>>>>>>>>>>>>>> in one place. However for me, at least in this specific
>>>>>>>>>>>>>>>>> case of the logging config or the AirflowConfigParser
>>>>>>>>>>>>>>>>> class, the problem is that backwards compatibility becomes
>>>>>>>>>>>>>>>>> much, much harder.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The main advantage of Approach 2 is that the code is
>>>>>>>>>>>>>>>>> released with/embedded in the dist (i.e.
>>>>>>>>>>>>>>>>> apache-airflow-task-sdk would contain the right version of
>>>>>>>>>>>>>>>>> the logging config and ConfigParser etc). The downside is
>>>>>>>>>>>>>>>>> that either the code will need to be duplicated in the
>>>>>>>>>>>>>>>>> repo, or better yet it would live in a single place in the
>>>>>>>>>>>>>>>>> repo, but some tooling (TBD) will automatically handle the
>>>>>>>>>>>>>>>>> duplication, either at commit time or, my preference, at
>>>>>>>>>>>>>>>>> release time.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> For this kind of shared “utility” code I am very strongly
>>>>>>>>>>>>>>>>> leaning towards option 2 with automation, as otherwise I
>>>>>>>>>>>>>>>>> think the backwards compatibility requirements would make
>>>>>>>>>>>>>>>>> it unworkable (very quickly over time the combinations we
>>>>>>>>>>>>>>>>> would have to test would just be unreasonable), and I don’t
>>>>>>>>>>>>>>>>> feel confident we can have things as stable as we need to
>>>>>>>>>>>>>>>>> really deliver the version separation/independence I want
>>>>>>>>>>>>>>>>> to deliver with AIP-72.
>>>>>>>>>>>>>>>>> So unless someone feels very strongly about this, I will
>>> come up
>>>>>>>>>>>>> with
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> draft PR for further discussion that will implement code
>>> sharing
>>>>>>>>> via
>>>>>>>>>>>>>>>>> “vendoring” it at build time. I have an idea of how I can
>>>>>>> achieve
>>>>>>>>>>>>> this
>>>>>>>>>>>>>>> so
>>>>>>>>>>>>>>>>> we have a single version in the repo and it’ll work there,
>>> but
>>>>>>> at
>>>>>>>>>>>>>>> runtime
>>>>>>>>>>>>>>>>> we vendor it in to the shipped dist so it lives at
>> something
>>>>>>> like
>>>>>>>>>>>>>>>>> `airflow.sdk._vendor` etc.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> In terms of repo layout, this likely means we would end up
>>> with:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> airflow-core/pyproject.toml
>>>>>>>>>>>>>>>>> airflow-core/src/
>>>>>>>>>>>>>>>>> airflow-core/tests/
>>>>>>>>>>>>>>>>> task-sdk/pyproject.toml
>>>>>>>>>>>>>>>>> task-sdk/src/
>>>>>>>>>>>>>>>>> task-sdk/tests/
>>>>>>>>>>>>>>>>> airflow-common/src
>>>>>>>>>>>>>>>>> airflow-common/tests/
>>>>>>>>>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as deps would
>>> be
>>>>>>>>>>>>> included
>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>> the downstream projects. TBD.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thoughts and feedback welcomed.
>>>>>>>>>>>>>>> 
>>>>>>>>> 
>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>>>>>>>>>>>> For additional commands, e-mail:
>> dev-h...@airflow.apache.org
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 


