I like the folder structure proposed by Ash and have no objections with it.
"core_and_task_sdk" sounds good to me and justifies what it should do pretty well. @Jarek Potiuk <ja...@potiuk.com> a little confused on what you mean there, I am understanding the direction but could you elaborate a bit more please? Naming is REALLY hard! Thanks & Regards, Amogh Desai On Tue, Jul 8, 2025 at 2:52 AM Jarek Potiuk <ja...@potiuk.com> wrote: > How about splitting it even more and having each shared "thing" named? > "logging", "config" and sharing them explicitly and separately with the > right "user" ? > That sounds way more modular and we will be able to choose which of the > shared "utils" we use where. > > J. > > > On Mon, Jul 7, 2025 at 11:13 PM Jens Scheffler <j_scheff...@gmx.de.invalid > > > wrote: > > > I like "core_and_task_sdk the same like core-and-task-sdk - I have no > > problem and it is a path only. > > > > if we get to "dag-parser-scheduler-task-sdk-and-triggerer" which is a > > bit bulky we then should name it "all-not-api-server" :-D > > > > On 07.07.25 22:57, Ash Berlin-Taylor wrote: > > > In case I did a bad job explaining it, the “core and task sdk” is not > in > > the module name/import name, just in the file path. > > > > > > Anyone have other ideas? > > > > > >> On 7 Jul 2025, at 21:37, Buğra Öztürk <ozturkbugr...@gmail.com> > wrote: > > >> > > >> Thanks Ash! Looks cool! I like the structure. This will enable all > the > > >> combinations and structure looks easy to grasp. No strong stance on > the > > >> naming other than maybe it is a bit long with `and`, `core_ctl` could > be > > >> shorter, since no import path is defined like that, we can give any > name > > >> for sure. > > >> > > >> Best regards, > > >> > > >>> On Mon, 7 Jul 2025, 21:51 Jarek Potiuk, <ja...@potiuk.com> wrote: > > >>> > > >>> Looks good but I think we should find some better logical name for > > >>> core_and_sdk :) > > >>> > > >>> pon., 7 lip 2025, 21:44 użytkownik Jens Scheffler > > >>> <j_scheff...@gmx.de.invalid> napisał: > > >>> > > >>>> Cool! Especially the "shared" folder with the ability to have > > >>>> N-combinations w/o exploding project repo root! > > >>>> > > >>>> On 07.07.25 14:43, Ash Berlin-Taylor wrote: > > >>>>> Oh, and all of this will be explain in shared/README.md > > >>>>> > > >>>>>> On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org> > wrote: > > >>>>>> > > >>>>>> Okay, so it seems we have agreement on the approach here, so I’ll > > >>>> continue with this, and on the dev call it was mentioned that > > >>>> “airflow-common” wasn’t a great name, so here is my proposal for the > > file > > >>>> structure; > > >>>>>> ``` > > >>>>>> / > > >>>>>> task-sdk/... > > >>>>>> airflow-core/... > > >>>>>> shared/ > > >>>>>> kuberenetes/ > > >>>>>> pyproject.toml > > >>>>>> src/ > > >>>>>> airflow_kube/__init__.py > > >>>>>> core-and-tasksdk/ > > >>>>>> pyproject.toml > > >>>>>> src/ > > >>>>>> airflow_shared/__init__.py > > >>>>>> ``` > > >>>>>> > > >>>>>> Things to note here: the “shared” folder has (the possibility) of > > >>>> having multiple different shared “libraries” in it, in this example > I > > am > > >>>> supposing a hypothetical shared kuberenetes folder a world in which > we > > >>>> split the KubePodOperator and the KubeExecutor in to two separate > > >>>> distributions (example only, not proposing we do that right now, and > > that > > >>>> will be a separate discussion) > > >>>>>> The other things to note here: > > >>>>>> > > >>>>>> > > >>>>>> - the folder name in shared aims to be “self-documenting”, hence > the > > >>>> verbose “core-and-tasksdk” to say where the shared library is > > intended to > > >>>> be used. > > >>>>>> - the python module itself should almost always have an `airflow_` > > (or > > >>>> maybe `_airflow_`?) prefix so that it does not conflict with > anything > > >>> else > > >>>> we might use. It won’t matter “in production” as those will be > > vendored > > >>> in > > >>>> to be imported as `airflow/_vendor/airflow_shared` etc, but avoiding > > >>>> conflicts at dev time with the Finder approach is a good safety > > measure. > > >>>>>> I will start making a real PR for this proposal now, but I’m open > to > > >>>> feedback (either here, or in the PR when I open it) > > >>>>>> -ash > > >>>>>> > > >>>>>>> On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com> wrote: > > >>>>>>> > > >>>>>>> Yeah we have to try it and test - also building packages happens > > semi > > >>>>>>> frequently when you run `uv sync` (they use some kind of > heuristics > > >>> to > > >>>>>>> decide when) and you can force it with `--reinstall` or > > `--refresh`. > > >>>>>>> Package build also happens every time when you run "ci-image > build` > > >>>> now in > > >>>>>>> breeze so it seems like it will nicely integrate in our > workflows. > > >>>>>>> > > >>>>>>> Looks really cool Ash. > > >>>>>>> > > >>>>>>> On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor <a...@apache.org > > > > >>>> wrote: > > >>>>>>>> It’s not just release time, but any time we build a package > which > > >>>> happens > > >>>>>>>> on “every” CI run. The normal unit tests will use code from > > >>>>>>>> airflow-common/src/airflow_common; the kube tests which build an > > >>>> image will > > >>>>>>>> build the dists and vendor in the code from that commit. > > >>>>>>>> > > >>>>>>>> There is only a single copy of the shared code committed to the > > >>> repo, > > >>>> so > > >>>>>>>> there is never anything to synchronise. > > >>>>>>>> > > >>>>>>>>> On 4 Jul 2025, at 15:53, Amogh Desai <amoghdesai....@gmail.com > > > > >>>> wrote: > > >>>>>>>>> Thanks Ash. > > >>>>>>>>> > > >>>>>>>>> This is really cool and helpful that you were able to test both > > >>>> scenarios > > >>>>>>>>> -- repo checkout > > >>>>>>>>> and also installing from the vendored package and the > resolution > > >>>> worked > > >>>>>>>>> fine too. > > >>>>>>>>> > > >>>>>>>>> I like this idea compared the to relative import one for few > > >>> reasons: > > >>>>>>>>> - It feels like it will take some time to adjust to the new > > coding > > >>>>>>>> standard > > >>>>>>>>> that we will lay > > >>>>>>>>> if we impose relative imports in the shared dist > > >>>>>>>>> - We can continue using repo wise absolute import standards, it > > is > > >>>> also > > >>>>>>>>> much easier for situations > > >>>>>>>>> when we do global search in IDE to find + replace, this could > > mean > > >>> a > > >>>>>>>> change > > >>>>>>>>> there > > >>>>>>>>> - The vendoring work is a proven and established paradigm > across > > >>>> projects > > >>>>>>>>> and would > > >>>>>>>>> out of box give us the build tooling we need also > > >>>>>>>>> > > >>>>>>>>> Nothing too against the relative import but with the evidence > > >>>> provided > > >>>>>>>>> above, vendored approach > > >>>>>>>>> seems to only do us good. > > >>>>>>>>> > > >>>>>>>>> Regarding synchronizing it, release time should be fine as long > > as > > >>> we > > >>>>>>>> have > > >>>>>>>>> a good CI workflow to probably > > >>>>>>>>> catch such issues per PR if changes are made in shared dist? > > >>>> (precommit > > >>>>>>>>> would make it really slow i guess) > > >>>>>>>>> > > >>>>>>>>> If we can run our tests with vendored code we should be mostly > > >>>> covered. > > >>>>>>>>> Good effort all! > > >>>>>>>>> > > >>>>>>>>> Thanks & Regards, > > >>>>>>>>> Amogh Desai > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>>> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor < > > a...@apache.org> > > >>>>>>>> wrote: > > >>>>>>>>>> Okay, I think I’ve got something that works and I’m happy > with. > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>> > > > https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core > > >>>>>>>>>> This produces the following from `uv build task-sdk` > > >>>>>>>>>> - > > >>>>>>>>>> > > >>> > > > https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz > > >>>>>>>>>> - > > >>>>>>>>>> > > >>> > > > https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip > > >>>>>>>>>> (`.whl.zip` as GH won't allow .whl upload, but will .zip) > > >>>>>>>>>> > > >>>>>>>>>> ``` > > >>>>>>>>>> ❯ unzip -l > > >>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip | > > >>>>>>>> grep > > >>>>>>>>>> _vendor > > >>>>>>>>>> 50 02-02-2020 00:00 airflow/sdk/_vendor/.gitignore > > >>>>>>>>>> 2082 02-02-2020 00:00 airflow/sdk/_vendor/__init__.py > > >>>>>>>>>> 28 02-02-2020 00:00 > > airflow/sdk/_vendor/airflow_common.pyi > > >>>>>>>>>> 18 02-02-2020 00:00 airflow/sdk/_vendor/vendor.txt > > >>>>>>>>>> 785 02-02-2020 00:00 > > >>>>>>>>>> airflow/sdk/_vendor/airflow_common/__init__.py > > >>>>>>>>>> 10628 02-02-2020 00:00 > > >>>>>>>>>> airflow/sdk/_vendor/airflow_common/timezone.py > > >>>>>>>>>> ``` > > >>>>>>>>>> > > >>>>>>>>>> And similarly in the .tar.gz, so our “sdist” is complete too: > > >>>>>>>>>> ``` > > >>>>>>>>>> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz |grep > > _vendor > > >>>>>>>>>> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore > > >>>>>>>>>> > > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py > > >>>>>>>>>> > > >>>> > > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi > > >>>>>>>>>> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt > > >>>>>>>>>> > > >>>>>>>>>> > > >>> > > > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py > > >>> > > > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py > > >>>>>>>>>> ``` > > >>>>>>>>>> > > >>>>>>>>>> The plugin works at build time by including/copying the libs > > >>>> specified > > >>>>>>>> in > > >>>>>>>>>> vendor.txt into place (and let `vendoring` take care of import > > >>>>>>>> rewrites.) > > >>>>>>>>>> For the imports to continue to work at “dev” time/from a repo > > >>>> checkout, > > >>>>>>>> I > > >>>>>>>>>> have added a import finder to `sys.meta_path`, and since it’s > at > > >>> the > > >>>>>>>> end of > > >>>>>>>>>> the list it will only be used if the normal import can’t find > > >>>> things. > > >>>>>>>>>> > > >>>>>>>>>> > > >>> > > > https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py > > >>>>>>>>>> This doesn’t quite give us the same runtime effect “import > > >>>> rewriting” > > >>>>>>>>>> affect, as in this approach `airflow_common` is directly > loaded > > >>>> (i.e. > > >>>>>>>>>> airflow.sdk._vendor.airflow_common and airflow_common exist in > > >>>>>>>>>> sys.modules), but it does work for everything that I was able > to > > >>>> test.. > > >>>>>>>>>> I tested it with the diff at the end of this message. My test > > >>>> ipython > > >>>>>>>>>> shell: > > >>>>>>>>>> > > >>>>>>>>>> ``` > > >>>>>>>>>> In [1]: from airflow.sdk._vendor.airflow_common.timezone > import > > >>> foo > > >>>>>>>>>> In [2]: foo > > >>>>>>>>>> Out[2]: 1 > > >>>>>>>>>> > > >>>>>>>>>> In [3]: import airflow.sdk._vendor.airflow_common > > >>>>>>>>>> > > >>>>>>>>>> In [4]: import airflow.sdk._vendor.airflow_common.timezone > > >>>>>>>>>> > > >>>>>>>>>> In [5]: airflow.sdk._vendor.airflow_common.__file__ > > >>>>>>>>>> Out[5]: > > >>>>>>>>>> > > >>> > > > '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py' > > >>>>>>>>>> In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__ > > >>>>>>>>>> Out[6]: > > >>>>>>>>>> > > >>> > > > '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py' > > >>>>>>>>>> ``` > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> And in an standalone environment with the SDK dist I built (it > > >>>> needed > > >>>>>>>> the > > >>>>>>>>>> matching airflow-core right now, but that is nothing to do > with > > >>> this > > >>>>>>>>>> discussion): > > >>>>>>>>>> > > >>>>>>>>>> ``` > > >>>>>>>>>> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with > > >>>>>>>>>> dist/apache_airflow_core-3.1.0-py3-none-any.whl --with > > >>>>>>>>>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython > > >>>>>>>>>> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ] > > >>>>>>>>>> Type 'copyright', 'credits' or 'license' for more information > > >>>>>>>>>> IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for > > >>> help. > > >>>>>>>>>> Tip: You can use `%hist` to view history, see the options with > > >>>>>>>> `%history?` > > >>>>>>>>>> In [1]: import airflow.sdk._vendor.airflow_common.timezone > > >>>>>>>>>> > > >>>>>>>>>> In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__ > > >>>>>>>>>> Out[2]: > > >>>>>>>>>> > > >>> > > > '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py’ > > >>>>>>>>>> `` > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> ```diff > > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/__init__.py > > >>>>>>>>>> b/airflow-common/src/airflow_common/__init__.py > > >>>>>>>>>> index 13a83393a9..927b7c6b61 100644 > > >>>>>>>>>> --- a/airflow-common/src/airflow_common/__init__.py > > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/__init__.py > > >>>>>>>>>> @@ -14,3 +14,5 @@ > > >>>>>>>>>> # KIND, either express or implied. See the License for the > > >>>>>>>>>> # specific language governing permissions and limitations > > >>>>>>>>>> # under the License. > > >>>>>>>>>> + > > >>>>>>>>>> +foo = 1 > > >>>>>>>>>> diff --git a/airflow-common/src/airflow_common/timezone.py > > >>>>>>>>>> b/airflow-common/src/airflow_common/timezone.py > > >>>>>>>>>> index 340b924c66..58384ef20f 100644 > > >>>>>>>>>> --- a/airflow-common/src/airflow_common/timezone.py > > >>>>>>>>>> +++ b/airflow-common/src/airflow_common/timezone.py > > >>>>>>>>>> @@ -36,6 +36,9 @@ _PENDULUM3 = > > >>>>>>>>>> version.parse(metadata.version("pendulum")).major == 3 > > >>>>>>>>>> # - FixedTimezone(0, "UTC") in pendulum 2 > > >>>>>>>>>> utc = pendulum.UTC > > >>>>>>>>>> > > >>>>>>>>>> + > > >>>>>>>>>> +from airflow_common import foo > > >>>>>>>>>> + > > >>>>>>>>>> TIMEZONE: Timezone > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> ``` > > >>>>>>>>>> > > >>>>>>>>>>>> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> > > wrote: > > >>>>>>>>>>> I think both approaches are doable: > > >>>>>>>>>>> > > >>>>>>>>>>> 1) -> We can very easily prevent bad imports by pre-commit > when > > >>>>>>>> importing > > >>>>>>>>>>> from different distributions and make sure we are only doing > > >>>> relative > > >>>>>>>>>>> imports in the shared modules. We are doing plenty of this > > >>>> already. And > > >>>>>>>>>> yes > > >>>>>>>>>>> it would require relative links we currently do not allow. > > >>>>>>>>>>> > > >>>>>>>>>>> 2) -> has one disadvantage that someone at some point in time > > >>> will > > >>>> have > > >>>>>>>>>> to > > >>>>>>>>>>> decide to synchronize this and if it happens just before > > release > > >>>> (I bet > > >>>>>>>>>>> this is going to happen) this will lead to solving problems > > that > > >>>> would > > >>>>>>>>>>> normally be solved during PR when you make a change (i.e. > > >>> symbolic > > >>>> link > > >>>>>>>>>> has > > >>>>>>>>>>> the advantage that whoever modifies shared code will be > > >>> immediately > > >>>>>>>>>>> notified in their PR - that they broke something because > either > > >>>> static > > >>>>>>>>>>> checks or mypy or tests fail. > > >>>>>>>>>>> > > >>>>>>>>>>> Ash, do you have an idea of a process (who and when) does the > > >>>>>>>>>>> synchronisation in case of vendoring? Maybe we could solve it > > if > > >>>> it is > > >>>>>>>>>> done > > >>>>>>>>>>> more frequently and with some regularity? We could > potentially > > >>>> force > > >>>>>>>>>>> re-vendoring at PR time as well any time shared code changes > > (and > > >>>>>>>> prevent > > >>>>>>>>>>> it by pre-commit. And I can't think of some place (other than > > >>>> releases) > > >>>>>>>>>> in > > >>>>>>>>>>> our development workflow and that seems to be a bit too late > as > > >>>> puts an > > >>>>>>>>>>> extra effort on fixing potential incompatibilities introduced > > on > > >>>>>>>> release > > >>>>>>>>>>> manager and delays the release. WDYT? > > >>>>>>>>>>> > > >>>>>>>>>>> Re: relative links. I think for a shared library we could > > >>>> potentially > > >>>>>>>>>> relax > > >>>>>>>>>>> this and allow them (and actually disallow absolute links in > > the > > >>>> pieces > > >>>>>>>>>> of > > >>>>>>>>>>> code that are shared - again, by pre-commit). As I recall, > the > > >>> only > > >>>>>>>>>> reason > > >>>>>>>>>>> we forbade the relative links is because of how we are (or > > maybe > > >>>> were) > > >>>>>>>>>>> doing DAG parsing and failures resulting from it. So we > decided > > >>> to > > >>>> just > > >>>>>>>>>> not > > >>>>>>>>>>> allow it to keep consistency. The way how Dag parsing works > is > > >>> that > > >>>>>>>> when > > >>>>>>>>>>> you are using importlib to read the Dag from a file, the > > relative > > >>>>>>>> imports > > >>>>>>>>>>> do not work as it does not know what they should be relative > > to. > > >>>> But if > > >>>>>>>>>>> relative import is done from an imported package, it should > be > > no > > >>>>>>>>>> problem, > > >>>>>>>>>>> I think - otherwise our Dags would not be able to import any > > >>>> library > > >>>>>>>> that > > >>>>>>>>>>> uses relative imports. > > >>>>>>>>>>> > > >>>>>>>>>>> Of course consistency might be the reason why we do not want > to > > >>>>>>>> introduce > > >>>>>>>>>>> relative imports. I don't see it as an issue if it is guarded > > by > > >>>>>>>>>> pre-commit > > >>>>>>>>>>> though. > > >>>>>>>>>>> > > >>>>>>>>>>> J. > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> J. > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> czw., 3 lip 2025, 12:11 użytkownik Ash Berlin-Taylor < > > >>>> a...@apache.org> > > >>>>>>>>>>> napisał: > > >>>>>>>>>>> > > >>>>>>>>>>>> Oh yes, symlinks will work, with one big caveat: It does > mean > > >>> you > > >>>>>>>> can’t > > >>>>>>>>>>>> use absolute imports in one common module to another. > > >>>>>>>>>>>> > > >>>>>>>>>>>> For example > > >>>>>>>>>>>> > > >>> > > > https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41 > > >>>>>>>>>>>> where we have > > >>>>>>>>>>>> > > >>>>>>>>>>>> ``` > > >>>>>>>>>>>> from airflow.utils.module_loading import import_string > > >>>>>>>>>>>> ``` > > >>>>>>>>>>>> > > >>>>>>>>>>>> if we want to move serve_logs into this common lib that is > > then > > >>>>>>>>>> symlinked > > >>>>>>>>>>>> then we wouldn’t be able to have `from > > >>>> airflow_common.module_loading > > >>>>>>>>>> import > > >>>>>>>>>>>> import_string`. > > >>>>>>>>>>>> > > >>>>>>>>>>>> I can think of two possible solutions here. > > >>>>>>>>>>>> > > >>>>>>>>>>>> 1) is to allow/require relative imports in this shared lib, > so > > >>>> `from > > >>>>>>>>>>>> .module_loading import import_string` > > >>>>>>>>>>>> 2) is to use `vendoring`[1] (from the pip maintainers) which > > >>> will > > >>>>>>>> handle > > >>>>>>>>>>>> import-rewriting for us. > > >>>>>>>>>>>> > > >>>>>>>>>>>> I’d entirely forgot that symlinks in repos was a thing, so I > > >>>> prepared > > >>>>>>>> a > > >>>>>>>>>>>> minimal POC/demo of what vendoring approach could look like > > here > > >>>>>>>>>>>> > > >>> > > > https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3 > > >>>>>>>>>>>> Now personally I am more than happy with relative imports, > but > > >>>>>>>> generally > > >>>>>>>>>>>> as a project we have avoided them, so I think that limits > what > > >>> we > > >>>>>>>> could > > >>>>>>>>>> do > > >>>>>>>>>>>> with a symlink based approach. > > >>>>>>>>>>>> > > >>>>>>>>>>>> -ash > > >>>>>>>>>>>> > > >>>>>>>>>>>> [1] https://github.com/pradyunsg/vendoring > > >>>>>>>>>>>> > > >>>>>>>>>>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu < > > >>>>>>>> gopidesupa...@gmail.com> > > >>>>>>>>>>>> wrote: > > >>>>>>>>>>>>> Thanks Ash > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Yes agree option 2 would be preferred for me. Making sure > we > > >>>> have all > > >>>>>>>>>> the > > >>>>>>>>>>>>> gaurdriles to protect any unwanted behaviour in code > sharing > > >>> and > > >>>>>>>>>>>> executing > > >>>>>>>>>>>>> right of tests between the packages. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Agree with others, option 2 would be > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai < > > >>>>>>>> amoghdesai....@gmail.com> > > >>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>> > > >>>>>>>>>>>>>> Thanks for starting this discussion, Ash. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> I would prefer option 2 here with proper tooling to handle > > the > > >>>> code > > >>>>>>>>>>>>>> duplication at *release* time. > > >>>>>>>>>>>>>> It is best to have a dist that has all it needs in itself. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Option 1 could very quickly get out of hand and if we > decide > > >>> to > > >>>>>>>>>> separate > > >>>>>>>>>>>>>> triggerer / > > >>>>>>>>>>>>>> dag processor / config etc etc as separate packages, back > > >>>> compat is > > >>>>>>>>>>>> going > > >>>>>>>>>>>>>> to be a nightmare > > >>>>>>>>>>>>>> and will bite us harder than we anticipate. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Thanks & Regards, > > >>>>>>>>>>>>>> Amogh Desai > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik < > > >>> kaxiln...@gmail.com> > > >>>>>>>>>> wrote: > > >>>>>>>>>>>>>>> I prefer Option 2 as well to avoid matrix of dependencies > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler > > >>>>>>>>>> <j_scheff...@gmx.de.invalid > > >>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> I'd also rather prefer option 2 - reason here is it is > > >>> rather > > >>>>>>>>>>>> pragmatic > > >>>>>>>>>>>>>>>> and we no not need to cut another package and have less > > >>>> package > > >>>>>>>>>> counts > > >>>>>>>>>>>>>>>> and dependencies. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> I remember some time ago I was checking (together with > > >>> Jarek, > > >>>> I am > > >>>>>>>>>> not > > >>>>>>>>>>>>>>>> sure anymore...) if the usage of symlinks would be > > possible. > > >>>> To > > >>>>>>>> keep > > >>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>> source in one package but "symlink" it into another. If > > then > > >>>> at > > >>>>>>>>>> point > > >>>>>>>>>>>>>> of > > >>>>>>>>>>>>>>>> packaging/release the files are materialized we have 1 > set > > >>> of > > >>>>>>>> code. > > >>>>>>>>>>>>>>>> Otherwise if not possible still the redundancy could be > > >>>> solved by > > >>>>>>>> a > > >>>>>>>>>>>>>>>> pre-commit hook - and in Git the files are de-duplicated > > >>>> anyway > > >>>>>>>>>> based > > >>>>>>>>>>>>>> on > > >>>>>>>>>>>>>>>> content hash, so this does not hurt. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote: > > >>>>>>>>>>>>>>>>> I support option 2 with proper automation & CI - the > > >>>> reasonings > > >>>>>>>>>>>>>> you've > > >>>>>>>>>>>>>>>>> shown for that make sense to me. > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Shahar > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor < > > >>>> a...@apache.org > > >>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>>> Hello everyone, > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> As we work on finishing off the code-level separation > of > > >>>> Task > > >>>>>>>> SDK > > >>>>>>>>>>>>>> and > > >>>>>>>>>>>>>>>> Core > > >>>>>>>>>>>>>>>>>> (scheduler etc) we have come across some situations > > where > > >>> we > > >>>>>>>> would > > >>>>>>>>>>>>>>> like > > >>>>>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>>>> share code between these. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> However it’s not as straight forward of “just put it > in > > a > > >>>> common > > >>>>>>>>>>>>>> dist > > >>>>>>>>>>>>>>>> they > > >>>>>>>>>>>>>>>>>> both depend upon” because one of the goals of the Task > > SDK > > >>>>>>>>>>>>>> separation > > >>>>>>>>>>>>>>>> was > > >>>>>>>>>>>>>>>>>> to have 100% complete version independence between the > > >>> two, > > >>>>>>>>>> ideally > > >>>>>>>>>>>>>>>> even if > > >>>>>>>>>>>>>>>>>> they are built into the same image and venv. Most of > the > > >>>> reason > > >>>>>>>>>> why > > >>>>>>>>>>>>>>> this > > >>>>>>>>>>>>>>>>>> isn’t straight forward comes down to backwards > > >>>> compatibility - > > >>>>>>>> if > > >>>>>>>>>> we > > >>>>>>>>>>>>>>>> make > > >>>>>>>>>>>>>>>>>> an change to the common/shared distribution > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> We’ve listed the options we have thought about in > > >>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/51545 (but > > that > > >>>> covers > > >>>>>>>>>>>>>> some > > >>>>>>>>>>>>>>>> more > > >>>>>>>>>>>>>>>>>> things that I don’t want to get in to in this > discussion > > >>>> such as > > >>>>>>>>>>>>>>>> possibly > > >>>>>>>>>>>>>>>>>> separating operators and executors out of a single > > >>> provider > > >>>>>>>> dist.) > > >>>>>>>>>>>>>>>>>> To give a concrete example of some code I would like > to > > >>>> share > > >>> > > > https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py > > >>>>>>>>>>>>>>>>>> — logging config. Another thing we will want to share > > will > > >>>> be > > >>>>>>>> the > > >>>>>>>>>>>>>>>>>> AirflowConfigParser class from airflow.configuration > > (but > > >>>>>>>> notably: > > >>>>>>>>>>>>>>> only > > >>>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>> parser class, _not_ the default config values, again, > > lets > > >>>> not > > >>>>>>>>>> dwell > > >>>>>>>>>>>>>>> on > > >>>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>> specifics of that) > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> So to bring the options listed in the issue here for > > >>>> discussion, > > >>>>>>>>>>>>>>> broadly > > >>>>>>>>>>>>>>>>>> speaking there are two high-level approaches: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> 1. A single shared distribution > > >>>>>>>>>>>>>>>>>> 2. No shared package and copy/duplicate code > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> The advantage of Approach 1 is that we only have the > > code > > >>>> in one > > >>>>>>>>>>>>>>> place. > > >>>>>>>>>>>>>>>>>> However for me, at least in this specific case of > > Logging > > >>>> config > > >>>>>>>>>> or > > >>>>>>>>>>>>>>>>>> AirflowConfigParser class is that backwards > > compatibility > > >>> is > > >>>>>>>> much > > >>>>>>>>>>>>>> much > > >>>>>>>>>>>>>>>>>> harder. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> The main advantage of Approach 2 is the the code is > > >>> released > > >>>>>>>>>>>>>>>> with/embedded > > >>>>>>>>>>>>>>>>>> in the dist (i.e. apache-airflow-task-sdk would > contain > > >>> the > > >>>>>>>> right > > >>>>>>>>>>>>>>>> version > > >>>>>>>>>>>>>>>>>> of the logging config and ConfigParser etc). The > > downside > > >>> is > > >>>>>>>> that > > >>>>>>>>>>>>>>> either > > >>>>>>>>>>>>>>>>>> the code will need to be duplicated in the repo, or > > better > > >>>> yet > > >>>>>>>> it > > >>>>>>>>>>>>>>> would > > >>>>>>>>>>>>>>>>>> live in a single place in the repo, but some tooling > > (TBD) > > >>>> will > > >>>>>>>>>>>>>>>>>> automatically handle the duplication, either at commit > > >>>> time, or > > >>>>>>>> my > > >>>>>>>>>>>>>>>>>> preference, at release time. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> For this kind of shared “utility” code I am very > > strongly > > >>>>>>>> leaning > > >>>>>>>>>>>>>>>> towards > > >>>>>>>>>>>>>>>>>> option 2 with automation, as otherwise I think the > > >>> backwards > > >>>>>>>>>>>>>>>> compatibility > > >>>>>>>>>>>>>>>>>> requirements would make it unworkable (very quickly > over > > >>>> time > > >>>>>>>> the > > >>>>>>>>>>>>>>>>>> combinations we would have to test would just be > > >>>> unreasonable) > > >>>>>>>>>> and I > > >>>>>>>>>>>>>>>> don’t > > >>>>>>>>>>>>>>>>>> feel confident we can have things as stable as we need > > to > > >>>> really > > >>>>>>>>>>>>>>> deliver > > >>>>>>>>>>>>>>>>>> the version separation/independency I want to delivery > > >>> with > > >>>>>>>>>> AIP-72. > > >>>>>>>>>>>>>>>>>> So unless someone feels very strongly about this, I > will > > >>>> come up > > >>>>>>>>>>>>>> with > > >>>>>>>>>>>>>>> a > > >>>>>>>>>>>>>>>>>> draft PR for further discussion that will implement > code > > >>>> sharing > > >>>>>>>>>> via > > >>>>>>>>>>>>>>>>>> “vendoring” it at build time. I have an idea of how I > > can > > >>>>>>>> achieve > > >>>>>>>>>>>>>> this > > >>>>>>>>>>>>>>>> so > > >>>>>>>>>>>>>>>>>> we have a single version in the repo and it’ll work > > there, > > >>>> but > > >>>>>>>> at > > >>>>>>>>>>>>>>>> runtime > > >>>>>>>>>>>>>>>>>> we vendor it in to the shipped dist so it lives at > > >>> something > > >>>>>>>> like > > >>>>>>>>>>>>>>>>>> `airflow.sdk._vendor` etc. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> In terms of repo layout, this likely means we would > end > > up > > >>>> with: > > >>>>>>>>>>>>>>>>>> airflow-core/pyproject.toml > > >>>>>>>>>>>>>>>>>> airflow-core/src/ > > >>>>>>>>>>>>>>>>>> airflow-core/tests/ > > >>>>>>>>>>>>>>>>>> task-sdk/pyproject.toml > > >>>>>>>>>>>>>>>>>> task-sdk/src/ > > >>>>>>>>>>>>>>>>>> task-sdk/tests/ > > >>>>>>>>>>>>>>>>>> airflow-common/src > > >>>>>>>>>>>>>>>>>> airflow-common/tests/ > > >>>>>>>>>>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as deps > > would > > >>>> be > > >>>>>>>>>>>>>> included > > >>>>>>>>>>>>>>>> in > > >>>>>>>>>>>>>>>>>> the downstream projects. TBD. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Thoughts and feedback welcomed. > > >>>> > --------------------------------------------------------------------- > > >>>>>>>>>>>>>>>> To unsubscribe, e-mail: > > dev-unsubscr...@airflow.apache.org > > >>>>>>>>>>>>>>>> For additional commands, e-mail: > > >>> dev-h...@airflow.apache.org > > >>>>>>>>>>>>>>>> > > >>>>>>>> > > >>> --------------------------------------------------------------------- > > >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > >>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org > > >>>>>>>> > > >>>>>>>> > > >>>>>> > > --------------------------------------------------------------------- > > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > >>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org > > >>>>>> > > >>>>> > --------------------------------------------------------------------- > > >>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > >>>>> For additional commands, e-mail: dev-h...@airflow.apache.org > > >>>>> > > >>>> > --------------------------------------------------------------------- > > >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > >>>> For additional commands, e-mail: dev-h...@airflow.apache.org > > >>>> > > >>>> > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > >