In case I did a bad job explaining it, the “core and task sdk” is not in the module name/import name, just in the file path.
Anyone have other ideas?

On 7 Jul 2025, at 21:37, Buğra Öztürk <ozturkbugr...@gmail.com> wrote:

Thanks Ash! Looks cool! I like the structure. This will enable all the combinations and the structure looks easy to grasp. No strong stance on the naming, other than that it is maybe a bit long with `and`; `core_ctl` could be shorter. Since no import path is defined like that, we can give it any name for sure.

Best regards,

On Mon, 7 Jul 2025, 21:51, Jarek Potiuk <ja...@potiuk.com> wrote:

Looks good, but I think we should find some better logical name for core_and_sdk :)

On Mon, 7 Jul 2025, 21:44, Jens Scheffler <j_scheff...@gmx.de.invalid> wrote:

Cool! Especially the "shared" folder with the ability to have N-combinations w/o exploding the project repo root!

On 07.07.25 14:43, Ash Berlin-Taylor wrote:

Oh, and all of this will be explained in shared/README.md

On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org> wrote:

Okay, it seems we have agreement on the approach here, so I'll continue with this. On the dev call it was mentioned that "airflow-common" wasn't a great name, so here is my proposal for the file structure:

```
/
  task-sdk/...
  airflow-core/...
  shared/
    kubernetes/
      pyproject.toml
      src/
        airflow_kube/__init__.py
    core-and-tasksdk/
      pyproject.toml
      src/
        airflow_shared/__init__.py
```

Things to note here: the "shared" folder can hold multiple different shared "libraries". In this example I am supposing a hypothetical shared kubernetes folder, for a world in which we split the KubePodOperator and the KubeExecutor into two separate distributions (example only, I am not proposing we do that right now; that will be a separate discussion).

The other things to note here:

- The folder name in shared aims to be "self-documenting", hence the verbose "core-and-tasksdk" to say where the shared library is intended to be used.
- The python module itself should almost always have an `airflow_` (or maybe `_airflow_`?) prefix so that it does not conflict with anything else we might use. It won't matter "in production", as those will be vendored in to be imported as `airflow/_vendor/airflow_shared` etc., but avoiding conflicts at dev time with the Finder approach is a good safety measure.

I will start making a real PR for this proposal now, but I'm open to feedback (either here, or in the PR when I open it).

-ash

On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com> wrote:

Yeah, we have to try it and test. Also, building packages happens semi-frequently when you run `uv sync` (they use some kind of heuristics to decide when), and you can force it with `--reinstall` or `--refresh`. A package build also happens every time you run `ci-image build` in breeze now, so it seems like it will integrate nicely into our workflows.

Looks really cool Ash.

On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor <a...@apache.org> wrote:

It's not just release time, but any time we build a package, which happens on "every" CI run.
The normal unit tests will use code from airflow-common/src/airflow_common; the kube tests, which build an image, will build the dists and vendor in the code from that commit.

There is only a single copy of the shared code committed to the repo, so there is never anything to synchronise.

On 4 Jul 2025, at 15:53, Amogh Desai <amoghdesai....@gmail.com> wrote:

Thanks Ash.

It is really cool and helpful that you were able to test both scenarios -- a repo checkout, and installing from the vendored package -- and that the resolution worked fine in both.

I like this idea compared to the relative import one, for a few reasons:

- It feels like it will take some time to adjust to the new coding standard we would have to lay down if we impose relative imports in the shared dist.
- We can continue using repo-wide absolute import standards; that is also much easier when we do a global find + replace in the IDE, which relative imports could change.
- Vendoring is a proven and established paradigm across projects, and would give us the build tooling we need out of the box.

Nothing much against the relative imports, but with the evidence provided above, the vendored approach seems to only do us good.

Regarding synchronizing it: release time should be fine, as long as we have a good CI workflow to catch such issues per PR when changes are made in the shared dist? (A pre-commit would make it really slow, I guess.)

If we can run our tests with vendored code we should be mostly covered.

Good effort all!

Thanks & Regards,
Amogh Desai

On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor <a...@apache.org> wrote:

Okay, I think I've got something that works and that I'm happy with:
https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core

This produces the following from `uv build task-sdk`:

- https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz
- https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip
  (`.whl.zip` as GH won't allow a .whl upload, but will allow a .zip)

```
❯ unzip -l dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip | grep _vendor
       50  02-02-2020 00:00   airflow/sdk/_vendor/.gitignore
     2082  02-02-2020 00:00   airflow/sdk/_vendor/__init__.py
       28  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common.pyi
       18  02-02-2020 00:00   airflow/sdk/_vendor/vendor.txt
      785  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common/__init__.py
    10628  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common/timezone.py
```

And similarly in the .tar.gz, so our "sdist" is complete too:

```
❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz | grep _vendor
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py
```

The plugin works at build time by copying the libs specified in vendor.txt into place (and letting `vendoring` take care of the import rewrites). For the imports to continue to work at "dev" time, i.e. from a repo checkout, I have added an import finder to `sys.meta_path`; since it is at the end of the list, it is only used if the normal import machinery can't find things:

https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py

This doesn't give us quite the same "import rewriting" effect at runtime, as in this approach `airflow_common` is loaded directly (i.e. both airflow.sdk._vendor.airflow_common and airflow_common exist in sys.modules), but it does work for everything that I was able to test.

I tested it with the diff at the end of this message.
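For illustration, a minimal sketch of that kind of `sys.meta_path` finder is shown below. It is not the actual finder linked above: the vendored prefix, the path back to airflow-common/src, and the class name are assumptions made for the example.

```python
# Illustrative sketch only -- not the actual finder linked above.
# Assumptions: the vendored prefix is "airflow.sdk._vendor." and the shared
# sources live at <repo root>/airflow-common/src in a checkout.
import sys
from importlib.machinery import PathFinder
from pathlib import Path

_PREFIX = "airflow.sdk._vendor."
# Hypothetical: walk up from this file to the repo root; the real depth
# depends on where the finder module lives.
_SHARED_SRC = Path(__file__).resolve().parents[4] / "airflow-common" / "src"


class _DevVendorFinder:
    """Resolve `airflow.sdk._vendor.<pkg>` imports from the repo checkout.

    Appended to the end of sys.meta_path, so it only runs when the normal
    finders fail, i.e. when the code has not actually been vendored in
    (a dev checkout rather than a built wheel).
    """

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(_PREFIX):
            return None
        rest = fullname[len(_PREFIX):]
        if "." in rest:
            # Submodules are found by the regular PathFinder once the parent
            # package below has been imported with the right __path__.
            return None
        # e.g. fullname == "airflow.sdk._vendor.airflow_common": look for the
        # real "airflow_common" package in the shared sources.
        return PathFinder.find_spec(fullname, [str(_SHARED_SRC)])


sys.meta_path.append(_DevVendorFinder())
```

The real implementation differs (for one thing, as noted above, it ends up with both `airflow.sdk._vendor.airflow_common` and `airflow_common` in `sys.modules`), but the shape is the same: a finder at the end of `sys.meta_path` that maps the vendored name back to the single in-repo copy of the shared code.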
My test ipython shell:

```
In [1]: from airflow.sdk._vendor.airflow_common.timezone import foo

In [2]: foo
Out[2]: 1

In [3]: import airflow.sdk._vendor.airflow_common

In [4]: import airflow.sdk._vendor.airflow_common.timezone

In [5]: airflow.sdk._vendor.airflow_common.__file__
Out[5]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py'

In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__
Out[6]: '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py'
```

And in a standalone environment with the SDK dist I built (it needed the matching airflow-core right now, but that has nothing to do with this discussion):

```
❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with dist/apache_airflow_core-3.1.0-py3-none-any.whl --with dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython
Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: You can use `%hist` to view history, see the options with `%history?`

In [1]: import airflow.sdk._vendor.airflow_common.timezone

In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__
Out[2]: '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py'
```

```diff
diff --git a/airflow-common/src/airflow_common/__init__.py b/airflow-common/src/airflow_common/__init__.py
index 13a83393a9..927b7c6b61 100644
--- a/airflow-common/src/airflow_common/__init__.py
+++ b/airflow-common/src/airflow_common/__init__.py
@@ -14,3 +14,5 @@
 # KIND, either express or implied. See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
+foo = 1
diff --git a/airflow-common/src/airflow_common/timezone.py b/airflow-common/src/airflow_common/timezone.py
index 340b924c66..58384ef20f 100644
--- a/airflow-common/src/airflow_common/timezone.py
+++ b/airflow-common/src/airflow_common/timezone.py
@@ -36,6 +36,9 @@ _PENDULUM3 = version.parse(metadata.version("pendulum")).major == 3
 # - FixedTimezone(0, "UTC") in pendulum 2
 utc = pendulum.UTC

+
+from airflow_common import foo
+
 TIMEZONE: Timezone
```

On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> wrote:

I think both approaches are doable:

1) We can very easily prevent bad imports across distributions with pre-commit, and make sure we only use relative imports in the shared modules. We are doing plenty of this already. Yes, it would require relative imports, which we currently do not allow.
2) This has one disadvantage: at some point someone will have to decide to synchronize, and if that happens just before a release (I bet it will), it will lead to solving problems that would normally be solved in the PR that made the change. A symbolic link has the advantage that whoever modifies shared code is immediately notified in their PR that they broke something, because the static checks, mypy or tests fail.

Ash, do you have an idea of a process (who and when) for doing the synchronisation in the vendoring case? Maybe we could solve it by doing it more frequently and with some regularity? We could potentially force re-vendoring at PR time as well, any time shared code changes (and enforce it with pre-commit). I can't think of another place in our development workflow other than releases, and that seems a bit too late, as it puts the extra effort of fixing potential incompatibilities on the release manager and delays the release. WDYT?

Re: relative imports. I think for a shared library we could potentially relax this and allow them (and actually disallow absolute imports in the pieces of code that are shared - again, via pre-commit). As I recall, the only reason we forbade relative imports is how we are (or maybe were) doing DAG parsing and the failures resulting from it, so we decided to just not allow them, for consistency. The way Dag parsing works, when you use importlib to read the Dag from a file, relative imports do not work because the machinery does not know what they should be relative to. But if a relative import is done from an imported package, it should be no problem, I think - otherwise our Dags would not be able to import any library that uses relative imports.

Of course, consistency might be the reason why we do not want to introduce relative imports. I don't see it as an issue if it is guarded by pre-commit though.

J.

On Thu, 3 Jul 2025, 12:11, Ash Berlin-Taylor <a...@apache.org> wrote:

Oh yes, symlinks will work, with one big caveat: it means you can't use absolute imports from one common module to another.

For example,
https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41
where we have

```
from airflow.utils.module_loading import import_string
```

If we want to move serve_logs into this common lib and then symlink it, we wouldn't be able to have `from airflow_common.module_loading import import_string`.

I can think of two possible solutions here:
1) Allow/require relative imports in this shared lib, i.e. `from .module_loading import import_string`.
2) Use `vendoring` [1] (from the pip maintainers), which will handle the import rewriting for us.

I'd entirely forgotten that symlinks in repos were a thing, so I prepared a minimal POC/demo of what the vendoring approach could look like here:
https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3

Now, personally I am more than happy with relative imports, but generally as a project we have avoided them, so I think that limits what we could do with a symlink-based approach.

-ash

[1] https://github.com/pradyunsg/vendoring

On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <gopidesupa...@gmail.com> wrote:

Thanks Ash.

Yes, agree: option 2 would be preferred for me, making sure we have all the guardrails to protect against any unwanted behaviour in code sharing and that we execute the right tests between the packages.

Agree with others - option 2 would be my preference.

On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <amoghdesai....@gmail.com> wrote:

Thanks for starting this discussion, Ash.

I would prefer option 2 here, with proper tooling to handle the code duplication at *release* time. It is best to have a dist that has everything it needs in itself.

Option 1 could very quickly get out of hand: if we decide to separate the triggerer / dag processor / config etc. into separate packages, backwards compatibility is going to be a nightmare and will bite us harder than we anticipate.

Thanks & Regards,
Amogh Desai

On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

I prefer Option 2 as well, to avoid a matrix of dependencies.

On Thu, 3 Jul 2025 at 01:03, Jens Scheffler <j_scheff...@gmx.de.invalid> wrote:

I'd also rather prefer option 2 - the reason is that it is pragmatic: we do not need to cut another package, and we keep the package count and dependencies down.

I remember some time ago I was checking (together with Jarek, I am not sure anymore...) whether the use of symlinks would be possible: keep the source in one package but "symlink" it into another. If the files are then materialized at packaging/release time, we have one set of code. Otherwise, if that is not possible, the redundancy could still be handled by a pre-commit hook - and in Git the files are de-duplicated anyway based on content hash, so this does not hurt.
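Both Jarek and Jens mention guarding the shared code with pre-commit. As a rough illustration only (not an existing Airflow hook; the path and the forbidden prefixes are assumptions), such a guard could be a small AST-based check that fails when shared modules use absolute imports of Airflow code:

```python
# Sketch of a possible pre-commit check, not an existing Airflow hook.
# Assumption: shared sources live under airflow-common/src/airflow_common.
# It flags absolute imports of the shared package itself (or of airflow.*),
# which would break once the code is vendored under a different prefix.
import ast
import sys
from pathlib import Path

SHARED_SRC = Path("airflow-common/src/airflow_common")
FORBIDDEN_PREFIXES = ("airflow_common", "airflow")


def check_file(path: Path) -> list[str]:
    errors = []
    tree = ast.parse(path.read_text(), filename=str(path))
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # level > 0 means a relative import, which is what we want here.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name in FORBIDDEN_PREFIXES or name.startswith(
                tuple(p + "." for p in FORBIDDEN_PREFIXES)
            ):
                errors.append(
                    f"{path}:{node.lineno}: absolute import of {name!r} in "
                    "shared code; use a relative import instead"
                )
    return errors


if __name__ == "__main__":
    problems = [err for file in sorted(SHARED_SRC.rglob("*.py")) for err in check_file(file)]
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```

Wired up as a pre-commit hook, a check like this would fail in the PR that introduces the problematic import, which is the "notified immediately in their PR" property described above.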
On 02.07.25 18:49, Shahar Epstein wrote:

I support option 2 with proper automation & CI - the reasoning you've shown for it makes sense to me.

Shahar

On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org> wrote:

Hello everyone,

As we work on finishing off the code-level separation of Task SDK and Core (scheduler etc.) we have come across some situations where we would like to share code between the two.

However, it's not as straightforward as "just put it in a common dist they both depend upon", because one of the goals of the Task SDK separation was to have 100% complete version independence between the two, ideally even if they are built into the same image and venv. Most of the reason this isn't straightforward comes down to backwards compatibility - if we make a change to the common/shared distribution ...

We've listed the options we have thought about in https://github.com/apache/airflow/issues/51545 (but that covers some more things I don't want to get into in this discussion, such as possibly separating operators and executors out of a single provider dist.)

To give a concrete example of some code I would like to share:
https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
(the logging config). Another thing we will want to share is the AirflowConfigParser class from airflow.configuration (but notably: only the parser class, _not_ the default config values; again, let's not dwell on the specifics of that).

So, to bring the options listed in the issue here for discussion, broadly speaking there are two high-level approaches:

1. A single shared distribution
2. No shared package; copy/duplicate the code

The advantage of Approach 1 is that we only have the code in one place. However for me, at least in this specific case of the logging config or the AirflowConfigParser class, backwards compatibility becomes much, much harder.

The main advantage of Approach 2 is that the code is released with/embedded in the dist (i.e. apache-airflow-task-sdk would contain the right version of the logging config and ConfigParser etc.). The downside is that either the code needs to be duplicated in the repo, or, better yet, it lives in a single place in the repo and some tooling (TBD) automatically handles the duplication, either at commit time or, my preference, at release time.

For this kind of shared "utility" code I am very strongly leaning towards option 2 with automation, as otherwise I think the backwards compatibility requirements would make it unworkable (very quickly, over time, the combinations we would have to test would just become unreasonable), and I don't feel confident we can keep things as stable as we would need to in order to really deliver the version separation/independence I want to deliver with AIP-72.

So unless someone feels very strongly about this, I will come up with a draft PR for further discussion that implements code sharing by "vendoring" it at build time. I have an idea of how I can achieve this so that we have a single version in the repo and it'll work there, while in the shipped dist we vendor it in so that at runtime it lives at something like `airflow.sdk._vendor` etc.

In terms of repo layout, this likely means we would end up with:

  airflow-core/pyproject.toml
  airflow-core/src/
  airflow-core/tests/
  task-sdk/pyproject.toml
  task-sdk/src/
  task-sdk/tests/
  airflow-common/src/
  airflow-common/tests/
  # Possibly no airflow-common/pyproject.toml, as the deps would be included in the downstream projects. TBD.

Thoughts and feedback welcomed.
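To make the intended build-time vendoring concrete, the following is an assumed illustration based on the `airflow_common.timezone` example and the built-wheel listings earlier in the thread; the exact rewritten form is an educated guess at what `vendoring` produces, not something copied from the wheel.

```python
# Assumed illustration of the build-time rewrite, based on the behaviour of
# the `vendoring` tool discussed in this thread (not copied from the wheel).
#
# In the repo, the shared module imports its own package by its real name,
# e.g. the test diff above adds this to airflow_common/timezone.py:
#
#     from airflow_common import foo
#
# In the built apache-airflow-task-sdk wheel, the package is copied under
# airflow/sdk/_vendor/ and that import is rewritten, roughly to:

from airflow.sdk._vendor.airflow_common import foo

# Task SDK code itself always spells the import with the vendored prefix (as
# in the ipython sessions above): in a wheel the copied code satisfies it, and
# in a repo checkout the sys.meta_path finder resolves it from
# airflow-common/src instead.
```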
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org