Thanks Ash! Looks cool! I like the structure. This will enable all the
combinations and structure looks easy to grasp. No strong stance on the
naming other than maybe it is a bit long with `and`, `core_ctl` could be
shorter, since no import path is defined like that, we can give any name
for sure.

Best regards,

On Mon, 7 Jul 2025, 21:51 Jarek Potiuk, <ja...@potiuk.com> wrote:

> Looks good but I think we should find some better logical name for
> core_and_sdk :)
>
> pon., 7 lip 2025, 21:44 użytkownik Jens Scheffler
> <j_scheff...@gmx.de.invalid> napisał:
>
> > Cool! Especially the "shared" folder with the ability to have
> > N-combinations w/o exploding project repo root!
> >
> > On 07.07.25 14:43, Ash Berlin-Taylor wrote:
> > > Oh, and all of this will be explain in shared/README.md
> > >
> > >> On 7 Jul 2025, at 13:41, Ash Berlin-Taylor <a...@apache.org> wrote:
> > >>
> > >> Okay, so it seems we have agreement on the approach here, so I’ll
> > continue with this, and on the dev call it was mentioned that
> > “airflow-common” wasn’t a great name, so here is my proposal for the file
> > structure;
> > >>
> > >> ```
> > >> /
> > >>   task-sdk/...
> > >>   airflow-core/...
> > >>   shared/
> > >>     kuberenetes/
> > >>       pyproject.toml
> > >>       src/
> > >>         airflow_kube/__init__.py
> > >>     core-and-tasksdk/
> > >>       pyproject.toml
> > >>       src/
> > >>         airflow_shared/__init__.py
> > >> ```
> > >>
> > >> Things to note here: the “shared” folder has (the possibility) of
> > having multiple different shared “libraries” in it, in this example I am
> > supposing a hypothetical shared kuberenetes folder a world in which we
> > split the KubePodOperator and the KubeExecutor in to two separate
> > distributions (example only, not proposing we do that right now, and that
> > will be a separate discussion)
> > >>
> > >> The other things to note here:
> > >>
> > >>
> > >> - the folder name in shared aims to be “self-documenting”, hence the
> > verbose “core-and-tasksdk” to say where the shared library is intended to
> > be used.
> > >> - the python module itself should almost always have an `airflow_` (or
> > maybe `_airflow_`?) prefix so that it does not conflict with anything
> else
> > we might use. It won’t matter “in production” as those will be vendored
> in
> > to be imported as `airflow/_vendor/airflow_shared` etc, but avoiding
> > conflicts at dev time with the Finder approach is a good safety measure.
> > >>
> > >> I will start making a real PR for this proposal now, but I’m open to
> > feedback (either here, or in the PR when I open it)
> > >>
> > >> -ash
> > >>
> > >>> On 4 Jul 2025, at 16:55, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >>>
> > >>> Yeah we have to try it and test - also building packages happens semi
> > >>> frequently when you run `uv sync` (they use some kind of heuristics
> to
> > >>> decide when) and you can force it with `--reinstall` or `--refresh`.
> > >>> Package build also happens every time when you run "ci-image build`
> > now in
> > >>> breeze so it seems like it will nicely integrate in our workflows.
> > >>>
> > >>> Looks really cool Ash.
> > >>>
> > >>> On Fri, Jul 4, 2025 at 5:14 PM Ash Berlin-Taylor <a...@apache.org>
> > wrote:
> > >>>
> > >>>> It’s not just release time, but any time we build a package which
> > happens
> > >>>> on “every” CI run. The normal unit tests will use code from
> > >>>> airflow-common/src/airflow_common; the kube tests which build an
> > image will
> > >>>> build the dists and vendor in the code from that commit.
> > >>>>
> > >>>> There is only a single copy of the shared code committed to the
> repo,
> > so
> > >>>> there is never anything to synchronise.
> > >>>>
> > >>>>> On 4 Jul 2025, at 15:53, Amogh Desai <amoghdesai....@gmail.com>
> > wrote:
> > >>>>>
> > >>>>> Thanks Ash.
> > >>>>>
> > >>>>> This is really cool and helpful that you were able to test both
> > scenarios
> > >>>>> -- repo checkout
> > >>>>> and also installing from the vendored package and the resolution
> > worked
> > >>>>> fine too.
> > >>>>>
> > >>>>> I like this idea compared the to relative import one for few
> reasons:
> > >>>>> - It feels like it will take some time to adjust to the new coding
> > >>>> standard
> > >>>>> that we will lay
> > >>>>> if we impose relative imports in the shared dist
> > >>>>> - We can continue using repo wise absolute import standards, it is
> > also
> > >>>>> much easier for situations
> > >>>>> when we do global search in IDE to find + replace, this could mean
> a
> > >>>> change
> > >>>>> there
> > >>>>> - The vendoring work is a proven and established paradigm across
> > projects
> > >>>>> and would
> > >>>>> out of box give us the build tooling we need also
> > >>>>>
> > >>>>> Nothing too against the relative import but with the evidence
> > provided
> > >>>>> above, vendored approach
> > >>>>> seems to only do us good.
> > >>>>>
> > >>>>> Regarding synchronizing it, release time should be fine as long as
> we
> > >>>> have
> > >>>>> a good CI workflow to probably
> > >>>>> catch such issues per PR if changes are made in shared dist?
> > (precommit
> > >>>>> would make it really slow i guess)
> > >>>>>
> > >>>>> If we can run our tests with vendored code we should be mostly
> > covered.
> > >>>>>
> > >>>>> Good effort all!
> > >>>>>
> > >>>>> Thanks & Regards,
> > >>>>> Amogh Desai
> > >>>>>
> > >>>>>
> > >>>>>> On Fri, Jul 4, 2025 at 7:23 PM Ash Berlin-Taylor <a...@apache.org>
> > >>>> wrote:
> > >>>>>> Okay, I think I’ve got something that works and I’m happy with.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>
> >
> https://github.com/astronomer/airflow/tree/shared-vendored-lib-tasksdk-and-core
> > >>>>>> This produces the following from `uv build task-sdk`
> > >>>>>> -
> > >>>>>>
> > >>>>
> >
> https://github.com/user-attachments/files/21058976/apache_airflow_task_sdk-1.1.0.tar.gz
> > >>>>>> -
> > >>>>>>
> > >>>>
> >
> https://github.com/user-attachments/files/21058996/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip
> > >>>>>> (`.whl.zip` as GH won't allow .whl upload, but will .zip)
> > >>>>>>
> > >>>>>> ```
> > >>>>>> ❯ unzip -l
> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl.zip |
> > >>>> grep
> > >>>>>> _vendor
> > >>>>>>      50  02-02-2020 00:00   airflow/sdk/_vendor/.gitignore
> > >>>>>>    2082  02-02-2020 00:00   airflow/sdk/_vendor/__init__.py
> > >>>>>>      28  02-02-2020 00:00   airflow/sdk/_vendor/airflow_common.pyi
> > >>>>>>      18  02-02-2020 00:00   airflow/sdk/_vendor/vendor.txt
> > >>>>>>     785  02-02-2020 00:00
> > >>>>>> airflow/sdk/_vendor/airflow_common/__init__.py
> > >>>>>>   10628  02-02-2020 00:00
> > >>>>>> airflow/sdk/_vendor/airflow_common/timezone.py
> > >>>>>> ```
> > >>>>>>
> > >>>>>> And similarly in the .tar.gz, so our “sdist” is complete too:
> > >>>>>> ```
> > >>>>>> ❯ tar -tzf dist/apache_airflow_task_sdk-1.1.0.tar.gz |grep _vendor
> > >>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/.gitignore
> > >>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/__init__.py
> > >>>>>>
> > apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common.pyi
> > >>>>>> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/vendor.txt
> > >>>>>>
> > >>>>>>
> > >>>>
> >
> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/__init__.py
> > >>>>>>
> > >>>>
> >
> apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_vendor/airflow_common/timezone.py
> > >>>>>> ```
> > >>>>>>
> > >>>>>> The plugin works at build time by including/copying the libs
> > specified
> > >>>> in
> > >>>>>> vendor.txt into place (and let `vendoring` take care of import
> > >>>> rewrites.)
> > >>>>>> For the imports to continue to work at “dev” time/from a repo
> > checkout,
> > >>>> I
> > >>>>>> have added a import finder to `sys.meta_path`, and since it’s at
> the
> > >>>> end of
> > >>>>>> the list it will only be used if the normal import can’t find
> > things.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>
> >
> https://github.com/astronomer/airflow/blob/996817782be6071b306a87af9f36fe1cf2d3aaa3/task-sdk/src/airflow/sdk/_vendor/__init__.py
> > >>>>>> This doesn’t quite give us the same runtime effect “import
> > rewriting”
> > >>>>>> affect, as in this approach `airflow_common` is directly loaded
> > (i.e.
> > >>>>>> airflow.sdk._vendor.airflow_common and airflow_common exist in
> > >>>>>> sys.modules), but it does work for everything that I was able to
> > test..
> > >>>>>>
> > >>>>>> I tested it with the diff at the end of this message. My test
> > ipython
> > >>>>>> shell:
> > >>>>>>
> > >>>>>> ```
> > >>>>>> In [1]: from airflow.sdk._vendor.airflow_common.timezone import
> foo
> > >>>>>>
> > >>>>>> In [2]: foo
> > >>>>>> Out[2]: 1
> > >>>>>>
> > >>>>>> In [3]: import airflow.sdk._vendor.airflow_common
> > >>>>>>
> > >>>>>> In [4]: import airflow.sdk._vendor.airflow_common.timezone
> > >>>>>>
> > >>>>>> In [5]: airflow.sdk._vendor.airflow_common.__file__
> > >>>>>> Out[5]:
> > >>>>>>
> > >>>>
> >
> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/__init__.py'
> > >>>>>> In [6]: airflow.sdk._vendor.airflow_common.timezone.__file__
> > >>>>>> Out[6]:
> > >>>>>>
> > >>>>
> >
> '/Users/ash/code/airflow/airflow/airflow-common/src/airflow_common/timezone.py'
> > >>>>>> ```
> > >>>>>>
> > >>>>>>
> > >>>>>> And in an standalone environment with the SDK dist I built (it
> > needed
> > >>>> the
> > >>>>>> matching airflow-core right now, but that is nothing to do with
> this
> > >>>>>> discussion):
> > >>>>>>
> > >>>>>> ```
> > >>>>>> ❯ _AIRFLOW__AS_LIBRARY=1 uvx --python 3.12 --with
> > >>>>>> dist/apache_airflow_core-3.1.0-py3-none-any.whl --with
> > >>>>>> dist/apache_airflow_task_sdk-1.1.0-py3-none-any.whl ipython
> > >>>>>> Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ]
> > >>>>>> Type 'copyright', 'credits' or 'license' for more information
> > >>>>>> IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for
> help.
> > >>>>>> Tip: You can use `%hist` to view history, see the options with
> > >>>> `%history?`
> > >>>>>> In [1]: import airflow.sdk._vendor.airflow_common.timezone
> > >>>>>>
> > >>>>>> In [2]: airflow.sdk._vendor.airflow_common.timezone.__file__
> > >>>>>> Out[2]:
> > >>>>>>
> > >>>>
> >
> '/Users/ash/.cache/uv/archive-v0/WWq6r65aPto2eJOyPObEH/lib/python3.12/site-packages/airflow/sdk/_vendor/airflow_common/timezone.py’
> > >>>>>> ``
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> ```diff
> > >>>>>> diff --git a/airflow-common/src/airflow_common/__init__.py
> > >>>>>> b/airflow-common/src/airflow_common/__init__.py
> > >>>>>> index 13a83393a9..927b7c6b61 100644
> > >>>>>> --- a/airflow-common/src/airflow_common/__init__.py
> > >>>>>> +++ b/airflow-common/src/airflow_common/__init__.py
> > >>>>>> @@ -14,3 +14,5 @@
> > >>>>>> # KIND, either express or implied.  See the License for the
> > >>>>>> # specific language governing permissions and limitations
> > >>>>>> # under the License.
> > >>>>>> +
> > >>>>>> +foo = 1
> > >>>>>> diff --git a/airflow-common/src/airflow_common/timezone.py
> > >>>>>> b/airflow-common/src/airflow_common/timezone.py
> > >>>>>> index 340b924c66..58384ef20f 100644
> > >>>>>> --- a/airflow-common/src/airflow_common/timezone.py
> > >>>>>> +++ b/airflow-common/src/airflow_common/timezone.py
> > >>>>>> @@ -36,6 +36,9 @@ _PENDULUM3 =
> > >>>>>> version.parse(metadata.version("pendulum")).major == 3
> > >>>>>> # - FixedTimezone(0, "UTC") in pendulum 2
> > >>>>>> utc = pendulum.UTC
> > >>>>>>
> > >>>>>> +
> > >>>>>> +from airflow_common import foo
> > >>>>>> +
> > >>>>>> TIMEZONE: Timezone
> > >>>>>>
> > >>>>>>
> > >>>>>> ```
> > >>>>>>
> > >>>>>>>> On 3 Jul 2025, at 12:43, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >>>>>>> I think both approaches are doable:
> > >>>>>>>
> > >>>>>>> 1) -> We can very easily prevent bad imports by pre-commit when
> > >>>> importing
> > >>>>>>> from different distributions and make sure we are only doing
> > relative
> > >>>>>>> imports in the shared modules. We are doing plenty of this
> > already. And
> > >>>>>> yes
> > >>>>>>> it would require relative links we currently do not allow.
> > >>>>>>>
> > >>>>>>> 2) -> has one disadvantage that someone at some point in time
> will
> > have
> > >>>>>> to
> > >>>>>>> decide to synchronize this and if it happens just before release
> > (I bet
> > >>>>>>> this is going to happen) this will lead to solving problems that
> > would
> > >>>>>>> normally be solved during PR when you make a change (i.e.
> symbolic
> > link
> > >>>>>> has
> > >>>>>>> the advantage that whoever modifies shared code will be
> immediately
> > >>>>>>> notified in their PR - that they broke something because either
> > static
> > >>>>>>> checks or mypy or tests fail.
> > >>>>>>>
> > >>>>>>> Ash, do you have an idea of a process (who and when) does the
> > >>>>>>> synchronisation in case of vendoring? Maybe we could solve it if
> > it is
> > >>>>>> done
> > >>>>>>> more frequently and with some regularity? We could potentially
> > force
> > >>>>>>> re-vendoring at PR time as well any time shared code changes (and
> > >>>> prevent
> > >>>>>>> it by pre-commit. And I can't think of some place (other than
> > releases)
> > >>>>>> in
> > >>>>>>> our development workflow and that seems to be a bit too late as
> > puts an
> > >>>>>>> extra effort on fixing potential incompatibilities introduced on
> > >>>> release
> > >>>>>>> manager and delays the release. WDYT?
> > >>>>>>>
> > >>>>>>> Re: relative links. I think for a shared library we could
> > potentially
> > >>>>>> relax
> > >>>>>>> this and allow them (and actually disallow absolute links in the
> > pieces
> > >>>>>> of
> > >>>>>>> code that are shared - again, by pre-commit). As I recall, the
> only
> > >>>>>> reason
> > >>>>>>> we forbade the relative links is because of how we are (or maybe
> > were)
> > >>>>>>> doing DAG parsing and failures resulting from it. So we decided
> to
> > just
> > >>>>>> not
> > >>>>>>> allow it to keep consistency. The way how Dag parsing works is
> that
> > >>>> when
> > >>>>>>> you are using importlib to read the Dag from a file, the relative
> > >>>> imports
> > >>>>>>> do not work as it does not know what they should be relative to.
> > But if
> > >>>>>>> relative import is done from an imported package, it should be no
> > >>>>>> problem,
> > >>>>>>> I think - otherwise our Dags would not be able to import any
> > library
> > >>>> that
> > >>>>>>> uses relative imports.
> > >>>>>>>
> > >>>>>>> Of course consistency might be the reason why we do not want to
> > >>>> introduce
> > >>>>>>> relative imports. I don't see it as an issue if it is guarded by
> > >>>>>> pre-commit
> > >>>>>>> though.
> > >>>>>>>
> > >>>>>>> J.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> J.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> czw., 3 lip 2025, 12:11 użytkownik Ash Berlin-Taylor <
> > a...@apache.org>
> > >>>>>>> napisał:
> > >>>>>>>
> > >>>>>>>> Oh yes, symlinks will work, with one big caveat: It does mean
> you
> > >>>> can’t
> > >>>>>>>> use absolute imports in one common module to another.
> > >>>>>>>>
> > >>>>>>>> For example
> > >>>>>>>>
> > >>>>
> >
> https://github.com/apache/airflow/blob/4c66ebd06/airflow-core/src/airflow/utils/serve_logs.py#L41
> > >>>>>>>> where we have
> > >>>>>>>>
> > >>>>>>>> ```
> > >>>>>>>> from airflow.utils.module_loading import import_string
> > >>>>>>>> ```
> > >>>>>>>>
> > >>>>>>>> if we want to move serve_logs into this common lib that is then
> > >>>>>> symlinked
> > >>>>>>>> then we wouldn’t be able to have `from
> > airflow_common.module_loading
> > >>>>>> import
> > >>>>>>>> import_string`.
> > >>>>>>>>
> > >>>>>>>> I can think of two possible solutions here.
> > >>>>>>>>
> > >>>>>>>> 1) is to allow/require relative imports in this shared lib, so
> > `from
> > >>>>>>>> .module_loading import import_string`
> > >>>>>>>> 2) is to use `vendoring`[1] (from the pip maintainers) which
> will
> > >>>> handle
> > >>>>>>>> import-rewriting for us.
> > >>>>>>>>
> > >>>>>>>> I’d entirely forgot that symlinks in repos was a thing, so I
> > prepared
> > >>>> a
> > >>>>>>>> minimal POC/demo of what vendoring approach could look like here
> > >>>>>>>>
> > >>>>
> >
> https://github.com/apache/airflow/commit/996817782be6071b306a87af9f36fe1cf2d3aaa3
> > >>>>>>>> Now personally I am more than happy with relative imports, but
> > >>>> generally
> > >>>>>>>> as a project we have avoided them, so I think that limits what
> we
> > >>>> could
> > >>>>>> do
> > >>>>>>>> with a symlink based approach.
> > >>>>>>>>
> > >>>>>>>> -ash
> > >>>>>>>>
> > >>>>>>>> [1] https://github.com/pradyunsg/vendoring
> > >>>>>>>>
> > >>>>>>>>> On 3 Jul 2025, at 10:30, Pavankumar Gopidesu <
> > >>>> gopidesupa...@gmail.com>
> > >>>>>>>> wrote:
> > >>>>>>>>> Thanks Ash
> > >>>>>>>>>
> > >>>>>>>>> Yes agree option 2 would be preferred for me. Making sure we
> > have all
> > >>>>>> the
> > >>>>>>>>> gaurdriles to protect any unwanted behaviour in code sharing
> and
> > >>>>>>>> executing
> > >>>>>>>>> right of tests between the packages.
> > >>>>>>>>>
> > >>>>>>>>> Agree with others, option 2 would be
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Jul 3, 2025 at 10:02 AM Amogh Desai <
> > >>>> amoghdesai....@gmail.com>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Thanks for starting this discussion, Ash.
> > >>>>>>>>>>
> > >>>>>>>>>> I would prefer option 2 here with proper tooling to handle the
> > code
> > >>>>>>>>>> duplication at *release* time.
> > >>>>>>>>>> It is best to have a dist that has all it needs in itself.
> > >>>>>>>>>>
> > >>>>>>>>>> Option 1 could very quickly get out of hand and if we decide
> to
> > >>>>>> separate
> > >>>>>>>>>> triggerer /
> > >>>>>>>>>> dag processor / config etc etc as separate packages, back
> > compat is
> > >>>>>>>> going
> > >>>>>>>>>> to be a nightmare
> > >>>>>>>>>> and will bite us harder than we anticipate.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks & Regards,
> > >>>>>>>>>> Amogh Desai
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Thu, Jul 3, 2025 at 1:12 AM Kaxil Naik <
> kaxiln...@gmail.com>
> > >>>>>> wrote:
> > >>>>>>>>>>> I prefer Option 2 as well to avoid matrix of dependencies
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Thu, 3 Jul 2025 at 01:03, Jens Scheffler
> > >>>>>> <j_scheff...@gmx.de.invalid
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I'd also rather prefer option 2 - reason here is it is
> rather
> > >>>>>>>> pragmatic
> > >>>>>>>>>>>> and we no not need to cut another package and have less
> > package
> > >>>>>> counts
> > >>>>>>>>>>>> and dependencies.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I remember some time ago I was checking (together with
> Jarek,
> > I am
> > >>>>>> not
> > >>>>>>>>>>>> sure anymore...) if the usage of symlinks would be possible.
> > To
> > >>>> keep
> > >>>>>>>>>> the
> > >>>>>>>>>>>> source in one package but "symlink" it into another. If then
> > at
> > >>>>>> point
> > >>>>>>>>>> of
> > >>>>>>>>>>>> packaging/release the files are materialized we have 1 set
> of
> > >>>> code.
> > >>>>>>>>>>>> Otherwise if not possible still the redundancy could be
> > solved by
> > >>>> a
> > >>>>>>>>>>>> pre-commit hook - and in Git the files are de-duplicated
> > anyway
> > >>>>>> based
> > >>>>>>>>>> on
> > >>>>>>>>>>>> content hash, so this does not hurt.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On 02.07.25 18:49, Shahar Epstein wrote:
> > >>>>>>>>>>>>> I support option 2 with proper automation & CI - the
> > reasonings
> > >>>>>>>>>> you've
> > >>>>>>>>>>>>> shown for that make sense to me.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Shahar
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <
> > a...@apache.org
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>> Hello everyone,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> As we work on finishing off the code-level separation of
> > Task
> > >>>> SDK
> > >>>>>>>>>> and
> > >>>>>>>>>>>> Core
> > >>>>>>>>>>>>>> (scheduler etc) we have come across some situations where
> we
> > >>>> would
> > >>>>>>>>>>> like
> > >>>>>>>>>>>> to
> > >>>>>>>>>>>>>> share code between these.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> However it’s not as straight forward of “just put it in a
> > common
> > >>>>>>>>>> dist
> > >>>>>>>>>>>> they
> > >>>>>>>>>>>>>> both depend upon” because one of the goals of the Task SDK
> > >>>>>>>>>> separation
> > >>>>>>>>>>>> was
> > >>>>>>>>>>>>>> to have 100% complete version independence between the
> two,
> > >>>>>> ideally
> > >>>>>>>>>>>> even if
> > >>>>>>>>>>>>>> they are built into the same image and venv. Most of the
> > reason
> > >>>>>> why
> > >>>>>>>>>>> this
> > >>>>>>>>>>>>>> isn’t straight forward comes down to backwards
> > compatibility -
> > >>>> if
> > >>>>>> we
> > >>>>>>>>>>>> make
> > >>>>>>>>>>>>>> an change to the common/shared distribution
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> We’ve listed the options we have thought about in
> > >>>>>>>>>>>>>> https://github.com/apache/airflow/issues/51545 (but that
> > covers
> > >>>>>>>>>> some
> > >>>>>>>>>>>> more
> > >>>>>>>>>>>>>> things that I don’t want to get in to in this discussion
> > such as
> > >>>>>>>>>>>> possibly
> > >>>>>>>>>>>>>> separating operators and executors out of a single
> provider
> > >>>> dist.)
> > >>>>>>>>>>>>>> To give a concrete example of some code I would like to
> > share
> > >>>>>>>>>>>>>>
> > >>>>
> >
> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
> > >>>>>>>>>>>>>> — logging config. Another thing we will want to share will
> > be
> > >>>> the
> > >>>>>>>>>>>>>> AirflowConfigParser class from airflow.configuration (but
> > >>>> notably:
> > >>>>>>>>>>> only
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>> parser class, _not_ the default config values, again, lets
> > not
> > >>>>>> dwell
> > >>>>>>>>>>> on
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>> specifics of that)
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> So to bring the options listed in the issue here for
> > discussion,
> > >>>>>>>>>>> broadly
> > >>>>>>>>>>>>>> speaking there are two high-level approaches:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> 1. A single shared distribution
> > >>>>>>>>>>>>>> 2. No shared package and copy/duplicate code
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> The advantage of Approach 1 is that we only have the code
> > in one
> > >>>>>>>>>>> place.
> > >>>>>>>>>>>>>> However for me, at least in this specific case of Logging
> > config
> > >>>>>> or
> > >>>>>>>>>>>>>> AirflowConfigParser class is that backwards compatibility
> is
> > >>>> much
> > >>>>>>>>>> much
> > >>>>>>>>>>>>>> harder.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> The main advantage of Approach 2 is the the code is
> released
> > >>>>>>>>>>>> with/embedded
> > >>>>>>>>>>>>>> in the dist (i.e. apache-airflow-task-sdk would contain
> the
> > >>>> right
> > >>>>>>>>>>>> version
> > >>>>>>>>>>>>>> of the logging config and ConfigParser etc). The downside
> is
> > >>>> that
> > >>>>>>>>>>> either
> > >>>>>>>>>>>>>> the code will need to be duplicated in the repo, or better
> > yet
> > >>>> it
> > >>>>>>>>>>> would
> > >>>>>>>>>>>>>> live in a single place in the repo, but some tooling (TBD)
> > will
> > >>>>>>>>>>>>>> automatically handle the duplication, either at commit
> > time, or
> > >>>> my
> > >>>>>>>>>>>>>> preference, at release time.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> For this kind of shared “utility” code I am very strongly
> > >>>> leaning
> > >>>>>>>>>>>> towards
> > >>>>>>>>>>>>>> option 2 with automation, as otherwise I think the
> backwards
> > >>>>>>>>>>>> compatibility
> > >>>>>>>>>>>>>> requirements would make it unworkable (very quickly over
> > time
> > >>>> the
> > >>>>>>>>>>>>>> combinations we would have to test would just be
> > unreasonable)
> > >>>>>> and I
> > >>>>>>>>>>>> don’t
> > >>>>>>>>>>>>>> feel confident we can have things as stable as we need to
> > really
> > >>>>>>>>>>> deliver
> > >>>>>>>>>>>>>> the version separation/independency I want to delivery
> with
> > >>>>>> AIP-72.
> > >>>>>>>>>>>>>> So unless someone feels very strongly about this, I will
> > come up
> > >>>>>>>>>> with
> > >>>>>>>>>>> a
> > >>>>>>>>>>>>>> draft PR for further discussion that will implement code
> > sharing
> > >>>>>> via
> > >>>>>>>>>>>>>> “vendoring” it at build time. I have an idea of how I can
> > >>>> achieve
> > >>>>>>>>>> this
> > >>>>>>>>>>>> so
> > >>>>>>>>>>>>>> we have a single version in the repo and it’ll work there,
> > but
> > >>>> at
> > >>>>>>>>>>>> runtime
> > >>>>>>>>>>>>>> we vendor it in to the shipped dist so it lives at
> something
> > >>>> like
> > >>>>>>>>>>>>>> `airflow.sdk._vendor` etc.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> In terms of repo layout, this likely means we would end up
> > with:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> airflow-core/pyproject.toml
> > >>>>>>>>>>>>>> airflow-core/src/
> > >>>>>>>>>>>>>> airflow-core/tests/
> > >>>>>>>>>>>>>> task-sdk/pyproject.toml
> > >>>>>>>>>>>>>> task-sdk/src/
> > >>>>>>>>>>>>>> task-sdk/tests/
> > >>>>>>>>>>>>>> airflow-common/src
> > >>>>>>>>>>>>>> airflow-common/tests/
> > >>>>>>>>>>>>>> # Possibly no airflow-common/pyproject.toml, as deps would
> > be
> > >>>>>>>>>> included
> > >>>>>>>>>>>> in
> > >>>>>>>>>>>>>> the downstream projects. TBD.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thoughts and feedback welcomed.
> > >>>>>>>>>>>>
> > >>>>>>
> > ---------------------------------------------------------------------
> > >>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > >>>>>>>>>>>> For additional commands, e-mail:
> dev-h...@airflow.apache.org
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>
> > >>>>
> ---------------------------------------------------------------------
> > >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > >>>> For additional commands, e-mail: dev-h...@airflow.apache.org
> > >>>>
> > >>>>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > >> For additional commands, e-mail: dev-h...@airflow.apache.org
> > >>
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > For additional commands, e-mail: dev-h...@airflow.apache.org
> >
> >
>

Reply via email to