Question though... I'm not sure we need to mark anything in the subclasses
(K8sExecutor, LocalExecutor etc) as public... how would we choose what to
make public?  Why would we do so?  Why not just keep it simple and say that
BaseExecutor is the public interface and that's what's public and the
individual executors have backcompat with regard to their external behavior
but not with their code structure necessarily....

And I'll take on the docs change.

On Fri, Jan 27, 2023 at 8:21 AM Ferruzzi, Dennis
<ferru...@amazon.com.invalid> wrote:

> > - We may not want to make a strong statement like "*all executors are
> public*". It's just
>
> > impossible IMHO.
> > - Let's just mark a limited number of key methods/interfaces in each
> executor as public,
> > and we ensure a strong backcompat for them. For any methods/interfaces
> not marked, no
> > backcompat guaranteed even between minor versions. This should also be
> added into
> > documentation, as well as highlighted in the executor docstrings.
>
> If you missed something, then so did I.   I think that is an accurate
> summary.
>
> ------------------------------
> *From:* Xiaodong Deng <xdd...@apache.org>
> *Sent:* Thursday, January 26, 2023 9:13 PM
> *To:* dev@airflow.apache.org
> *Subject:* RE: [EXTERNAL][DISCUSSION] Assessing what is a breaking change
> for Airflow (SemVer context)
>
>
> *CAUTION*: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
> Happy to see the change I made was used as an example here ;-)
>
> Honestly the discussion is a bit long and getting increasingly general, so
> a bit hard to follow for me. But if I understand the conversation & issue
> correctly, here are my thoughts:
>
> - We may not want to make a strong statement like "*all executors are
> public*". It's just impossible IMHO.
> - Let's just mark a limited number of key methods/interfaces in each
> executor as public, and we ensure a strong backcompat for them. For any
> methods/interfaces not marked, no backcompat guaranteed even between minor
> versions. This should also be added into documentation, as well as
> highlighted in the executor docstrings.
>
> I believe I must have missed some important points in the conversation, or
> even misunderstood some points. But just summarizing what I have understood
> as well as what I would prefer.
>
>
> Regards,
> XD
>
> On Thu, Jan 26, 2023 at 9:45 AM Daniel Standish
> <daniel.stand...@astronomer.io.invalid> wrote:
>
>> I understand it's "convenient" if they want to extend k8s executor.  But
>> I guess my thought is, I'm not sure that convenience outweighs the
>> competing good which is, freedom to develop k8s executor.  I'm not saying
>> don't extend k8s executor -- please do, and contribute it back.  But for
>> subclassing / extending, that feels more like "you're on your own", in the
>> sense of, "just vendor the executor".
>>
>> For example in a recent PR, XD added better support for running k8s
>> executor with multiple namespace.  Now instead of a single `watcher` on k8s
>> executor we have a `watchers` dict
>> <https://github.com/apache/airflow/pull/28047/files#diff-681de8974a439f70dfa41705f5c1681ecce615fac6c4c715c1978d28d8f0da84L254>.
>> I suppose it's "technically" breaking, in a subclass sense.  I don't really
>> think that's the kind of backcompat we should worry about when making
>> changes to k8s executor.
>>
>> Certainly the user-facing executor *behavior *should follow backcompat
>> guarantees.  For example we can't just change the name of `pod_override`
>> (in executor config) since that would break user workflows.  Similar, with
>> a conf setting... renaming it without backcompat etc.  Where the dag writer
>> "feels" the executor, it should respect semver, certainly.  But I'm not
>> sure we need to guarantee backcompat of its internal methods.
>>
>> Another real example: Jed right now is trying to rename pod_id to
>> pod_name, a sensible cleanup.  But k8s internals are "public", we can't
>> really do that.  And we could imagine lots of grey areas.  Thought
>> experiment: maybe we want to move some method call out of `sync` and run it
>> periodically in a thread.  That would change what `sync` does and if a
>> subclass depends on that, well I guess we've broken that subclass.
>>
>> WDYT?
>>
>>
>> On Thu, Jan 26, 2023 at 12:21 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Yeah. The PR was mostly to bring things together- i.e to be able to
>>> say "this is really a public interface of ours". And by just doing it,
>>> I was actually hoping it will spark those kind of discussions where we
>>> will really detail on what it means for each of those to be public.
>>>
>>> Having it written down and part of our docs makes it easy to spark
>>> such discussion by pointing to it and asking "do we really mean that
>>> or that?"
>>>
>>> I think once we make a few passes and polish it even more and get to
>>> 2.6 release when this documentation part will be first time published,
>>> we **could** vote on it before (or maybe lazy-consent if we won't see
>>> many doubts raised).
>>>
>>> Whether it's an AIP - I don't think so, It's mostly about things that
>>> are there that our "intentions" are as public. It's not a legally
>>> binding document (I don't think anyone can make a legally binding
>>> backwards compatibility statement), it's merely a statement of intent
>>> of the community that our users can treat like something that they can
>>> build on when extending Airflow. And understand the rules and
>>> expectations when they look for details of each of the "components" we
>>> publish.
>>>
>>> Coming back to the original question:
>>>
>>> I personally don't think we should limit just Base Executor interface
>>> - because I can imagine there will be enough cases where extending
>>> existing Kubernetes Executor will be far easier than just using the
>>> interface and our users should understand some of the expectations
>>> they might have when doing so.
>>>
>>> One of the ways we could make our intentions clear there is to
>>> describe and follow the Python rules for what's internal and what's
>>> public. (under or even dunder in front of the class/method name). This
>>> is not "entirely" clear (protected vs private is somewhat blurry
>>> here). But maybe we could agree on some rules that we will apply in
>>> all such "public" interfaces of ours. We could do it with the executor
>>> classes and methods that we do not "intend" to be extended and rename
>>> them with the under or dunder in front)
>>>
>>> Also It can be implemented in one of two ways:
>>>
>>> 1) mark the methods/classes that we consider as "changeable" via
>>> adding dunder in front and describe it to the users so that they can
>>> understand what the rules are and make the right decisions
>>> 2) make it a bit more formal, automate and easier to use by the users
>>> in automated way - just follow what we have done with "common.sql"
>>> where we implemented a process to generate and control generated .pyi
>>> stubs:
>>>
>>> Pre-commit that takes care of it:
>>>
>>> https://github.com/apache/airflow/blob/main/scripts/ci/pre_commit/pre_commit_update_common_sql_api_stubs.py
>>> Example generated stub file:
>>>
>>> https://github.com/apache/airflow/blob/main/airflow/providers/common/sql/hooks/sql.pyi
>>>
>>> Recently, we've gone through some first "teething" problems with the
>>> "stubgen" approach and while it is not perfect, we managed to bend it
>>> a bit to our expectations even if the original stubgen generator had
>>> some flaws. And I think we would be able to generalise the common.sql
>>> approach to all kinds of public interfaces we have.
>>>
>>> One problem with the stub approach is that it's hard to distinguish
>>> "internal" vs. "external" use of some of the interfaces, However in
>>> our case, rather than generate the .pyi files and storing it in the
>>> code of airflow, we could simply generate and publish the "typeshed"
>>> interface for Apache Airflow which will ONLY contain the "externally
>>> public" interfaces (https://github.com/python/typeshed) . What
>>> typeshed is (from the typeshed github):
>>>
>>> ""
>>> Typeshed contains external type annotations for the Python standard
>>> library and Python builtins, as well as third party packages as
>>> contributed by people external to those projects.
>>> This data can e.g. be used for static analysis, type checking or type
>>> inference.
>>> """
>>>
>>> This way a user who wishes to extend airflow could simply install the
>>> `types-apache-airflow==2.6.0`  and this will give a very strong base
>>> on understanding what is / is not public  - usable in automated way
>>>
>>> I think it would not be complex to generate such a typeshed package if
>>> we go that direction and it would help us also to protect from
>>> accidental changes (same as we do in common.sql - by automating the
>>> part where we check if the generated interface has not changed in
>>> backwards-incompatible way).
>>>
>>> J.
>>>
>>> On Thu, Jan 26, 2023 at 12:28 AM Daniel Standish
>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>> >
>>> > Following up here... that PR has been merged.... At some point we
>>> should probably have a vote on that, if it's meant to be actual binding
>>> policy.  Maybe we're still feeling it out?  But "what is public" is a
>>> pretty fundamental concern to the project.  Maybe such a policy itself
>>> should be an AIP?
>>> >
>>> > Meanwhile, going down into the weeds a bit, there's one aspect of the
>>> doc which came up here:
>>> https://github.com/apache/airflow/pull/29147#discussion_r1086214750
>>> >
>>> > It quotes the "policy":
>>> >
>>> > > Airflow has a set of Executors that are considered public. You are
>>> free to extend their functionality
>>> >
>>> > Do we really mean that all executors are public?  Or do we just mean
>>> that the executor interface is public, i.e. BaseExecutor?
>>> >
>>> > It's of course better for the healthy development of our built-in
>>> executors if we have no backcompat guarantees and can change them as
>>> needed.  But yes this means that anyone else "building on" our executors
>>> would be wise to "vendor in" our executor code instead of just subclassing.
>>> Maybe it's a reasonable assumption that any user with the sophistication
>>> and, perhaps, chutzpah to customize one of our executors, has also the
>>> sophistication to manage this.
>>> >
>>> > What do y'all think?
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>

Reply via email to