potiuk commented on PR #26162:
URL: https://github.com/apache/airflow/pull/26162#issuecomment-1238159017
Wrong box :).
Both of the examples you mentioned @Taragolis are clearly AWS ones; they need
no fix as they are in the right place.
They are (and should be) in AWS. In both cases the "USER" of the
authentication (AWS) has a dependency on the functionality they use. Having such
functionality behind an optional AWS <> Google dependency is perfectly fine and it
is even "blessed" in our package system: the amazon provider has a [google]
extra and the google provider has an [amazon] extra. We even have an exception
specially foreseen for that - not yet heavily used, but once we split providers
into separate repos I was planning to apply it consistently in the places that do
not have it.
```
class AirflowOptionalProviderFeatureException(AirflowException):
    """Raise by providers when imports are missing for optional provider features."""
```
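For illustration, a minimal sketch of how a provider could use this exception to guard an optional cross-provider import. The `import_optional` helper is hypothetical (it is not an Airflow API), and both exception classes are redefined locally as stand-ins so the snippet runs without Airflow installed; in real provider code they would come from `airflow.exceptions`.

```python
import importlib
from types import ModuleType


class AirflowException(Exception):
    """Local stand-in for airflow.exceptions.AirflowException (assumption for this sketch)."""


class AirflowOptionalProviderFeatureException(AirflowException):
    """Raise by providers when imports are missing for optional provider features."""


def import_optional(module_name: str) -> ModuleType:
    """Hypothetical helper: import a module that only an optional feature needs."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Report "optional feature unavailable" rather than a hard import failure,
        # so the rest of the provider can still be loaded.
        raise AirflowOptionalProviderFeatureException(e) from e
```

With something like this, the amazon provider could wrap its google-dependent imports, and the feature would only fail when the [google] extra is not installed.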
Why those two stories are different from the one here is very easy to explain
(even if the explanation is not "technical" - it looks at the landscape of our
providers from the business side of things rather than at pure interface/API,
and it is based more on the likelihood of the code being better maintained than
on anything else). The decision where to put code that is in-between depends
more on whether there are clear stakeholders behind the provider who mostly
maintain it and are interested in having this functionality in.
It is captured in a few places already. We have [Release process for
providers](https://github.com/apache/airflow/blob/main/README.md#release-process-for-providers)
but we also already used the same line of thought when we decided where to
put transfer operators. See AIP-21 [Changes in import
paths](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths)
- and also touched upon in AIP-8 - [Split Provider packages for Airflow
2.0](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-8+Split+Providers+into+Separate+Packages+for+Airflow+2.0).
One of the decisions we made there is that the "transfer" operators are
put in the "target" of the transfer when there is a clear stakeholder behind
the target. And in this case circular references are unavoidable.
To sum up the line of thought in one sentence: the idea is that the code
should be put in the "provider" where there is a stakeholder that is most
likely to maintain the code.
And `Postgres` and `MySQL` are different. They have no clear stakeholders
behind them that are interested in maintaining those providers. Even though
they are commercial databases with companies behind them, they are "commodity"
APIs and they are not interested in maintaining services. And neither Postgres
nor MySQL is interested in adding interfacing with AWS/GCP to the provider. But
AWS and GCP are each interested in allowing AWS/GCP authentication WITH the
Postgres/MySQL provider they expose in their own Services offering.
This was discussed at length when AIP-21 was discussed - there
are different kinds of providers. Simply speaking SAML/GSSAPI are "protocols",
there are also "databases" (like Postgres/MySQL) and then there are "Services",
which are a higher layer of abstraction. While "Services" can use "Protocols"
and "Databases", neither "Protocols" nor "Databases" should use "Services" -
they can only be "used" by services.
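The layering rule above can be sketched as a small check. This is only an illustration of the idea: the `Kind` enum, the `PROVIDER_KIND` mapping and `may_depend_on` are hypothetical names, not anything that exists in Airflow, and the classification of the four providers follows the examples given in this comment.

```python
from enum import Enum


class Kind(Enum):
    PROTOCOL = "protocol"
    DATABASE = "database"
    SERVICE = "service"


# Illustrative classification (assumption), based on the examples in this thread.
PROVIDER_KIND = {
    "amazon": Kind.SERVICE,
    "google": Kind.SERVICE,
    "postgres": Kind.DATABASE,
    "mysql": Kind.DATABASE,
}


def may_depend_on(user: str, used: str) -> bool:
    """Return False only when a non-service provider would depend on a service."""
    # Services may use protocols, databases and (optionally) other services;
    # protocols and databases must not use services.
    return PROVIDER_KIND[used] is not Kind.SERVICE or PROVIDER_KIND[user] is Kind.SERVICE
```

Here `may_depend_on("amazon", "postgres")` is allowed, while `may_depend_on("postgres", "amazon")` is not, which is why the authentication code belongs in the amazon provider.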
This will be much more visible and obvious when we split providers to
separate repos. The ideal situation should be that people from AWS
are subscribed to that single "amazon" provider repo to get notifications
about all the AWS-specific changes they are interested in. And this
authentication code clearly falls into this "bucket".
The Google Federated identity code is also a very good example here - while
it is clearly GCP-related, it is the AWS people who are interested in getting it
working and making sure they can connect to GCP (for example to be able to pull
data from it etc.). They will be maintaining the code, not the Google people.