potiuk commented on a change in pull request #12466:
URL: https://github.com/apache/airflow/pull/12466#discussion_r527272497
##########
File path: airflow/provider.yaml.schema.json
##########
@@ -17,6 +17,11 @@
"type": "string"
}
},
+ "provider-package": {
Review comment:
Thanks!
TL;DR: I think the entry point idea for runtime optimization is really cool,
and I am happy to incorporate it as long as we keep the "developer" experience
intact (or even optimize it as well in this case).
> Yes, I firmly believe this should be the case. By not having third party
> packages live under the airflow. namespace, it becomes much much easier for us
> and hopefully users to know what is an official provider, and thus supported by
> us vs not.
1. Do you think we should actively prevent 3rd parties from using airflow.*,
and enforce that somehow? I do not think there is a way of doing it unless we
sign at least some of the packages cryptographically with our own key. Or is it
just a convention you propose here, that other packages SHOULD NOT use
"airflow" as their package?
> That way any package can register a provider, by adding something like
> this to its setup.cfg:
>
>     [options.entry_points]
>     apache-airflow-provider=
>         x=my_company.provider.x:register_provider
That's a cool idea, especially since, rather than returning a hard-coded
dictionary, we can return the very same `provider.yaml` file as a dictionary.
That will make it super easy for anyone writing their own provider: they will
have the JSON schema that defines the expected structure of the file, so it
will be very easy to explain. It is very standard, supported by most IDEs with
auto-completion and documentation (I used it to add the current files), and it
can be used to verify such provider information during provider development.
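
A minimal sketch of what the runtime side could look like, assuming the
`apache-airflow-provider` group name from the quoted setup.cfg and assuming
each registered callable takes no arguments and returns the provider
meta-data as a dictionary. The function names here are hypothetical and are
not existing Airflow API; only `importlib.metadata` is real.

```python
from importlib import metadata
from typing import Any, Callable, Dict, Iterable

# Assumption: group name taken from the setup.cfg snippet quoted above.
PROVIDER_GROUP = "apache-airflow-provider"


def provider_entry_points(group: str = PROVIDER_GROUP) -> Iterable[Any]:
    """Return the installed entry points registered under `group`."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry point collection.
        return eps.select(group=group)
    # Python 3.8/3.9: plain dict mapping group name -> list of entry points.
    return eps.get(group, [])


def discover_providers(entry_points: Iterable[Any]) -> Dict[str, Dict[str, Any]]:
    """Load each entry point and call it to obtain the provider meta-data."""
    providers: Dict[str, Dict[str, Any]] = {}
    for ep in entry_points:
        # ep.load() imports the module and returns the registered callable.
        register: Callable[[], Dict[str, Any]] = ep.load()
        providers[ep.name] = register()
    return providers
```

At runtime this would be called as
`discover_providers(provider_entry_points())`, so no `walk_packages` scan is
involved. Each returned dictionary could then be checked against the schema
before use.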
This might help to speed up the runtime discovery: one of the problems with
`walk_packages` is that it is a bit slow, so not having to use it at runtime is
good. I am happy to change the runtime mechanism for that, as long as we solve
the problem of local provider development (as discussed on Slack). We still
need to support testing and developing airflow providers directly from a
checked-out airflow source tree. Having to install each provider as a package
in order to test it sounds like a terrible idea for people who develop
providers (which is most of the casual contributors to airflow).
If we can switch to entry_points for runtime, that also makes it possible to
optimize the directly-from-sources case. That case becomes much easier because
we can rely on packages being in the right folder and use simple directory
walking to find all provider files, which makes it considerably faster. Paired
with the runtime "entrypoint" discovery, I think we can have a very optimized
solution that keeps all the nice properties we have now: provider "meta-data"
kept in one place, in a yaml file validated by the JSON schema for "airflow"
providers. But if someone wants, they can still keep the meta-data as dict
objects or json in 3rd-party providers, as long as they follow the schema. I
don't think we need to force anyone to use yaml for that, even if it is super
convenient for us and we also use it for documentation generation.
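
For the directly-from-sources case, the directory walk could be as simple as
the sketch below. It assumes every provider keeps a file named
`provider.yaml` somewhere under a single providers root folder; the function
name is hypothetical.

```python
from pathlib import Path
from typing import List


def find_provider_files(providers_root: Path, filename: str = "provider.yaml") -> List[Path]:
    """Return every `filename` found under `providers_root`, sorted for stable output."""
    # rglob walks the directory tree on disk; nothing is imported, unlike walk_packages.
    return sorted(providers_root.rglob(filename))
```

Each file found this way would still need to be parsed and validated against
the schema, the same as the dictionaries returned by entry points.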
Unless you have another, better idea for keeping the developer experience intact.