potiuk commented on a change in pull request #12466:
URL: https://github.com/apache/airflow/pull/12466#discussion_r527272497



##########
File path: airflow/provider.yaml.schema.json
##########
@@ -17,6 +17,11 @@
         "type": "string"
       }
     },
+    "provider-package": {

Review comment:
       Thanks! 
   
   TL;DR; I think entrypoint idea for runtime optimization is really cool, 
happy to incorporate it as long as we keep "developer" experience intact (or 
even optimized in this case as well).
   
   > Yes, I firmly believe this should be the case. By not having third party 
packages live under the airflow. namespace, it becomes much much easier for Us 
and hopefully users to know what is an official provider, and thus supported by 
us vs not.
   
   1. Do you think we should prevent 3rd parties from using airflow.*, or 
enforce that somehow? I do not think there is a way of doing it unless we sign 
at least some of the packages cryptographically with our own key? Or is it just 
a convention you propose here: that other packages SHOULD NOT use "airflow" as 
their package name? 
   
   > That way any package can register a provider, by adding something like 
this to it's setup.cfg:
   
   [options.entry_points]
   apache-airflow-provider=
       x=my_company.provider.x:register_provider
   
   That's a cool idea, especially since, rather than returning a hard-coded 
dictionary, we can return the very same `provider.yaml` file as a dictionary. 
That will make it super-easy for anyone writing their own provider, as they will 
have the .json.schema that defines the expected structure of the file, and 
it will be very easy to explain. It is very standard, supported by most IDEs 
with auto-completion and documentation (I used it to add the current files), and 
it can be used to verify such provider information during provider development.
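
   To make this concrete, here is a minimal sketch of what such an entry-point 
callable could look like. It is only an illustration under assumptions: the 
field names and the required-field list are hypothetical (not the actual 
provider.yaml schema), the metadata is embedded as JSON so the sketch needs only 
the standard library, and the hand-rolled check stands in for real validation 
against the published json.schema.

```python
import json
from typing import Any, Dict

# Stand-in for the contents of a provider metadata file. The field names are
# illustrative assumptions, not the actual provider.yaml schema.
_PROVIDER_METADATA = """
{
  "package-name": "my-company-provider-x",
  "name": "X",
  "description": "Example third-party provider",
  "versions": ["1.0.0"]
}
"""

# Hypothetical required fields and their types; a real check would validate
# against the published JSON schema instead.
_REQUIRED_FIELDS = {"package-name": str, "name": str, "versions": list}


def validate_provider_info(info: Dict[str, Any]) -> None:
    """Raise ValueError if a required field is missing or has the wrong type."""
    for field, expected_type in _REQUIRED_FIELDS.items():
        if field not in info:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(info[field], expected_type):
            raise ValueError(
                f"field {field} must be of type {expected_type.__name__}"
            )


def register_provider() -> Dict[str, Any]:
    """Entry-point target: return the provider metadata as a plain dict."""
    info = json.loads(_PROVIDER_METADATA)
    validate_provider_info(info)
    return info
```

   A third-party package would point its entry point at `register_provider`, 
as in the quoted setup.cfg snippet above.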
   
   This might help to speed up runtime discovery. One of the problems with 
`walk_packages` is that it is a bit slow, so not having to use it at runtime is 
good. I am happy to change the runtime mechanism for that, as long as (as 
discussed on Slack) we solve the problem of local provider development. We 
still need to support testing and developing airflow providers directly from 
the checked-out airflow sources. Having to install each provider as a 
package in order to test it sounds like a terrible idea for people who develop 
providers (who are most of the casual contributors to airflow). 
   
   If we can switch to entry_points for runtime, that also makes it possible to 
optimize the directly-from-sources case. That case is then much easier because 
we can rely on packages being in the right folder and use simple directory 
walking to find all provider files, which makes it considerably faster. Paired 
with the runtime "entrypoint" discovery, I think we can have a very optimized 
solution that keeps all the nice properties we have now: provider "meta-data" 
kept in one place, in a yaml file validated with json.schema, for the "airflow" 
providers. But if someone wants, they can still keep the metadata as dict 
objects or json in 3rd-party providers as long as they follow the schema. I 
don't think we need to force anyone to use yaml for that, even if it is 
super-convenient for us and we also use it for document generation.
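
   The two discovery paths described above could be sketched roughly as 
follows. This is a sketch under assumptions, not the implementation in this PR: 
the function names are hypothetical, the entry-point group name is the one from 
the quoted setup.cfg snippet, and the sources case assumes each provider keeps a 
file named `provider.yaml` somewhere under the given root.

```python
from importlib import metadata
from pathlib import Path
from typing import Any, Dict, List


def iter_entry_points(group: str) -> list:
    """Return the entry points in a group, across importlib.metadata versions."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions expose a selectable collection.
        return list(eps.select(group=group))
    # Older Python versions return a plain dict keyed by group name.
    return list(eps.get(group, []))


def load_providers_from_entry_points(
    group: str = "apache-airflow-provider",
) -> Dict[str, Any]:
    """Runtime case: call each registered callable and collect its metadata."""
    providers = {}
    for entry_point in iter_entry_points(group):
        register = entry_point.load()  # imports the module, returns the callable
        providers[entry_point.name] = register()
    return providers


def find_provider_files(sources_root: str) -> List[Path]:
    """Sources case: plain directory walk, without importing any package."""
    return sorted(Path(sources_root).rglob("provider.yaml"))
```

   The sources case never imports provider code, so nothing has to be installed 
as a package to be found, which is the property local provider development 
needs.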
   
   
   Unless you have another, better idea to keep the developer experience intact.
   
   
   
   





