kaxil commented on code in PR #62261:
URL: https://github.com/apache/airflow/pull/62261#discussion_r2836972872


##########
dev/registry/extract_metadata.py:
##########


Review Comment:
   Good question, and I get the concern for sure.
   
   On Sphinx overlap, the registry and Sphinx docs serve different purposes. 
Sphinx gives us per-provider API reference as HTML. The registry needs 
structured JSON for a searchable cross-provider catalog:
   constructor parameters with types/defaults, PyPI download stats, connection 
form metadata (from get_connection_form_widgets() at runtime), 
module-to-category mappings, etc. None of that exists in Sphinx output. We
   do use Sphinx `objects.inv` files for docs URL resolution though.
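   
   To make the "constructor parameters with types/defaults" part concrete, here 
is a minimal sketch of pulling that kind of data into JSON-friendly dicts with 
the standard-library `inspect` module. It is only an illustration, not the code 
in `extract_metadata.py`: the helper name `extract_constructor_params`, the 
output keys, and `ExampleOperator` are all hypothetical.

```python
import inspect
import json


def extract_constructor_params(cls):
    """Describe cls.__init__ parameters as a list of JSON-serializable dicts.

    Hypothetical helper for illustration; the output keys are assumptions.
    """
    params = []
    for name, param in inspect.signature(cls.__init__).parameters.items():
        # Skip `self` and catch-all *args / **kwargs entries.
        if name == "self" or param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue

        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            type_name = None
        elif isinstance(annotation, str):
            # String annotations (e.g. under `from __future__ import annotations`).
            type_name = annotation
        else:
            type_name = getattr(annotation, "__name__", str(annotation))

        has_default = param.default is not inspect.Parameter.empty
        params.append(
            {
                "name": name,
                "type": type_name,
                "required": not has_default,
                # repr() keeps arbitrary default values JSON-serializable.
                "default": repr(param.default) if has_default else None,
            }
        )
    return params


class ExampleOperator:
    """Hypothetical stand-in for a provider class."""

    def __init__(self, task_id: str, retries: int = 3, *, conn_id: str = "example_default", **kwargs):
        self.task_id = task_id
        self.retries = retries
        self.conn_id = conn_id


if __name__ == "__main__":
    print(json.dumps(extract_constructor_params(ExampleOperator), indent=2))
```

   Output in this shape can be indexed and searched across providers, which is 
harder to do against rendered HTML.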
   
   Embedding this into Sphinx extensions would tie the registry build to the 
full docs pipeline, which is heavier and slower than what we need.
   
   I am going to explore some prek hook integration next week and will look at 
this with fresh eyes. The other thing I need to add is a backfill script, which 
is currently not checked in. Separately, we need another script (or GH Action 
or something) to scrape metadata for 3rd-party community providers, but that 
would be a separate PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
