potiuk commented on PR #58084: URL: https://github.com/apache/airflow/pull/58084#issuecomment-3508316233
It's run as part of the release process, so the release manager (in this case me, stepping in for Elad) has to do it after the providers are released: https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDERS.md#update-providers-metadata

A few words on why we have this file at all. The main reason is that it aids SBOM generation: it speeds up the generation because we do not have to pull all the information from PyPI when we generate many SBOMs in parallel processes: https://github.com/apache/airflow/blob/bc3a750af476b991ca34e6c696208f8e75ff99a7/dev/breeze/doc/09_release_management_tasks.rst#sbom-generation-tasks

The SBOMs are currently generated as part of the "release docs building process for airflow" https://github.com/apache/airflow/blob/main/dev/README_RELEASE_AIRFLOW.md#publish-final-documentation -> this "workflow_run" takes the "main" version of the "provider's metadata" and uses it to determine which versions of providers we should be "matching" with the released Airflow version. It's done this way so that we can also regenerate SBOMs for historical versions of Airflow at any time: this file (pulled from main) is used to find which versions of providers were used with a given version of Airflow.

But the real use for it is when we want to regenerate the SBOMs, for example when a new version of the tool (cdxgen) is released. Then we can massively parallelise the generation, and it's been easier to just have the "provider's metadata" in the image already rather than trying to download all the constraints and interact with PyPI to find this out over and over. So this is largely a glorified cache that we update after a release.
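To illustrate the "glorified cache" idea, here is a minimal sketch of such a lookup. The dictionary layout (provider name, then provider version, then an `associated_airflow_version` field), the helper name `providers_for_airflow` and all version numbers are assumptions made up for illustration; they are not taken from the actual file or from the breeze code.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the providers metadata.
# The layout and the version numbers are illustrative assumptions.
PROVIDER_METADATA = {
    "apache-airflow-providers-amazon": {
        "9.0.0": {"associated_airflow_version": "3.0.0"},
        "9.1.0": {"associated_airflow_version": "3.0.0"},
        "9.2.0": {"associated_airflow_version": "3.1.0"},
    },
    "apache-airflow-providers-http": {
        "5.0.0": {"associated_airflow_version": "3.0.0"},
    },
}


def providers_for_airflow(metadata, airflow_version):
    """Return {provider: [provider versions]} recorded for one Airflow version."""
    result = defaultdict(list)
    for provider, versions in metadata.items():
        for provider_version, info in versions.items():
            if info["associated_airflow_version"] == airflow_version:
                result[provider].append(provider_version)
    return dict(result)


if __name__ == "__main__":
    # Purely local lookup: no PyPI or constraints download is needed,
    # so many such lookups can run in parallel processes.
    print(providers_for_airflow(PROVIDER_METADATA, "3.0.0"))
```

Because the lookup only reads a local mapping, each parallel SBOM job can answer "which provider versions go with this Airflow version" without any network access.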
There is also some ambiguity here that we try to resolve. Sometimes we release providers several times after a version of Airflow, so the mapping is not 1-1; it's usually many-1 (many provider versions matching the same Airflow version). It also means that sometimes a provider is released but there is "no" version of Airflow that actually used it (yet). This code handles that by matching the released provider with the latest released Airflow, even if the provider version is not found in constraints yet. So the "cache" is built with slightly more complex logic, reflecting the temporary nature of the "tip" of provider versions.
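The matching rule described above could be sketched roughly as follows. This is a simplified illustration, not the actual breeze implementation: the function name `associate_airflow_version`, the shape of `CONSTRAINTS`, the provider name and all version numbers are hypothetical.

```python
# Hypothetical data: Airflow releases ordered oldest to newest, and the
# provider version pinned in each release's constraints (all made up).
RELEASED_AIRFLOW = ["3.0.0", "3.1.0"]
CONSTRAINTS = {
    "3.0.0": {"example-provider": "9.0.0"},
    "3.1.0": {"example-provider": "9.2.0"},
}


def associate_airflow_version(provider, provider_version, constraints, released_airflow):
    """Pick the Airflow version to record for one provider release."""
    for airflow_version in released_airflow:
        pinned = constraints.get(airflow_version, {}).get(provider)
        if pinned == provider_version:
            # Found in the constraints of a released Airflow version.
            return airflow_version
    # Not found in any constraints (yet): fall back to the latest released Airflow.
    return released_airflow[-1]


if __name__ == "__main__":
    for version in ["9.0.0", "9.2.0", "9.3.0", "9.4.0"]:
        matched = associate_airflow_version(
            "example-provider", version, CONSTRAINTS, RELEASED_AIRFLOW
        )
        print(version, "->", matched)
```

In this sketch the provider versions 9.2.0, 9.3.0 and 9.4.0 all end up associated with the latest Airflow release, which shows both the many-1 mapping and the provisional nature of the "tip": once a newer Airflow version is released and the metadata is regenerated, those entries may be associated with a different Airflow version.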
