potiuk commented on PR #58084:
URL: https://github.com/apache/airflow/pull/58084#issuecomment-3508316233

   It's run as part of the release process, so the release manager (in this case 
me, stepping in for Elad) has to do it after providers are released:
   
   
https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDERS.md#update-providers-metadata
   
   A few words on why we have this file at all:
   
   The main reason is that it aids SBOM generation: it speeds up the generation 
by not pulling all the information from PyPI when we generate many SBOMs in 
parallel processes:
   
   
https://github.com/apache/airflow/blob/bc3a750af476b991ca34e6c696208f8e75ff99a7/dev/breeze/doc/09_release_management_tasks.rst#sbom-generation-tasks
   
   The SBOMs are currently generated as part of the "release docs building 
process for airflow" 
https://github.com/apache/airflow/blob/main/dev/README_RELEASE_AIRFLOW.md#publish-final-documentation
 -> this "workflow_run" takes the "main" version of the "provider's metadata" 
and uses it to determine which versions of providers we should be "matching" 
with the released Airflow version. It's done this way so that we can also 
regenerate SBOMs for historical versions of Airflow at any time - this file 
(pulled from main) is used to find which versions of providers were used in a 
given version of Airflow.
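
   To make the lookup concrete, here is a minimal sketch of how such a file 
could be queried. The dictionary layout, the key name 
"associated_airflow_version", the provider names, the version numbers and the 
function name are all assumptions made for illustration; they are not taken 
from the actual metadata file or from the Breeze code.

```python
# Hypothetical shape of the cached metadata: provider -> version -> info.
# All names and version numbers below are made up for illustration.
PROVIDER_METADATA = {
    "amazon": {
        "9.0.0": {"associated_airflow_version": "2.10.3"},
        "9.1.0": {"associated_airflow_version": "2.10.4"},
        "9.2.0": {"associated_airflow_version": "2.10.4"},
    },
    "google": {
        "11.0.0": {"associated_airflow_version": "2.10.4"},
    },
}


def version_key(version: str) -> tuple[int, ...]:
    """Turn '9.10.0' into (9, 10, 0) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def providers_for_airflow(metadata: dict, airflow_version: str) -> dict[str, list[str]]:
    """Return every provider version associated with the given Airflow version."""
    result: dict[str, list[str]] = {}
    for provider, versions in metadata.items():
        matching = [
            provider_version
            for provider_version, info in versions.items()
            if info["associated_airflow_version"] == airflow_version
        ]
        if matching:
            result[provider] = sorted(matching, key=version_key)
    return result


print(providers_for_airflow(PROVIDER_METADATA, "2.10.4"))
```

   Because everything is read from a local dictionary, many such lookups can 
run in parallel without any network access.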
   
   
   But the real use of it is when we want to regenerate the SBOMs, for example 
when a new version of the tool (cdxgen) is released. Then we can massively 
parallelise it, and it's been easier to just have the "provider's metadata" in 
the image already rather than trying to download all the constraints and 
interact with PyPI to find it out over and over. 
   
   So this is largely a glorified cache that we update after release.
   
   There is also some ambiguity here that we try to resolve. Sometimes we release 
providers several times after a version of Airflow, so the mapping is not 1-1; 
it's usually many-1 (many provider versions matching the same Airflow version). 
It also means that sometimes a provider is released but there is "no" version 
of Airflow that has actually used it (yet). The code handles this by matching 
the released provider with the latest released Airflow, even if the provider is 
not found in constraints yet. So the "cache" is built with slightly more complex 
logic, reflecting the temporary nature of the "tip" of provider versions.
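
   The matching rule described above can be sketched roughly as follows. This 
is a simplified illustration, not the actual release code: the function name, 
the shape of the constraints data and all version numbers are hypothetical, and 
the real logic may take more into account.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Turn '2.10.4' into (2, 10, 4) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def associate_airflow_version(
    provider: str,
    provider_version: str,
    constraints: dict[str, dict[str, str]],
    released_airflow_versions: list[str],
) -> str:
    """Pick the Airflow version to record for a released provider version.

    `constraints` is assumed to map airflow version -> {provider: pinned version}.
    """
    # Prefer the earliest Airflow release whose constraints pin this version.
    for airflow_version in sorted(released_airflow_versions, key=version_key):
        if constraints.get(airflow_version, {}).get(provider) == provider_version:
            return airflow_version
    # Not in any constraints (yet): fall back to the latest released Airflow.
    return max(released_airflow_versions, key=version_key)


# Made-up data: 9.2.0 was released after 2.10.4 and is not in constraints yet.
CONSTRAINTS = {
    "2.10.3": {"amazon": "9.0.0"},
    "2.10.4": {"amazon": "9.1.0"},
}
RELEASED_AIRFLOW = ["2.10.3", "2.10.4"]

for candidate in ("9.0.0", "9.1.0", "9.2.0"):
    matched = associate_airflow_version("amazon", candidate, CONSTRAINTS, RELEASED_AIRFLOW)
    print(candidate, "->", matched)
```

   In this made-up data both 9.1.0 and 9.2.0 end up associated with 2.10.4, 
which is the many-1 case, and the entry for 9.2.0 is the provisional "tip" that 
may need updating once a newer Airflow is released.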
   
   

