Re: [PR] Use reproducible builds for provider packages [airflow]

via GitHub Fri, 17 Nov 2023 02:28:50 -0800


potiuk commented on PR #35693:
URL: https://github.com/apache/airflow/pull/35693#issuecomment-1816114368


   > What’s the intention behind using different timestamps in different 
providers? (But some of them are the same? Not sure if I’m reading the changes 
correctly.)
   
   Glad you asked. It's a deliberate choice. I spent some time thinking about 
the implications of using one timestamp vs. many, and I chose "many" for a good 
reason - happy to explain it.
   
   The intention is to keep the time in wheels "real", reflecting an "actual" 
time that people could tie to an actual event. There are other solutions you 
could choose: you could always use a fixed time (0, 2000-01-01 or another 
arbitrary date), or you could use a single time for all releases and just 
update it from time to time, moving it forward to the current date. Neither of 
those has any actual meaning. But I figured that, given the way we currently 
release providers and what our process looks like, we can make the dates in the 
wheels actually MEAN something, so that they are not artificial.
   
   We release different providers at different times - sometimes only amazon 
and google, sometimes http, sometimes all of them - and the choice is based on 
several factors: are there any changes to this provider (we don't release when 
there aren't), are they documentation-only (then we just update the 
documentation but do not release the provider), or maybe we release all of them 
even if there are no changes (this is when we release a wave of providers where 
we update the min-airflow version, for example, or when we have a change that 
affects all providers - for example when we added the auto-generated 
`__init__.py`). But generally speaking, each provider is released independently 
on its own schedule.
   
   Now, our provider preparation consists of two steps (this is a short 
description of the process that is described in detail here: 
https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md)
   
   1) Step 1 -> preparing provider documentation. This is where we make sure 
that all the changes for that provider are present in the CHANGELOG, we decide 
whether to release the provider or not, we decide what the version bump is 
(patchlevel, feature, breaking) and we update the documentation. This results 
in a commit where we update provider.yaml + CHANGELOG + commits, and this is 
the commit that is used to generate the packages. We only update the 
provider.yaml files for those packages that are going to be released. The other 
provider.yaml files remain untouched. So the time of that update to 
provider.yaml is really the time when the provider.yaml gets effectively 
"frozen" for the upcoming release.
   
   Effectively, the timestamp stored in provider.yaml is the timestamp of when 
the release manager ran `breeze release-management 
prepare-provider-documentation` for that particular provider.
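
   To make the "only the released providers get a new timestamp" idea concrete, 
here is a minimal sketch. It is not the actual breeze code: the key name 
`source-date-epoch`, the function `freeze_timestamps` and the plain-dict 
representation of provider.yaml are assumptions made for illustration only.

```python
import time

# Assumption: an illustrative key name, not necessarily what provider.yaml uses.
TIMESTAMP_KEY = "source-date-epoch"


def freeze_timestamps(providers, to_release, now=None):
    """Return a copy of `providers` where only the providers listed in
    `to_release` get a fresh timestamp; all others are left untouched."""
    now = int(time.time()) if now is None else int(now)
    updated = {}
    for name, metadata in providers.items():
        metadata = dict(metadata)  # copy so the input is not mutated
        if name in to_release:
            metadata[TIMESTAMP_KEY] = now
        updated[name] = metadata
    return updated


# Hypothetical state before documentation is prepared for amazon only.
providers = {
    "amazon": {TIMESTAMP_KEY: 1700000000},
    "google": {TIMESTAMP_KEY: 1700000000},
    "http": {TIMESTAMP_KEY: 1690000000},
}
result = freeze_timestamps(providers, {"amazon"}, now=1700217000)
```

   After this call only the amazon entry carries the new value; google and 
http keep whatever was frozen the last time their documentation was prepared.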
   
   Note that this can also change individually for each provider. It's quite 
possible that there are a few iterations of 
`prepare-provider-documentation`. The whole process is designed so that, even 
before you prepare an RC, you can run and merge such documentation changes 
before preparing the whole wave. Until you actually prepare the packages, you 
can incrementally pick up new changes that people add in main and effectively 
add them to only some providers - this way only one or two provider.yaml files 
might still get updated while you are doing it. And even later - when you 
decide to remove one or two providers from an RC wave and move them to RC2 - 
you continue updating documentation and provider.yaml only for the providers 
that you removed from the RC1 wave. The documentation and provider.yaml files 
of the other providers are untouched while you are doing it.
   
   This effectively means that the time when the documentation (and 
provider.yaml) was updated by the release manager will be different for each 
provider - even within the same "wave" of providers being released. The wave is 
really there to streamline the voting process, but in fact each provider has an 
individual release cycle.
   
   2) Step 2 -> provider package generation. This is done some time later. In 
the future we also want to make sure that whoever generates a provider package 
from the same tag (for example a PMC member verifying the release) gets a 
binary-identical package, so the "timestamp" for each package has to be stored 
in the source code tagged with the RC (and later final) tag.
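
   As a rough illustration of why the timestamp has to come from the tagged 
sources, the sketch below builds a wheel-like zip archive twice from the same 
inputs and the same stored timestamp and compares the results. This is not how 
the provider packages are actually built; the function name 
`build_wheel_like_zip`, the file contents and the epoch values are all 
hypothetical. It is similar in spirit to the `SOURCE_DATE_EPOCH` convention 
used by reproducible-build tooling.

```python
import hashlib
import io
import time
import zipfile


def build_wheel_like_zip(files, source_date_epoch):
    """Build a zip archive in memory whose bytes depend only on the file
    contents and on `source_date_epoch`, not on when the build runs."""
    # Zip entries store a (year, month, day, hour, minute, second) tuple.
    date_time = time.gmtime(source_date_epoch)[:6]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(files):  # fixed order keeps the layout stable
            info = zipfile.ZipInfo(filename=name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed file permissions
            archive.writestr(info, files[name])
    return buffer.getvalue()


# Hypothetical package contents and frozen timestamp.
files = {"provider/__init__.py": b"", "provider/hook.py": b"print('hi')\n"}
first = hashlib.sha256(build_wheel_like_zip(files, 1700217000)).hexdigest()
second = hashlib.sha256(build_wheel_like_zip(files, 1700217000)).hexdigest()
other = hashlib.sha256(build_wheel_like_zip(files, 1700300000)).hexdigest()
```

   Here `first` and `second` are equal because every input, including the 
timestamp, is fixed, while `other` differs because only the timestamp changed. 
That is why the value cannot be "now" at build time if two people are supposed 
to get the same bytes.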
   
   And for that it seems most natural to use the timestamp that was frozen 
during documentation preparation FOR THAT PROVIDER (which - again - might be 
different in each wave). We release the code from the merged commit where the 
documentation update happened (this is where the rc* tags are added), so 
effectively the timestamp when provider.yaml was updated by the release manager 
for THIS PROVIDER (mind that it might be different for each provider even in 
the same release wave) becomes the timestamp that we use to generate the 
package.
   
   It seems most natural: the timestamp actually means something (the time 
when the release manager prepared documentation for that provider), and it 
seems reasonable and desirable to keep it different for each provider even if 
they are released in the same wave, because their documentation might be 
prepared at different times.
   
   Also, it's a nice record of the last time documentation was updated for 
that provider. We could of course get it from git history, but seeing it in the 
code is a bit more accurate, because it shows the actual time of "generation", 
not the time of the commit (which might be minutes or hours later).
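
   If the stored value is a Unix epoch (an assumption here; the helper name 
`describe_timestamp` is also made up), turning it into something a human can 
read when looking at the code is a one-liner:

```python
from datetime import datetime, timezone


def describe_timestamp(epoch):
    """Render a stored epoch value as an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


readable = describe_timestamp(1700000000)
```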
   
   
   
   
   
   
   


