potiuk commented on PR #35693: URL: https://github.com/apache/airflow/pull/35693#issuecomment-1816114368
> What’s the intention behind using different timestamps in different providers? (But some of them are the same? Not sure if I’m reading the changes correctly.)

Glad you asked. It's a deliberate choice. I spent some time thinking about the implications of using one timestamp vs. many, and I chose "many" for a good reason - happy to explain it.

The intention is to keep the time in wheels "real", reflecting some "actual" time that people could even tie to an actual event. There are other solutions you could choose: you could always use a fixed time (0, or a 2000-01-01 equivalent, or another arbitrary date), or you could use a single time for all releases and just update it from time to time, moving it forward to the current date. Neither of those has any actual meaning. But I figured that, given the way we currently release providers and what our process looks like, we can make the dates in the wheels actually MEAN something, so that they are not artificial.

We release different providers at different times - sometimes only amazon and google, sometimes http, sometimes all of them. The choice is based on several factors: whether there are any changes to the provider (we don't release when there aren't), whether the changes are documentation-only (then we just update the documentation but do not release the provider), or whether we release all of them even if there are no changes (this is when we release a wave of providers where we update the min-airflow version, for example, or when we have a change that affects all providers - for example when we added the auto-generated `__init__.py`). But generally speaking, each provider is released independently, on its own schedule.

Now, our provider preparation consists of two steps (this is a short description of the process that is described in detail here: https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md):

1) Step 1 -> preparing provider documentation.
This is where we make sure that all the changes for that provider are present in the CHANGELOG, decide whether to release the provider or not, decide what the version bump is (patchlevel, feature, breaking), and update the documentation. This results in a commit where we update provider.yaml + CHANGELOG + commits, and this is the commit that is used to generate the packages. We only update the provider.yaml files for those packages that are going to be released. The other provider.yaml files remain untouched. So the time of that update to provider.yaml is really the time when the provider.yaml gets effectively "frozen" for the upcoming release. In other words, the timestamp stored in provider.yaml is the timestamp of when the release manager ran `breeze release-management prepare-provider-documentation` for that particular provider.

Note that this can also change individually for each provider. It's quite possible that there are a few iterations of this `prepare-provider-documentation` step. The whole process is designed so that, even before you prepare the RC, you can run and merge such documentation changes before preparing the whole wave. Until you actually prepare the packages, you can incrementally pick up new changes that people merge to main, and those may affect only some providers - this way only one or two provider.yaml files might still get updated while you are doing it. And even later, when you decide to remove one or two providers from an RC wave and move them to RC2, you continue updating documentation and provider.yaml only for those providers that you removed from the RC1 wave - the documentation and provider.yaml files of the other providers stay untouched while you are doing it.

This effectively means that the time when the documentation (and provider.yaml) was updated by the release manager will be different for each provider - even within the same "wave" of providers that are being released.
The wave is really there to streamline the voting process, but in fact each provider has an individual release cycle.

2) Step 2 -> provider package generation.

This is done some time later. In the future we also want to make sure that whoever generates the provider package from the same tag (for example a PMC member verifying the release) will get a binary-identical package, so the "timestamp" for each package has to be stored in the source code tagged with the RC (later final) tag. For that, it seems most natural to use the timestamp that was frozen during documentation preparation FOR THAT PROVIDER (which, again, might be different for each provider in a wave). We release the code from the merged commit where the documentation update happened (this is where the rc* tags are added), so effectively the timestamp of when provider.yaml was updated by the release manager for THIS PROVIDER (mind that it might be different for each provider, even in the same release wave) becomes the timestamp that we use to generate the package.

It seems most natural: the timestamp actually means something (the time when the release manager prepared the documentation for that provider), and it seems reasonable and desirable to keep it different for each provider even if they were released in the same wave, because their documentation might be prepared at different times. It's also a nice record of the last time the documentation was updated for that provider. We could of course get that from git history, but seeing it in the code is a bit more accurate, because it shows the actual time of "generation", not the time of the commit (which might be minutes or hours later).
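
To make the two steps above a bit more concrete, here is a rough Python sketch. It is not the actual breeze implementation: the `source_date_epoch` key and the `freeze_timestamp` and `build_archive` helpers are hypothetical names made up for illustration, the metadata is a plain dict standing in for the per-provider provider.yaml files, and the "package" is a plain zip built with the standard library rather than a real wheel. It only shows the idea: the timestamp is frozen per provider in step 1, and step 2 uses only that frozen value, so rebuilding from the same sources gives identical bytes.

```python
import io
import time
import zipfile


def freeze_timestamp(metadata: dict, provider: str, now: int) -> None:
    """Step 1 (sketch): record the documentation-preparation time for ONE provider.

    `metadata` stands in for the provider.yaml files; the key name
    `source_date_epoch` is an assumption made for this example.
    """
    metadata[provider]["source_date_epoch"] = now


def build_archive(files: dict[str, bytes], source_date_epoch: int) -> bytes:
    """Step 2 (sketch): build a zip whose entry times come from the frozen value.

    No wall-clock time is read here, so the same inputs always produce
    the same bytes.
    """
    # Zip entries store a (year, month, day, hour, minute, second) tuple.
    date_time = time.gmtime(source_date_epoch)[:6]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Sort names so the entry order does not depend on dict ordering.
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed file permissions
            archive.writestr(info, files[name])
    return buffer.getvalue()
```

For example, freezing `amazon` and `google` at different times leaves `http` untouched, and anyone rebuilding `amazon` with its frozen value gets the same bytes:

```python
metadata = {"amazon": {}, "google": {}, "http": {}}
freeze_timestamp(metadata, "amazon", 1700000000)
freeze_timestamp(metadata, "google", 1700003600)  # prepared an hour later

files = {"pkg/__init__.py": b"", "pkg/hook.py": b"print('x')\n"}
first = build_archive(files, metadata["amazon"]["source_date_epoch"])
second = build_archive(files, metadata["amazon"]["source_date_epoch"])
print(first == second)
```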
