potiuk edited a comment on issue #14989: URL: https://github.com/apache/airflow/issues/14989#issuecomment-808194362
Looks great! @uranusjr ! I have another thought in the meantime. Why not using caching of GitHub Actions: https://docs.github.com/en/actions/guides/caching-dependencies-to-speed-up-workflows - the inventory files are not big (2.3 M): ``` [jarek:~/code/airflow/docs] master+ ± du -h --summarize _inventory_cache 2.3M _inventory_cache ``` So we could simply cache it and use already existing inventory cache mechanism this way (we are already using the _inventory_cache as fallback)? That will probably be way simpler than setting up S3 and managing it periodically? Also we could improve it to automatically add the "exact" version of each library we use in each inventory URL, that would make it much better - currently we are taking "latest" or "stable" but in fact this is wrong - we should look at our constraints file and use the exact version of the library we use - this should be rather simple to retrieve from constraints file. That would also make caching much more efficient - we could invalidate cache whenever any of the libraries change and get it rebuilt. And this works really nicely if we use `restore-keys:` part of caching already present in GithubAction. Some pointers: * we could take current library versions from : https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.8.txt * we could simply download the constraint file and it could be used to calculate cache validity: with this key: cache-inventory-${{ hashFiles('constrant-3.8.txt') }} and this restore-keys: cache-inventory- the restore-keys work in the way that if the hash of such constraint file changes, it will download the "latest available" cache matching the prefix and use it as a base (but then after it is rebuilt, it will upload a new version of cache with the changed hash for next job to use it. * and we could also use the constraints derive the right URLs when building inventory URLs to download the inventory from WDYT? Happy to help and brainstorm on it, but this would be much simpler operationally (no separate S3 folder, no access needed etc. ) and more "correct" in terms of the documentation generated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
