potiuk edited a comment on issue #14989:
URL: https://github.com/apache/airflow/issues/14989#issuecomment-808194362


   Looks great, @uranusjr!
   
   I have another thought in the meantime: why not use GitHub Actions caching (https://docs.github.com/en/actions/guides/caching-dependencies-to-speed-up-workflows)? The inventory files are not big (2.3 MB):
   
   ```
   [jarek:~/code/airflow/docs] master+ ± du -h --summarize _inventory_cache
   2.3M _inventory_cache
   ```
   
   So we could simply cache that directory and reuse the already existing inventory cache mechanism (we already use `_inventory_cache` as a fallback). That would probably be far simpler than setting up an S3 bucket and managing it periodically.
   
   We could also improve it to automatically use the "exact" version of each library in each inventory URL, which would make it much better. Currently we take "latest" or "stable", but that is actually wrong: we should look at our constraints file and use the exact version of the library we depend on, which should be fairly simple to retrieve from the constraints file. That would also make caching much more efficient, because we could invalidate the cache whenever any of the libraries changes and have it rebuilt. This works really nicely with the `restore-keys:` option of the caching already built into GitHub Actions.
   
   Some pointers: 
   
   * we could take the current library versions from https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.8.txt
   * we could simply download the constraints file and use it to determine cache validity, with this key: `cache-inventory-${{ hashFiles('constraints-3.8.txt') }}` and this restore-keys: `cache-inventory-`. The restore-keys work in such a way that when the hash of the constraints file changes, the job downloads the "latest available" cache matching the prefix and uses it as a base; after the inventories are rebuilt, a new cache version with the new hash is uploaded for the next job to use.
   * and we could also use the constraints to derive the right versioned URLs when building the inventory URLs to download from (see the sketch after this list)
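   
   Putting these pointers together, a hedged sketch of how the steps could look (the step names are mine, and `sqlalchemy` is just an illustrative library to extract a version for):
   
   ```yaml
   # Sketch only: fetch the constraints file first so hashFiles() can see it
   # in the workspace.
   - name: Fetch current constraints
     run: |
       curl -sSfL \
         https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.8.txt \
         -o constraints-3.8.txt
   
   - name: Cache Sphinx inventory files
     uses: actions/cache@v2
     with:
       path: docs/_inventory_cache
       # Invalidated whenever any pinned library version changes.
       key: cache-inventory-${{ hashFiles('constraints-3.8.txt') }}
       # On a miss, fall back to the latest cache matching the prefix.
       restore-keys: cache-inventory-
   
   # Illustrative only: extract one exact pinned version, e.g. to build a
   # versioned inventory URL instead of "latest"/"stable".
   - name: Show pinned sqlalchemy version (example)
     run: |
       grep '^sqlalchemy==' constraints-3.8.txt | cut -d= -f3
   ```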
       
   WDYT? Happy to help and brainstorm on it. This would be much simpler operationally (no separate S3 bucket, no access management needed, etc.) and more "correct" in terms of the generated documentation.
       
       
      

