hussein-awala commented on PR #30672:
URL: https://github.com/apache/airflow/pull/30672#issuecomment-1510525417

   That's really a lot of information; I need a couple of days to read the 
resources you've shared and process them.
   
   But yeah, there are a lot of possible improvements we can try:
   - upgrading the Docker Buildx and Compose versions in the runner image 
(the last update in the CI repo was one year ago)
   - reviewing the EC2 instance types in use and fine-tuning the resources 
based on the metrics you shared; and since we're decoupling the tests into 
smaller jobs to parallelize them, we can use spot instances for these small 
jobs (if that is not already the case) to save up to 90% of the cost
   - moving to K8S runners for better autoscaling and to be cloud agnostic; 
in this case we can distribute the load between EKS and GKE (which will satisfy 
Google's condition) using label selectors, and we will be ready to use any new 
sponsored K8S cluster from other cloud providers
   - using a dedicated Docker RUN cache (`RUN --mount=type=cache`) together with 
the GHA cache for the pip cache (and maybe other packages)
   - using Graviton2 with buildx to build amd64 images on arm64 (yes, instead of 
building arm images on amd); it can reduce the cost by 40%, which we can use to 
pay for more instances
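   
   To make the cost arithmetic in the list above concrete, here is a rough sketch. The function name, the budget, and the unit cost are hypothetical, and the 40% and 90% figures are the upper-bound estimates quoted above, not measured values for our runners:

```python
def instances_for_budget(budget: float, unit_cost: float, discount: float) -> float:
    """Return how many instance-hours a fixed budget buys after a fractional discount.

    All inputs are illustrative assumptions, not real pricing data.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return budget / (unit_cost * (1 - discount))


# Hypothetical budget of 100 units at a unit cost of 1 per instance-hour.
baseline = instances_for_budget(100.0, 1.0, 0.0)
graviton = instances_for_budget(100.0, 1.0, 0.40)  # assumed 40% cheaper
spot = instances_for_budget(100.0, 1.0, 0.90)      # assumed "up to 90%" saving

print(f"Graviton2: {graviton / baseline:.2f}x the capacity for the same spend")
print(f"Spot:      {spot / baseline:.2f}x the capacity for the same spend")
```

   Under these assumptions, a 40% saving buys roughly 1.67x the capacity for the same spend, and the best-case spot saving buys roughly 10x.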
   
   > So if you would like to join me - I am all ears, and happy to share 
everything about it
   
   I will be glad to contribute to improving the CI of the project, which can 
have a significant impact on the development process.
   
   Let me check out the resources you've shared and look up some of the 
recently shared best practices for GHA runners.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
