hussein-awala commented on PR #30672:
URL: https://github.com/apache/airflow/pull/30672#issuecomment-1510525417

That's really a lot of information; I need a couple of days to read the resources you've shared and process them.

But yeah, there are a lot of possible improvements we can try:
- upgrading the docker buildx and compose versions in the runners image (the last update was one year ago in the CI repo)
- reviewing the EC2 instance types in use and fine-tuning the resources based on the metrics you shared; and since we're decoupling the tests into smaller jobs to parallelize them, we can use spot instances for these small jobs (if that is not already the case) to save up to 90% of the cost
- moving to K8S runners for better autoscaling and to be cloud agnostic; in this case we can distribute the load between EKS and GKE (which will satisfy Google's condition) using label selectors, and we will be ready to use any new sponsored K8S cluster from other cloud providers
- using a dedicated docker RUN cache (`RUN --mount=type=cache`) together with the GHA cache for the pip cache (and maybe other packages)
- using Graviton2 with buildx to build amd64 images on arm64 (yes, instead of building the arm image on amd); it can reduce the cost by 40%, which we can use to pay for more instances

> So if you would like to join me - I am all ears, and happy to share everything about it

I will be glad to contribute to improving the CI of the project, which can have a significant impact on the development process.

Let me check out the resources you've shared and look up some of the recently shared best practices for GHA runners.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
