potiuk commented on code in PR #59: URL: https://github.com/apache/airflow-ci-infra/pull/59#discussion_r1749199527
##########
runner/Dockerfile:
##########
@@ -0,0 +1,35 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+FROM ghcr.io/actions/actions-runner:latest
+
+USER root
+
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+    ca-certificates curl nodejs npm wget unzip vim git jq build-essential netcat \

Review Comment:

Short answer: I think emulation as an option should remain.

Long answer: here (and this is already part of the "CI/CD" knowledge transfer :) ) let me explain why I think it should.

I prefer - if possible - to have a manual backup for all the "automated" processes we have in CI. Generally, my "guiding principle" for any kind of CI work like this is to NEVER rely 100% on a particular CI doing the job exclusively. I simply hate being put in a situation where "something else" fails and the only answer is "we have to wait for them to fix it". That is fine for temporary disruptions, but for processes like releasing - which we often do under time pressure, with users expecting the new release to come out quickly - I prefer to rely on third parties as little as possible.

We actually saw this in the last few weeks, when the CI workflow to release RC packages was broken: we were able to release Airflow without having to find a "proper" fix immediately only because we had a manual process, where you could either use hardware (if you happen to have two machines handy) or emulation (which @kaxil used) - even if it means the build takes hours instead of minutes. We had plan B, and plan C - which involved not only someone (like me) who has the right hardware setup, but also someone who has just a local machine and good networking, and can "fire-and-forget" a process that runs for an hour rather than 10 minutes, without any special environment configuration.

BTW, in this case - even though I could have helped with the hardware setup, my AMD Linux workstation at home ACTUALLY BROKE last week and I only got it back on Friday :D... So ALWAYS HAVE PLAN B (and C, and sometimes D)... That allows the CI team to sleep better :)
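For those following along: "emulation" here means building the non-native architecture via QEMU on a single machine. A minimal sketch of that path looks roughly like the commands below - the image tag and builder name are placeholders, not our actual release commands; the maintained procedure is in the MANUALLY_BUILDING_IMAGES.md guide linked further down.

```bash
# Rough sketch of the "emulation" fallback: build arm64 on an x86_64 machine via QEMU.
# Placeholders: the builder name and image tag below are made up for illustration only.

# Register QEMU binfmt handlers so the x86_64 host can execute arm64 binaries
docker run --privileged --rm tonistiigi/binfmt --install arm64

# Create a buildx builder capable of multi-platform builds and make it the default
docker buildx create --name multi_arch_builder --use

# Build both platforms on one machine - the arm64 half runs under emulation,
# so expect hours rather than minutes (add --push to publish to a registry)
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag example.org/airflow/runner:latest \
  .
```

That is the "hours instead of minutes" trade-off: one machine, no extra hardware, just patience.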
In our case we have manual processes for all the things that CI jobs currently do automatically, and there is not a single part of the process that relies exclusively on GitHub Actions CI doing the job:

* First of all - the commands we run in CI are not "GitHub Actions"-exclusive (except for some environment setup); most of the important actions are `breeze` commands that can be run manually, locally. This is the main reason why we have `breeze` in the first place - and it is nicely captured in this "Architecture Decision Record": https://github.com/apache/airflow/blob/main/dev/breeze/doc/adr/0002-implement-standalone-python-command.md - it basically means that if you look at each step of every CI job and replicate it locally, you should be able to get the same result. So if - for whatever reason - our CI stops working (say the ASF limits our processing time and we have no money for "self-hosted" runners in AWS), we will be able to replicate - slower and more painfully - what now happens in CI, manually.

* Secondly - for processes that are likely to fail for whatever reason, we describe the manual procedure in "step-by-step" guides explaining a) why we might need to do it, b) how to set up the environment, and c) how to run it as a "human". The processes currently described this way are:
  * https://github.com/apache/airflow/blob/main/dev/MANUALLY_BUILDING_IMAGES.md
  * https://github.com/apache/airflow/blob/main/dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md

So - whatever we do, we have to keep the "manual path" working as a backup plan. Using hardware is a bit problematic, because you have to have two machines (ARM and AMD) handy and connected - yes, it can be done using the cloud (of course), but ideally the fallback we keep is a local machine of one of the PMC members doing all of the above, so that we do not depend on GH Actions or even an AWS account being available. That's why emulation is going to stay - I think - as a backup plan.
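For completeness, the "two machines (ARM and AMD)" hardware path mentioned above is roughly the following kind of buildx setup - again only an illustrative sketch, with a made-up hostname and builder name; the real, maintained steps are in MANUALLY_BUILDING_IMAGES.md linked above.

```bash
# Rough sketch of the "hardware" path: one buildx builder with two native nodes.
# Placeholders: the SSH endpoint, builder name and image tag are made up for illustration.

# Local x86_64/AMD node
docker buildx create --name hardware_builder

# Append a remote ARM machine reachable over SSH as a second node
docker buildx create --append --name hardware_builder ssh://user@arm-box.example.org

# Each platform builds natively on the matching node - fast, but both machines
# have to be up and connected for the whole build
docker buildx build \
  --builder hardware_builder \
  --platform linux/amd64,linux/arm64 \
  --tag example.org/airflow/runner:latest \
  .
```

Fast when the second box is available - which, as noted above, is exactly the part you cannot always count on.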
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]