potiuk commented on PR #24664:
URL: https://github.com/apache/airflow/pull/24664#issuecomment-1168093297

   > @potiuk, just want to double check there is no security/resource 
considerations here with us spinning up ARM machines for normal, non-committer 
PRs, right?
   
   Good questions. I :heart: that you asked them :). Here are my explanations 
and way how it is protected, but I would love a challenge/review and confirming 
that my thinking is right. Security is a teamwork and I don't trust myself, so 
I love someone can verify my apprach.
   
   # Security
   
   The ARM steps are executed in 2 caset:
   
   a) build image workfow. Those are not "directly" affected by PR changes - 
the command will only run the "Dockerfile RUN commands"  inside the ARM docker 
engine - Dockerfile is the only user-provided code that is executed during that 
build - and it is only executed as part of "docker build" (which runs inside 
the docker builder). You'd have to escape the container to get into the ARM 
instance. This is basically the same (or lower) risk we have currently - the 
Dockerfile RUN commands are executed inside the docker-engine of the S3 AMD 
instance we run.  It's actually a bit lower because the ARM instances do not 
run the "runner" and the runner - it's a "bare" ARM instance with docker engine 
running with SSH-port-forwarded connection to it. 
   
   b) if you run PR from a branch in "apache" repo (like I did in this PR) - 
then the ARM instance will run in the CI worklfow. This is controlled by `if: 
needs.build-info.outputs.inWorkflowBuild == 'true'`. You need to have committer 
status to be able to push branch to Airflow Repo and make PR from it. 
Non-commiters can only run this CI workflow from fork, and even if they can 
change the ci.yaml there and change the `if: 
needs.build-info.outputs.inWorkflowBuild == 'true'` it will not work because 
fork-run ci.yaml workflow does not run on our machines and only our build 
machines have capability to spin the ARM instances. The"build_image" workflow 
does not run for those PR-s because they are skipped if pull request head is 
"apache/airlfow" (similarly as all the other "build" steps)  
https://github.com/apache/airflow/pull/24664/files#diff-a6ce9b5b0a53d8dc6dbd59d40b23994c3e68a91fac73c0d7eacdf373c95476c9R386.
 
   
   The reason for having those duplicate "build" steps in CI workflow (in case 
of PRs from apache/airflow repo) is that I can actually test those ARM builds. 
When they are run from "CI" workflow, they run with "my code", when they are 
run in "build" workflow - they are run with "main" code and I only can test 
them when I merge them (which is main reason some of the workflows need a few 
merges to finally succed - as I have  literally no way to test them before 
merge. For example the "build" workflow Start ARM instance/Stop ARM instance 
had never been tested, because the only way I can test it is by merging the 
code to main.
   
   # Cost
   
   Those ARM machines have auto-cancelling - no matter if succeded or failed: 
50 minues 
[self-kill-switch](https://github.com/apache/airflow/blob/main/scripts/ci/images/self_terminate.sh)
 and "cancel" is run always: 
https://github.com/apache/airflow/pull/24664/files#diff-b803fcb7f17ed9235f1e5cb1fcd2f5d3b2838429d4368ae4c57ce4436577f03fR417
 - so even if someone presses cancel too many times or in case build "hangs" we 
got maximum ~ 1hr of instance run. The instances have  [maximum bid spot price 
set]( 
https://github.com/apache/airflow/blob/48ceda22bdbee50b2d6ca24767164ce485f3c319/scripts/ci/images/ci_start_arm_instance_and_connect_to_docker.sh#L27)
 for requesting the instance. It is set to 0.1 USD /HR. Curent spot price in 
Ohio: `c6g.xlarge | $0.0399 per Hour`  - so currently 1 h of running the 
instance is 0.04 USD - max 0.1 USD. Building the 4x images (when cache is 
refreshed - which is vast majority of times) is ~ 12 minutes. This means that 
running a single ARM instance step is 0.04
  / 5 ~ 0.008 USD (max 0,02 USD). We have ~ 100 image builds / day on average 
working day = 0.8 to 2 USD / day = 30 USD Month (with rather generous 
over-counting the days). this is ~ 30/1300 ~ 2% of current spending. 
   
   I personally think it's worth it as it saves a lot of our contributors from 
headaches on M1s - and this is not academic - this has already happened 
https://github.com/apache/airflow/pull/24635 and it was couple of hours lost 
for a few people. We have currently no protection mechanisms to prevent it from 
happening again - and taking into account that a number of deps (including 
mysql and mssql) do not yet have debian ARM package and fail when building, it 
is likely to happen again
   
   BTW. Probably we loose way more but not merging some optimisations which are 
currently in the pipeline (for example this one 
https://github.com/apache/airflow/pull/24666 - I have not assesed that one 
because it is far more difficult but I would guess it allows to save ~ 3 
minutes of build time x2 for AMD instances at 0.08 USD /hr per version in ~ 80% 
(wild guess) of our PRs  which  is already  more than that cost of those ARM 
instances. 
   
   And I already saved quite a lot and more : 
https://github.com/apache/airflow/pull/24580 (and more to come with more breeze 
conversion) so I am on a cost-and-build-saving spree here. 
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to