potiuk commented on PR #36537:
URL: https://github.com/apache/airflow/pull/36537#issuecomment-1884781241

   cc: @jscheffl and others - @ephraimbuddy but also @uranusjr - we've been 
discussing the security of the build scripts/plugins/extensions. This is also 
related to the 
https://medium.com/apache-airflow/unraveling-the-code-navigating-a-ci-release-security-vulnerability-in-apache-airflow-620214a96297
 article:
   
   In the last push I also implemented one more change, which is 
security-focused.
   
   With hatch you can build the airflow package locally with 
`hatch build -t custom -t wheel -t sdist`, but the way we build it in CI (in 
pull_request_target) should be isolated from the runner it runs on, because the 
runner potentially has access to secrets and tokens that could have write 
access on GitHub (for example to the GitHub Registry, as described in my 
article). 
   
   But with the change I just pushed, the way we build airflow packages for 
"production" is now far more secure. The `prepare-airflow-packages` command, 
which is run by the release manager but also in CI in our 
`pull_request_target` workflow, now uses a custom docker image and DOES NOT use 
volume mounting (which would allow a potential attacker to write back some 
files).
   
   It is now done in three steps. 
   
   1) I generate a Dockerfile and a dockerignore file in Airflow (using breeze 
coming from main). This Dockerfile builds a new image based on the official 
python debian image and installs a few needed dependencies via `pip`, and I 
copy all the airflow sources coming from the PR (including the .git repo, to 
know what commit hash it comes from so that it can be added to Airflow).
   
   2) The image is used to build airflow: it runs 
`hatch build -t custom -t wheel -t sdist` inside the image, which includes 
automatically installing yarn and node modules from scratch (using pre-commit 
environment management). This results in airflow packages generated in 
/opt/airflow/dist.
   
   3) The host code pulls the generated packages from the container (and 
deletes the container afterwards).
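
   A minimal Python sketch of the three steps above, assuming a plain `docker` 
CLI on the host and an image that already has `hatch` installed. This is not 
the actual breeze implementation: the image tag, container name and function 
names are hypothetical, and only the `/opt/airflow/dist` path and the hatch 
command come from the description above.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Hypothetical names, chosen only for this illustration.
IMAGE_TAG = "airflow-build-sketch:latest"
CONTAINER_NAME = "airflow-build-sketch"
DIST_IN_CONTAINER = "/opt/airflow/dist"  # output path mentioned in step 2


def isolated_build_commands(sources: Path, dockerfile: Path, out_dir: Path) -> list[list[str]]:
    """Return the docker invocations for the three steps without running them."""
    return [
        # Step 1: bake the PR sources into an image (copied in, never mounted).
        ["docker", "build", "--tag", IMAGE_TAG, "--file", str(dockerfile), str(sources)],
        # Step 2: run the build inside the container; note there is no volume flag.
        ["docker", "run", "--name", CONTAINER_NAME, IMAGE_TAG,
         "hatch", "build", "-t", "custom", "-t", "wheel", "-t", "sdist"],
        # Step 3a: copy the finished files out; the host only treats them as data.
        ["docker", "cp", f"{CONTAINER_NAME}:{DIST_IN_CONTAINER}/.", str(out_dir)],
        # Step 3b: delete the container afterwards.
        ["docker", "rm", "--force", CONTAINER_NAME],
    ]


def run_isolated_build(sources: Path, dockerfile: Path, out_dir: Path) -> None:
    """Execute the steps; the container is removed even if the build fails."""
    out_dir.mkdir(parents=True, exist_ok=True)
    *steps, cleanup = isolated_build_commands(sources, dockerfile, out_dir)
    try:
        for cmd in steps:
            subprocess.run(cmd, check=True)
    finally:
        subprocess.run(cleanup, check=False)
```

   The point of the design is that PR-controlled code only ever runs in the 
`docker run` step, and the only channel back to the host is `docker cp` of the 
built files.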
   
   This way, even if someone changes our pyproject.toml, hatch_build.py or 
any other source code in their PR, such code will at most be executed inside 
the container, which has no volumes mounted from the host, and then the host 
pulls the prepared package files without executing the code. This means that a 
potential attacker (using pull request code) would have to find a way to escape 
the container first in order to be able to modify any code running in the 
GitHub runner.
   


