potiuk commented on PR #36537: URL: https://github.com/apache/airflow/pull/36537#issuecomment-1884781241
cc: @jscheffl and others - @ephraimbuddy but also @uranusjr - we've been discussing the security of the build scripts/plugins/extensions. This is also related to the https://medium.com/apache-airflow/unraveling-the-code-navigating-a-ci-release-security-vulnerability-in-apache-airflow-620214a96297 article.

In the last push I implemented one other change that is security-focused. While with hatch you can build the airflow package locally with `hatch build -t custom -t wheel -t sdist`, the way we build it in CI (in `pull_request_target`) should be isolated from the runner it runs on, because the runner potentially has access to secrets and tokens that could have write access on GitHub (for example to the GitHub Registry, as described in my article).

With the change I just pushed, the way we build airflow packages for "production" is now far more secure. The `prepare-airflow-packages` command, which is run by the release manager and also in CI in our `pull_request_target` workflow, now uses a custom docker image and DOES NOT use volume mounting (which would allow a potential attacker to write back some files). It is now done in three steps:

1) I generate a Dockerfile and a dockerignore file in Airflow (using breeze coming from main). This Dockerfile builds a new image based on the official Python Debian image and installs a few needed dependencies with `pip`. I also copy in all the airflow sources coming from the PR (including the .git repo, to know what commit hash they come from so that it can be added to Airflow).
2) The image is used to build airflow (it uses `hatch build -t custom -t wheel -t sdist` inside the image). That includes automatically installing yarn and node modules from scratch (using pre-commit environment management). This results in airflow packages generated in /opt/airflow/dist.
3) The host code pulls the generated packages from the container (and deletes the container afterwards).

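The three steps above can be sketched roughly as follows. This is only an illustration of the idea, not the actual breeze implementation: the image tag, container name, Dockerfile path and helper names are all hypothetical, and running hatch as the container command (rather than during the image build) is an assumption. Only the hatch command line and the /opt/airflow/dist path come from the description above.

```python
import subprocess
from typing import List

IMAGE_TAG = "airflow-build:hypothetical"        # hypothetical image tag
CONTAINER_NAME = "airflow-build-hypothetical"   # hypothetical container name
DIST_IN_CONTAINER = "/opt/airflow/dist"         # path named in the comment above


def isolated_build_commands(dockerfile: str, context_dir: str, host_dist: str) -> List[List[str]]:
    """Return the docker commands for the three steps, in execution order."""
    return [
        # Step 1: build the image; PR sources enter via COPY in the Dockerfile.
        ["docker", "build", "--tag", IMAGE_TAG, "--file", dockerfile, context_dir],
        # Step 2: run the build inside the container, with no -v / --mount,
        # so untrusted build code cannot write to the host filesystem.
        ["docker", "run", "--name", CONTAINER_NAME, IMAGE_TAG,
         "hatch", "build", "-t", "custom", "-t", "wheel", "-t", "sdist"],
        # Step 3: copy the artifacts out as plain files, then drop the container.
        ["docker", "cp", f"{CONTAINER_NAME}:{DIST_IN_CONTAINER}/.", host_dist],
        ["docker", "rm", "--force", CONTAINER_NAME],
    ]


def has_host_mount(cmd: List[str]) -> bool:
    """Simple guard: detect the common spellings of volume/mount flags."""
    return any(
        arg in ("-v", "--volume", "--mount")
        or arg.startswith(("--volume=", "--mount="))
        for arg in cmd
    )


def run_isolated_build(dockerfile: str, context_dir: str, host_dist: str) -> None:
    """Execute the steps, refusing to run if any command mounts host paths."""
    commands = isolated_build_commands(dockerfile, context_dir, host_dist)
    if any(has_host_mount(cmd) for cmd in commands):
        raise ValueError("build commands must not mount host volumes")
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

The point of the sketch is that the host only ever runs `docker` itself and copies files out; nothing from the PR is executed on the host side.
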
This way, even if someone changes our pyproject.toml, hatch_build.py or any other source code in their PR, such code will at most be executed inside the container, which has no volumes mounted from the host, and the host then pulls the prepared package files without executing any of that code. This means that a potential attacker (using pull request code) would first have to find a way to escape the container in order to modify any code running on the GitHub runner.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
