This is an automated email from the ASF dual-hosted git repository.

tvalentyn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

The following commit(s) were added to refs/heads/master by this push:
     new 43d27ed52af Add a guide to build custom Beam Python SDK image (#33048)
43d27ed52af is described below

commit 43d27ed52aff45e47ee48e1e1aef5ba2abfb0a94
Author: Minbo Bae <49642083+baemi...@users.noreply.github.com>
AuthorDate: Mon Nov 11 16:46:22 2024 -0800

    Add a guide to build custom Beam Python SDK image (#33048)

    * Add a guide to build custom Beam Python SDK image

    * Address review comments
---
 .../en/documentation/runtime/environments.md       |   4 +-
 .../documentation/sdks/python-sdk-image-build.md   | 306 +++++++++++++++++++++
 .../layouts/partials/section-menu/en/sdks.html     |   1 +
 3 files changed, 310 insertions(+), 1 deletion(-)

diff --git a/website/www/site/content/en/documentation/runtime/environments.md b/website/www/site/content/en/documentation/runtime/environments.md
index a048c21046b..48039d50a10 100644
--- a/website/www/site/content/en/documentation/runtime/environments.md
+++ b/website/www/site/content/en/documentation/runtime/environments.md
@@ -105,7 +105,9 @@ This method requires building image artifacts from Beam source. For additional i
 2. Customize the `Dockerfile` for a given language, typically `sdks/<language>/container/Dockerfile` directory (e.g. the [Dockerfile for Python](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile).

-3. Return to the root Beam directory and run the Gradle `docker` target for your image.
+3. Return to the root Beam directory and run the Gradle `docker` target for your
+   image. For self-contained instructions on building a container image,
+   follow [this guide](/documentation/sdks/python-sdk-image-build).

 ```
 cd $BEAM_WORKDIR
diff --git a/website/www/site/content/en/documentation/sdks/python-sdk-image-build.md b/website/www/site/content/en/documentation/sdks/python-sdk-image-build.md
new file mode 100644
index 00000000000..f456a686afe
--- /dev/null
+++ b/website/www/site/content/en/documentation/sdks/python-sdk-image-build.md
@@ -0,0 +1,306 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Building Beam Python SDK Image Guide
+
+There are two ways to build a Beam Python SDK image. If you only need to modify
+[the Python SDK boot entrypoint binary](https://github.com/apache/beam/blob/master/sdks/python/container/boot.go),
+read [Update Boot Entrypoint Application Only](#update-boot-entrypoint-application-only).
+If you need to build a Beam Python SDK image fully,
+read [Build Beam Python SDK Image Fully](#build-beam-python-sdk-image-fully).
+
+
+## Update Boot Entrypoint Application Only
+
+If you only need to make a change to [the Python SDK boot entrypoint binary](https://github.com/apache/beam/blob/master/sdks/python/container/boot.go), you
+can rebuild only the boot application and include the updated boot application
+in the preexisting image.
+Read [the Python container Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)
+for reference.
+
+```shell
+# From beam repo root, make changes to boot.go.
+your_editor sdks/python/container/boot.go
+
+# Rebuild the entrypoint
+./gradlew :sdks:python:container:gobuild
+
+cd sdks/python/container/build/target/launcher/linux_amd64
+
+# Create a simple Dockerfile to use custom boot entrypoint.
+cat >Dockerfile <<EOF
+FROM apache/beam_python3.10_sdk:2.60.0
+COPY boot /opt/apache/beam/boot
+EOF
+
+# Build the image
+docker build . --tag us-central1-docker.pkg.dev/<MY_PROJECT>/<MY_REPOSITORY>/beam_python3.10_sdk:2.60.0-custom-boot
+docker push us-central1-docker.pkg.dev/<MY_PROJECT>/<MY_REPOSITORY>/beam_python3.10_sdk:2.60.0-custom-boot
+```
+
+You can build a Docker image if Java, Python, Golang, and Docker are installed
+in your local environment. Try
+`./gradlew :sdks:python:container:py<PYTHON_VERSION>:docker`. For example,
+`:sdks:python:container:py310:docker` builds `apache/beam_python3.10_sdk`
+locally if successful. You can follow this guide to build a custom image on
+a VM if the build fails in your local environment.
+
+## Build Beam Python SDK Image Fully
+
+This section introduces a way to build everything from scratch.
+
+### Prepare VM
+
+Prepare a VM with Debian 11. This guide was tested on Debian 11.
+
+#### Google Compute Engine
+
+One option for creating a Debian 11 VM is to use a GCE instance.
+
+```shell
+gcloud compute instances create beam-builder \
+  --zone=us-central1-a \
+  --image-project=debian-cloud \
+  --image-family=debian-11 \
+  --machine-type=n1-standard-8 \
+  --boot-disk-size=20GB \
+  --scopes=cloud-platform
+```
+
+Log in to the VM. All the following steps are executed inside the VM.
+
+```shell
+gcloud compute ssh beam-builder --zone=us-central1-a --tunnel-through-iap
+```
+
+Update the apt package list.
+
+```shell
+sudo apt-get update
+```
+
+> [!NOTE]
+> * A high-CPU machine is recommended to reduce the compile time.
+> * The image build needs a large disk. The build will fail with "no space left
+  on device" with the default 10GB disk size.
+> * The `cloud-platform` scope is recommended to avoid permission issues with Google
+  Cloud Artifact Registry. You can use the default scopes if you don't push
+  the image to Google Cloud Artifact Registry.
+> * Use a zone in the region of your Artifact Registry Docker repository if
+  you push the image to Artifact Registry.
+
+### Prerequisite Packages
+
+#### Java
+
+You need Java to run Gradle tasks.
+
+```shell
+sudo apt-get install -y openjdk-11-jdk
+```
+
+#### Golang
+
+Download and install. Reference: https://go.dev/doc/install.
+
+```shell
+# Download and install
+curl -OL https://go.dev/dl/go1.23.2.linux-amd64.tar.gz
+sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.23.2.linux-amd64.tar.gz
+
+# Add go to PATH.
+export PATH=/usr/local/go/bin:$PATH
+```
+
+Confirm the Golang version.
+
+```shell
+go version
+```
+
+Expected output:
+
+```text
+go version go1.23.2 linux/amd64
+```
+
+> [!NOTE]
+> Old Go versions (e.g. 1.16) will fail at `:sdks:python:container:goBuild`.
+
+#### Python
+
+This guide uses Pyenv to manage multiple Python versions.
+Reference: https://realpython.com/intro-to-pyenv/#build-dependencies
+
+```shell
+# Install dependencies
+sudo apt-get install -y make build-essential libssl-dev zlib1g-dev \
+libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev \
+libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev
+
+# Install Pyenv
+curl https://pyenv.run | bash
+
+# Add pyenv to PATH.
+export PATH="$HOME/.pyenv/bin:$PATH"
+eval "$(pyenv init -)"
+eval "$(pyenv virtualenv-init -)"
+```
+
+Install Python 3.9 and set the Python version. This will take several minutes.
+
+```shell
+pyenv install 3.9
+pyenv global 3.9
+```
+
+Confirm the Python version.
+
+```shell
+python --version
+```
+
+Expected output example:
+
+```text
+Python 3.9.17
+```
+
+> [!NOTE]
+> You can build with a different Python version by passing the [
+`-PpythonVersion` option](https://github.com/apache/beam/blob/v2.60.0/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L2956-L2961)
+> to the Gradle task. Otherwise, you should have `python3.9` in the build
+> environment for Apache Beam 2.60.0 or later (python3.8 for older Apache Beam
+> versions). If you use the wrong version, the Gradle task
+`:sdks:python:setupVirtualenv` fails.
+
+#### Docker
+
+Install Docker
+following [the reference](https://docs.docker.com/engine/install/debian/#install-using-the-repository).
+
+```shell
+# Add GPG keys.
+sudo apt-get update
+sudo apt-get install ca-certificates curl
+sudo install -m 0755 -d /etc/apt/keyrings
+sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
+sudo chmod a+r /etc/apt/keyrings/docker.asc
+
+# Add the Apt repository.
+echo \
+  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
+  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
+  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+sudo apt-get update
+
+# Install docker packages.
+sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
+```
+
+The Beam Python SDK image build needs to run the `docker` command without
+root privileges. You can do this
+by [adding your account to the docker group](https://docs.docker.com/engine/install/linux-postinstall/).
+
+```shell
+sudo usermod -aG docker $USER
+newgrp docker
+```
+
+Confirm that you can run a container without root privileges.
+
+```shell
+docker run hello-world
+```
+
+#### Git
+
+Git is not necessary for building the Python SDK image. It is only used to download
+the Apache Beam code in this guide.
+
+```shell
+sudo apt-get install -y git
+```
+
+### Build Beam Python SDK Image
+
+Download Apache Beam
+from [the GitHub repository](https://github.com/apache/beam).
+
+```shell
+git clone https://github.com/apache/beam beam
+cd beam
+```
+
+Make changes to the Apache Beam code.
+
+Run the Gradle task to start the Docker image build. This will take several minutes.
+You can run `:sdks:python:container:py<PYTHON_VERSION>:docker` to build an image
+for a different Python version.
+See [the supported Python version list](https://github.com/apache/beam/tree/master/sdks/python/container).
+For example, `py310` is for Python 3.10.
+
+```shell
+./gradlew :sdks:python:container:py310:docker
+```
+
+If the build is successful, you can see the built image locally.
+
+```shell
+docker images
+```
+
+Expected output:
+
+```text
+REPOSITORY                   TAG      IMAGE ID       CREATED              SIZE
+apache/beam_python3.10_sdk   2.60.0   33db45f57f25   About a minute ago   2.79GB
+```
+
+> [!NOTE]
+> If you run the build in your local environment and the Gradle task
+`:sdks:python:setupVirtualenv` fails because of an incompatible Python version,
+> try passing `-PpythonVersion` with the Python version installed in your local
+> environment (e.g. `-PpythonVersion=3.10`).
+
+### Push to Repository
+
+You may push the custom image to an image repository. The image can be used
+as a [Dataflow custom container](https://cloud.google.com/dataflow/docs/guides/run-custom-container#usage).
+
+#### Google Cloud Artifact Registry
+
+You can push the image to Artifact Registry. No additional authentication is
+necessary if you use Google Compute Engine.
+
+```shell
+docker tag apache/beam_python3.10_sdk:2.60.0 us-central1-docker.pkg.dev/<MY_PROJECT>/<MY_REPOSITORY>/beam_python3.10_sdk:2.60.0-custom
+docker push us-central1-docker.pkg.dev/<MY_PROJECT>/<MY_REPOSITORY>/beam_python3.10_sdk:2.60.0-custom
+```
+
+If you push an image from an environment other than a VM in Google Cloud, you
+should configure [docker authentication with
+`gcloud`](https://cloud.google.com/artifact-registry/docs/docker/authentication#gcloud-helper)
+before `docker push`.
+
+#### Docker Hub
+
+You can push to your Docker Hub repository
+after [docker login](https://docs.docker.com/reference/cli/docker/login/).
+
+```shell
+docker tag apache/beam_python3.10_sdk:2.60.0 <my-account>/beam_python3.10_sdk:2.60.0-custom
+docker push <my-account>/beam_python3.10_sdk:2.60.0-custom
+```
+
diff --git a/website/www/site/layouts/partials/section-menu/en/sdks.html b/website/www/site/layouts/partials/section-menu/en/sdks.html
index ea48eb6f40d..243bbd92a46 100644
--- a/website/www/site/layouts/partials/section-menu/en/sdks.html
+++ b/website/www/site/layouts/partials/section-menu/en/sdks.html
@@ -44,6 +44,7 @@
     <li><a href="/documentation/sdks/python-pipeline-dependencies/">Managing pipeline dependencies</a></li>
     <li><a href="/documentation/sdks/python-multi-language-pipelines/">Python multi-language pipelines quickstart</a></li>
     <li><a href="/documentation/sdks/python-unrecoverable-errors/">Python Unrecoverable Errors</a></li>
+    <li><a href="/documentation/sdks/python-sdk-image-build/">Python SDK image build</a></li>
   </ul>
 </li>