This is an automated email from the ASF dual-hosted git repository.
tvalentyn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
new 43d27ed52af Add a guide to build custom Beam Python SDK image (#33048)
43d27ed52af is described below
commit 43d27ed52aff45e47ee48e1e1aef5ba2abfb0a94
Author: Minbo Bae <[email protected]>
AuthorDate: Mon Nov 11 16:46:22 2024 -0800
Add a guide to build custom Beam Python SDK image (#33048)
* Add a guide to build custom Beam Python SDK image
* Address review comments
---
.../en/documentation/runtime/environments.md | 4 +-
.../documentation/sdks/python-sdk-image-build.md | 306 +++++++++++++++++++++
.../layouts/partials/section-menu/en/sdks.html | 1 +
3 files changed, 310 insertions(+), 1 deletion(-)
diff --git a/website/www/site/content/en/documentation/runtime/environments.md
b/website/www/site/content/en/documentation/runtime/environments.md
index a048c21046b..48039d50a10 100644
--- a/website/www/site/content/en/documentation/runtime/environments.md
+++ b/website/www/site/content/en/documentation/runtime/environments.md
@@ -105,7 +105,9 @@ This method requires building image artifacts from Beam
source. For additional i
2. Customize the `Dockerfile` for a given language, typically
`sdks/<language>/container/Dockerfile` directory (e.g. the [Dockerfile for
Python](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile).
-3. Return to the root Beam directory and run the Gradle `docker` target for
your image.
+3. Return to the root Beam directory and run the Gradle `docker` target for
your
+ image. For self-contained instructions on building a container image,
+ follow [this guide](/documentation/sdks/python-sdk-image-build).
```
cd $BEAM_WORKDIR
diff --git
a/website/www/site/content/en/documentation/sdks/python-sdk-image-build.md
b/website/www/site/content/en/documentation/sdks/python-sdk-image-build.md
new file mode 100644
index 00000000000..f456a686afe
--- /dev/null
+++ b/website/www/site/content/en/documentation/sdks/python-sdk-image-build.md
@@ -0,0 +1,306 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Building Beam Python SDK Image Guide
+
+There are two options to build Beam Python SDK image. If you only need to
modify
+[the Python SDK boot entrypoint
binary](https://github.com/apache/beam/blob/master/sdks/python/container/boot.go),
+read [Update Boot Entrypoint Application
Only](#update-boot-entrypoint-application-only).
+If you need to build a Beam Python SDK image fully,
+read [Build Beam Python SDK Image Fully](#build-beam-python-sdk-image-fully).
+
+
+## Update Boot Entrypoint Application Only.
+
+If you only need to make a change to [the Python SDK boot entrypoint
binary](https://github.com/apache/beam/blob/master/sdks/python/container/boot.go).
You
+can rebuild the boot application only and include the updated boot application
+in the preexisting image.
+Read [the Python container
Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)
+for reference.
+
+```shell
+# From beam repo root, make changes to boot.go.
+your_editor sdks/python/container/boot.go
+
+# Rebuild the entrypoint
+./gradlew :sdks:python:container:gobuild
+
+cd sdks/python/container/build/target/launcher/linux_amd64
+
+# Create a simple Dockerfile to use custom boot entrypoint.
+cat >Dockerfile <<EOF
+FROM apache/beam_python3.10_sdk:2.60.0
+COPY boot /opt/apache/beam/boot
+EOF
+
+# Build the image
+docker build . --tag
us-central1-docker.pkg.dev/<MY_PROJECT>/<MY_REPOSITORY>/beam_python3.10_sdk:2.60.0-custom-boot
+docker push
us-central1-docker.pkg.dev/<MY_PROJECT>/<MY_REPOSITORY>/beam_python3.10_sdk:2.60.0-custom-boot
+```
+
+You can build a docker image if your local environment has Java, Python, Golang
+and Docker installation. Try
+`./gradlew :sdks:python:container:py<PYTHON_VERSION>:docker`. For example,
+`:sdks:python:container:py310:docker` builds `apache/beam_python3.10_sdk`
+locally if successful. You can follow this guide building a custom image from
+a VM if the build fails in your local environment.
+
+## Build Beam Python SDK Image Fully
+
+This section introduces a way to build everything from the scratch.
+
+### Prepare VM
+
+Prepare a VM with Debian 11. This guide was tested on Debian 11.
+
+#### Google Compute Engine
+
+An option to create a Debian 11 VM is using a GCE instance.
+
+```shell
+gcloud compute instances create beam-builder \
+ --zone=us-central1-a \
+ --image-project=debian-cloud \
+ --image-family=debian-11 \
+ --machine-type=n1-standard-8 \
+ --boot-disk-size=20GB \
+ --scopes=cloud-platform
+```
+
+Login to the VM. All the following steps are executed inside the VM.
+
+```shell
+gcloud compute ssh beam-builder --zone=us-central1-a --tunnel-through-iap
+```
+
+Update the apt package list.
+
+```shell
+sudo apt-get update
+```
+
+> [!NOTE]
+> * A high CPU machine is recommended to reduce the compile time.
+> * The image build needs a large disk. The build will fail with "no space left
+ on device" with the default disk size 10GB.
+> * The `cloud-platform` is recommended to avoid permission issues with Google
+ Cloud Artifact Registry. You can use the default scopes if you don't push
+ the image to Google Cloud Artifact Registry.
+> * Use a zone in the region of your docker repository of Artifact Registry if
+ you push the image to Artifact Registry.
+
+### Prerequisite Packages
+
+#### Java
+
+You need Java to run Gradle tasks.
+
+```shell
+sudo apt-get install -y openjdk-11-jdk
+```
+
+#### Golang
+
+Download and install. Reference: https://go.dev/doc/install.
+
+```shell
+# Download and install
+curl -OL https://go.dev/dl/go1.23.2.linux-amd64.tar.gz
+sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf
go1.23.2.linux-amd64.tar.gz
+
+# Add go to PATH.
+export PATH=:/usr/local/go/bin:$PATH
+```
+
+Confirm the Golang version
+
+```shell
+go version
+```
+
+Expected output:
+
+```text
+go version go1.23.2 linux/amd64
+```
+
+> [!NOTE]
+> Old Go version (e.g. 1.16) will fail at `:sdks:python:container:goBuild`.
+
+#### Python
+
+This guide uses Pyenv to manage multiple Python versions.
+Reference: https://realpython.com/intro-to-pyenv/#build-dependencies
+
+```shell
+# Install dependencies
+sudo apt-get install -y make build-essential libssl-dev zlib1g-dev \
+libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev \
+libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev
+
+# Install Pyenv
+curl https://pyenv.run | bash
+
+# Add pyenv to PATH.
+export PATH="$HOME/.pyenv/bin:$PATH"
+eval "$(pyenv init -)"
+eval "$(pyenv virtualenv-init -)"
+```
+
+Install Python 3.9 and set the Python version. This will take several minutes.
+
+```shell
+pyenv install 3.9
+pyenv global 3.9
+```
+
+Confirm the python version.
+
+```shell
+python --version
+```
+
+Expected output example:
+
+```text
+Python 3.9.17
+```
+
+> [!NOTE]
+> You can use a different Python version for building with [
+`-PpythonVersion`
option](https://github.com/apache/beam/blob/v2.60.0/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L2956-L2961)
+> to Gradle task run. Otherwise, you should have `python3.9` in the build
+> environment for Apache Beam 2.60.0 or later (python3.8 for older Apache Beam
+> versions). If you use the wrong version, the Gradle task
+`:sdks:python:setupVirtualenv` fails.
+
+#### Docker
+
+Install Docker
+following [the
reference](https://docs.docker.com/engine/install/debian/#install-using-the-repository).
+
+```shell
+# Add GPG keys.
+sudo apt-get update
+sudo apt-get install ca-certificates curl
+sudo install -m 0755 -d /etc/apt/keyrings
+sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o
/etc/apt/keyrings/docker.asc
+sudo chmod a+r /etc/apt/keyrings/docker.asc
+
+# Add the Apt repository.
+echo \
+ "deb [arch=$(dpkg --print-architecture)
signed-by=/etc/apt/keyrings/docker.asc]
https://download.docker.com/linux/debian \
+ $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
+ sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+sudo apt-get update
+
+# Install docker packages.
+sudo apt-get install -y docker-ce docker-ce-cli containerd.io
docker-buildx-plugin docker-compose-plugin
+```
+
+You need to run `docker` command without the root privilege in Beam Python SDK
+image build. You can do this
+by [adding your account to the docker
group](https://docs.docker.com/engine/install/linux-postinstall/).
+
+```shell
+sudo usermod -aG docker $USER
+newgrp docker
+```
+
+Confirm if you can run a container without the root privilege.
+
+```shell
+docker run hello-world
+```
+
+#### Git
+
+Git is not necessary for building Python SDK image. Git is just used to
download
+the Apache Beam code in this guide.
+
+```shell
+sudo apt-get install -y git
+```
+
+### Build Beam Python SDK Image
+
+Download Apache Beam
+from [the Github repository](https://github.com/apache/beam).
+
+```shell
+git clone https://github.com/apache/beam beam
+cd beam
+```
+
+Make changes to the Apache Beam code.
+
+Run the Gradle task to start Docker image build. This will take several
minutes.
+You can run `:sdks:python:container:py<PYTHON_VERSION>:docker` to build an
image
+for different Python version.
+See [the supported Python version
list](https://github.com/apache/beam/tree/master/sdks/python/container).
+For example, `py310` is for Python 3.10.
+
+```shell
+./gradlew :sdks:python:container:py310:docker
+```
+
+If the build is successful, you can see the built image locally.
+
+```shell
+docker images
+```
+
+Expected output:
+
+```text
+REPOSITORY TAG IMAGE ID CREATED SIZE
+apache/beam_python3.10_sdk 2.60.0 33db45f57f25 About a minute ago
2.79GB
+```
+
+> [!NOTE]
+> If you run the build in your local environment and Gradle task
+`:sdks:python:setupVirtualenv` fails by an incompatible python version, please
+> try with `-PpythonVersion` with the Python version installed in your local
+> environment (e.g. `-PpythonVersion=3.10`)
+
+### Push to Repository
+
+You may push the custom image to a image repository. The image can be used
+for [Dataflow custom
container](https://cloud.google.com/dataflow/docs/guides/run-custom-container#usage).
+
+#### Google Cloud Artifact Registry
+
+You can push the image to Artifact Registry. No additional authentication is
+necessary if you use Google Compute Engine.
+
+```shell
+docker tag apache/beam_python3.10_sdk:2.60.0
us-central1-docker.pkg.dev/<MY_PROJECT>/<MY_REPOSITORY>/beam_python3.10_sdk:2.60.0-custom
+docker push
us-central1-docker.pkg.dev/<MY_PROJECT>/<MY_REPOSITORY>/beam_python3.10_sdk:2.60.0-custom
+```
+
+If you push an image in an environment other than a VM in Google Cloud, you
+should configure [docker authentication with
+`gcloud`](https://cloud.google.com/artifact-registry/docs/docker/authentication#gcloud-helper)
+before `docker push`.
+
+#### Docker Hub
+
+You can push your Docker hub repository
+after [docker login](https://docs.docker.com/reference/cli/docker/login/).
+
+```shell
+docker tag apache/beam_python3.10_sdk:2.60.0
<my-account>/beam_python3.10_sdk:2.60.0-custom
+docker push <my-account>/beam_python3.10_sdk:2.60.0-custom
+```
+
diff --git a/website/www/site/layouts/partials/section-menu/en/sdks.html
b/website/www/site/layouts/partials/section-menu/en/sdks.html
index ea48eb6f40d..243bbd92a46 100644
--- a/website/www/site/layouts/partials/section-menu/en/sdks.html
+++ b/website/www/site/layouts/partials/section-menu/en/sdks.html
@@ -44,6 +44,7 @@
<li><a href="/documentation/sdks/python-pipeline-dependencies/">Managing
pipeline dependencies</a></li>
<li><a href="/documentation/sdks/python-multi-language-pipelines/">Python
multi-language pipelines quickstart</a></li>
<li><a href="/documentation/sdks/python-unrecoverable-errors/">Python
Unrecoverable Errors</a></li>
+ <li><a href="/documentation/sdks/python-sdk-image-build/">Python SDK image
build</a></li>
</ul>
</li>