[ 
https://issues.apache.org/jira/browse/BEAM-8209?focusedWorklogId=317951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317951
 ]

ASF GitHub Bot logged work on BEAM-8209:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Sep/19 00:43
            Start Date: 25/Sep/19 00:43
    Worklog Time Spent: 10m 
      Work Description: Hannah-Jiang commented on pull request #9607: 
[BEAM-8209] Custom container docs
URL: https://github.com/apache/beam/pull/9607#discussion_r327889107
 
 

 ##########
 File path: website/src/documentation/runtime/environments.md
 ##########
 @@ -0,0 +1,187 @@
+---
+layout: section
+title: "Runtime environments"
+section_menu: section-menu/documentation.html
+permalink: /documentation/runtime/environments/
+redirect_from:
+  - /documentation/execution-model/
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Runtime environments
+
+Any execution engine can run the Beam SDK beacuse the SDK runtime environment 
is [containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/) and isolated from other runtime systems. This 
page describes how to build, customize, and push Beam SDK container images.
+
+## Building container images
+
+Before building Beam SDK container images:
+* Register a [Bintray](https://bintray.com/) account with a Docker repository 
named `apache`.
+* Install [Docker](https://www.docker.com/) on your workstation.
+
+To build Beam SDK container images:
+
+<ol>
+    <li>
+        Navigate to your local copy of the <a 
href="https://github.com/apache/beam";><code>beam</code></a>
+    </li>
+    <li>
+        Run Gradle with the <code>docker</code> target: <pre>./gradlew 
docker</pre>
+    </li>
+</ol>
+
+> **Note**: It may take a long time to build all of the container images. You 
can instead build the images for specific SDKs:
+>
+> ```
+> ./gradlew -p sdks/java/container docker
+> ./gradlew -p sdks/python/container docker
+> ./gradlew -p sdks/go/container docker
+> ```
+
+Run `docker images` to examine the containers. For example, if you 
successfully built the container images, the command prompt displays a response 
like:
+
+```
+REPOSITORY                                       TAG                    IMAGE 
ID            CREATED       SIZE
+$USER-docker-apache.bintray.io/beam/python     latest             4ea515403a1a 
     3 minutes ago     1.27GB
+$USER-docker-apache.bintray.io/beam/java       latest             0103512f1d8f 
    34 minutes ago      780MB
+$USER-docker-apache.bintray.io/beam/go         latest             ce055985808a 
    35 minutes ago      121MB
+```
+
+<b>Although the respository names look like URLs, the container images are 
stored locally on your workstation.</b> After building the container images 
locally, you can [push](#pushing-container-images) them to an eponymous 
repository online.
+
+### Overriding default Docker targets
+
+The default SDK version is `latest` and the default Docker repository is the 
following Bintray location:
+
+```
+$USER-docker-apache.bintray.io/beam
+```
+
+When you [build SDK container images](#building-container-images), you can 
override the default version and location.
+
+To specify an older Python SDK version, like 2.3.0, build the container with 
the `docker-tag` option:
+
+```
+./gradlew docker -Pdocker-tag=2.3.0
+```
+
+To change the `docker` target, build the container with the 
`docker-repository-root` option:
+
+```
+./gradlew docker -Pdocker-repository-root=$LOCATION
+```
+
+## Customizing container images
+
+You can add extra dependencies or serialization files to container images so 
the execution engine doesn't need them.
+
+To customize a container image, either:
+* [Write a new](#writing-new-dockerfiles) 
[Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the 
original
+* [Modify](#modifying-dockerfiles) the [original 
Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)
 and reimage the container
+
+It's often easier to write a new Dockerfile, but you can customize anything, 
including the base OS, by modifying the original.
+
+### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles}
+
+<ol>
+    <li>
+        Pull a <a 
href="https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release";>prebuilt
 SDK container image</a> for your target language and version.
+    </li>
+    <li>
+        <a 
href="https://docs.docker.com/develop/develop-images/dockerfile_best-practices/";>Write
 a new Dockerfile</a> that <a 
href="https://docs.docker.com/engine/reference/builder/#from";>designates</a> 
the original as its <a 
href="https://docs.docker.com/glossary/?term=parent%20image";>parent</a>
+    </li>
+    <li>
+        Build a child image: <pre>docker build -f /path/to/new/Dockerfile</pre>
+    </li>
+</ol>
+
+### Modifying the original Dockerfile {#modifying-dockerfiles}
+
+1. Pull the [prebuilt SDK container 
image](https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release)
 for your target language and version
+2. Customize the 
[Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile).
 If you're adding dependencies from [PyPI](https://pypi.org/), use 
`base_image_requirements.txt` instead of the Dockerfile.
+3. [Reimage](#building-container-images) the container
+
+### Testing customized images
+
+Test customized images with PortableRunner. PortableRunner targets the default 
Docker images, like 
`${USER}-docker-apache.bintray.io/beam/python${version}:latest`. You can 
specify a different container with the `--environment_config` flag.
+
+#### Testing on local machines
+
+```
+python -m apache_beam.examples.wordcount --input /path/to/inputfile --output 
/path/to/write/counts \
+--runner=PortableRunner \
+--job_endpoint=embed \
+[--environment_config={docker_image_name}]
+```
+
+#### Testing on local Flink clusters
+
+<ol>
+    <li> Start a Flink job server on localhost:8099:
+        <pre>./gradlew :runners:flink:1.5:job-server:runShadow</pre>
+    </li>
+    <li> Run a pipeline on the Flink job server:
+<pre>python -m apache_beam.examples.wordcount --input /path/to/inputfile 
--output /path/to/write/counts \
+--runner PortableRunner \
+--job_endpoint localhost:8099 \
+[--environment_config {docker_image_name}]</pre>
+    </li>
+</ol>
+
+#### Testing on local Spark clusters
+
+<ol>
+    <li> Start a Spark job server on localhost:8099:
+        <pre>./gradlew :runners:spark:job-server:runShadow</pre>
+    </li>
+    <li> Run a pipeline on the Spark job server:
+<pre>python -m apache_beam.examples.wordcount --input /path/to/inputfile 
--output /path/to/write/counts \
+--runner=PortableRunner \
+--job_endpoint=localhost:8099 \
+[--environment_config={docker_image_name}]</pre>
+    </li>
+</ol>
+
+#### Testing on Google Cloud Dataflow
+
+```
+python -m apache_beam.examples.wordcount --input --input /path/to/inputfile 
--output /path/to/write/counts \
+--runner=DataflowRunner \
+--project={gcp_project_id} \
+--temp_location={gcs_location} \
+--worker_harness_container_image={docker_image_name} \
+--experiment=beam_fn_api \
+--sdk_location \ 
[…]/beam/sdks/python/container/py{version}/build/target/apache-beam.tar.gz
+* version={2,35,36,37}
+```
+
+## Pushing container images
+
+After locally [building a container image](#building-container-images), push 
it to a remote Docker repository.
+
+Before pushing container images to your Docker registry, log in to your 
Bintray account with the `apache` Docker repository: ```docker login```
 
 Review comment:
   Pushing images to a remote repository is an option for users. Pushing to 
which repository is also user's choice, we can give Bintray as an example. 
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 317951)
    Time Spent: 3h 50m  (was: 3h 40m)

> Document custom docker containers
> ---------------------------------
>
>                 Key: BEAM-8209
>                 URL: https://issues.apache.org/jira/browse/BEAM-8209
>             Project: Beam
>          Issue Type: Sub-task
>          Components: website
>            Reporter: Cyrus Maden
>            Assignee: Cyrus Maden
>            Priority: Minor
>             Fix For: 2.16.0
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to