Thanks. I still need to pass parameters to the boot executable, such as the worker id, the control endpoint, the logging endpoint, etc.
 
Where can I extract these parameters from? (In the apache_beam Python code, they can be extracted from the StartWorker request parameters.)
 
Also, how can the Spark executor find the port that the gRPC server is running on?
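(For reference, my rough understanding, which may well be wrong: in PROCESS mode the runner launches the configured command itself and passes these values as flags. A sketch of what that invocation might look like, with flag names taken from the Python container's boot program and with placeholder path and endpoint values:

    import subprocess

    # Rough sketch only: the runner, not the user, normally launches the boot
    # executable once per worker. Flag names follow the Python container's
    # boot program and may differ by Beam version; the path and endpoint
    # addresses below are placeholders.
    subprocess.Popen([
        "/path/to/linux_amd64/boot",
        "--id=1-1",                              # worker id assigned by the runner
        "--provision_endpoint=localhost:33333",  # boot pulls the rest of its config from here
        "--logging_endpoint=localhost:33334",
        "--artifact_endpoint=localhost:33335",
        "--control_endpoint=localhost:33336",
    ])
)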
 
Sent: Wednesday, November 06, 2019 at 5:07 PM
From: "Kyle Weaver" <kcwea...@google.com>
To: dev <dev@beam.apache.org>
Subject: Re: Command for Beam worker on Spark cluster
In Docker mode, almost everything is taken care of for you, but in process mode you have to do a lot of the setup yourself. The command you're looking for is `sdks/python/container/build/target/launcher/linux_amd64/boot`. You will need both that executable (which you can build from source using `./gradlew :sdks:python:container:build`) and a Python installation, including Beam and its other dependencies, on all of your worker machines.
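For concreteness, here is a hedged sketch of pipeline options that point the portable runner at that boot executable; the job_endpoint value and the boot path are placeholders you would replace with your own setup:

    from apache_beam.options.pipeline_options import PipelineOptions

    # Sketch only: PROCESS-mode options for a portable runner. The job_endpoint
    # and the boot path are placeholders; adjust them for your cluster.
    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=localhost:8099",   # Spark job server address (placeholder)
        "--environment_type=PROCESS",
        # environment_config is a JSON object; its "command" is run on each worker node
        '--environment_config={"command": '
        '"/path/to/sdks/python/container/build/target/launcher/linux_amd64/boot"}',
    ])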
 
On Wed, Nov 6, 2019 at 2:24 PM Matthew K. <softm...@gmx.com> wrote:
Hi all,
 
I am trying to run a *Python* Beam pipeline on a Spark cluster. Since the workers run on separate nodes, I am using "PROCESS" as the "environment_type" in the pipeline options, but I couldn't find any documentation on what "command" I should pass in "environment_config" to run on the worker so that the executor can communicate with it.
 
Can someone help me with that?
