If all Python dependencies are pre-installed on the YARN container hosts, you can use the process environment to have the runner spawn the SDK worker processes directly, as in the example linked below.
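Roughly, the client-side options look like this (a minimal sketch, not the linked file verbatim; the job_endpoint address and the boot executable path are assumptions you would adjust for your cluster):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Submit to a portable job server and ask the runner to start the SDK
# harness as a local process on each worker host instead of in Docker.
options = PipelineOptions([
    "--runner=PortableRunner",
    # Address of the job server reachable from the client (assumed).
    "--job_endpoint=localhost:8099",
    # PROCESS environment: the runner spawns the Python SDK harness as a
    # subprocess on the worker host.
    "--environment_type=PROCESS",
    # Path to the Beam SDK harness boot executable on each host (assumed).
    '--environment_config={"command": "/opt/apache/beam/boot"}',
])

with beam.Pipeline(options=options) as p:
    (p
     | "Create" >> beam.Create(["hello", "world"])
     | "Print" >> beam.Map(print))

This only works if the Python SDK and the boot executable are present on every container host, which is the precondition above.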
Full example:
https://github.com/lyft/flinkk8soperator/blob/bb8834d69e8621d636ef2085fdc167a9d2c2bfa3/examples/beam-python/src/beam_example/pipeline.py#L16-L17

Thomas

On Wed, Jun 3, 2020 at 5:48 PM Xinyu Liu <[email protected]> wrote:

> Hi, folks,
>
> I am running an experiment with a simple "hello world" Python pipeline
> on a remote Spark cluster on Hadoop. So far I have run the
> SparkJobServerDriver on the YARN application master and managed to submit
> a Python pipeline to it. SparkPipelineRunner was able to run the portable
> pipeline and spawn some containers to run it. On the containers
> themselves, I don't see sdk_worker.py getting executed, so for
> ExecutableStage the code throws gRPC I/O exceptions. I am wondering
> whether there is a way for the Spark runner to run the Python worker in
> the containers of the YARN cluster? I don't see any existing code for it,
> and it seems the ports allocated for the bundle factory are also
> arbitrary. Any thoughts?
>
> Thanks,
> Xinyu
