Thanks for the pointers, Thomas. Let me give it a shot tomorrow.

Thanks,
Xinyu
On Wed, Jun 3, 2020 at 7:13 PM Thomas Weise <[email protected]> wrote:

> If all Python dependencies are pre-installed on the YARN container hosts,
> then you can use the process environment to spawn processes, like so:
>
> https://github.com/lyft/flinkk8soperator/blob/bb8834d69e8621d636ef2085fdc167a9d2c2bfa3/examples/beam-python/src/beam_example/pipeline.py#L16-L17
>
> Thomas
>
>
> On Wed, Jun 3, 2020 at 5:48 PM Xinyu Liu <[email protected]> wrote:
>
>> Hi, folks,
>>
>> I am running some experiments with a simple "hello world" Python
>> pipeline on a remote Spark cluster on Hadoop. So far I have run the
>> SparkJobServerDriver on the YARN application master and managed to
>> submit a Python pipeline to it. SparkPipelineRunner was able to run the
>> portable pipeline and spawn some containers for it. On the containers
>> themselves, I don't see sdk_worker.py being executed, so the
>> ExecutableStage code throws gRPC IO exceptions. I am wondering whether
>> there is a way for the Spark runner to run the Python worker in the
>> containers of the YARN cluster? I don't see any existing code for it,
>> and it seems the ports allocated for the bundle factory are also
>> arbitrary. Any thoughts?
>>
>> Thanks,
>> Xinyu
>>
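
For reference, a minimal sketch of the process-environment setup Thomas linked might look like the snippet below. The job endpoint address and the path to the SDK boot executable are assumptions here and would need to match the actual cluster setup; the Beam Python SDK has to be pre-installed on the container hosts for this to work.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=PortableRunner",
        # Hypothetical address of the job server started on the YARN AM.
        "--job_endpoint=localhost:8099",
        # PROCESS environment: the runner spawns the SDK worker as a local
        # process instead of a Docker container, so Python dependencies must
        # already be installed on the host.
        "--environment_type=PROCESS",
        '--environment_config={"command": "/opt/apache/beam/boot"}',
    ])

    with beam.Pipeline(options=options) as p:
        (p
         | beam.Create(["hello", "world"])
         | beam.Map(print))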
