Thanks for the pointers, Thomas. Let me give it a shot tomorrow.

Thanks,
Xinyu
On Wed, Jun 3, 2020 at 7:13 PM Thomas Weise <[email protected]> wrote:

> If all Python dependencies are pre-installed on the YARN container hosts,
> then you can use the process environment to spawn processes, like so:
>
> https://github.com/lyft/flinkk8soperator/blob/bb8834d69e8621d636ef2085fdc167a9d2c2bfa3/examples/beam-python/src/beam_example/pipeline.py#L16-L17
>
> Thomas
>
>
> On Wed, Jun 3, 2020 at 5:48 PM Xinyu Liu <[email protected]> wrote:
>
>> Hi, folks,
>>
>> I am running some experiments with a simple "hello world" Python
>> pipeline on a remote Spark cluster on Hadoop. So far I have run the
>> SparkJobServerDriver on the YARN application master and managed to
>> submit a Python pipeline to it. SparkPipelineRunner was able to run the
>> portable pipeline and spawn some containers for it. On the containers
>> themselves, I don't see sdk_worker.py being executed, so the
>> ExecutableStage code throws gRPC IO exceptions. I am wondering whether
>> there is a way for the Spark runner to run the Python worker in the
>> containers of the YARN cluster? I don't see any existing code for it,
>> and it seems the ports allocated for the bundle factory are also
>> arbitrary. Any thoughts?
>>
>> Thanks,
>> Xinyu
>>
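
For reference, a minimal sketch of the process-environment setup Thomas linked might look like the snippet below. The job endpoint address and the path to the SDK boot executable are assumptions here and would need to match the actual cluster setup; the Beam Python SDK has to be pre-installed on the container hosts for this to work.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=PortableRunner",
        # Hypothetical address of the job server started on the YARN AM.
        "--job_endpoint=localhost:8099",
        # PROCESS environment: the runner spawns the SDK worker as a local
        # process instead of a Docker container, so Python dependencies must
        # already be installed on the host.
        "--environment_type=PROCESS",
        '--environment_config={"command": "/opt/apache/beam/boot"}',
    ])

    with beam.Pipeline(options=options) as p:
        (p
         | beam.Create(["hello", "world"])
         | beam.Map(print))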
