If all Python dependencies are pre-installed on the YARN container hosts, you can use the process environment to have the runner spawn the SDK worker processes directly, as in the example linked below.
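Roughly, the client-side options look like this (a minimal sketch, not the linked file verbatim; the job_endpoint address and the boot executable path are assumptions you would adjust for your cluster):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Submit to a portable job server and ask the runner to start the SDK
# harness as a local process on each worker host instead of in Docker.
options = PipelineOptions([
    "--runner=PortableRunner",
    # Address of the job server reachable from the client (assumed).
    "--job_endpoint=localhost:8099",
    # PROCESS environment: the runner spawns the Python SDK harness as a
    # subprocess on the worker host.
    "--environment_type=PROCESS",
    # Path to the Beam SDK harness boot executable on each host (assumed).
    '--environment_config={"command": "/opt/apache/beam/boot"}',
])

with beam.Pipeline(options=options) as p:
    (p
     | "Create" >> beam.Create(["hello", "world"])
     | "Print" >> beam.Map(print))

This only works if the Python SDK and the boot executable are present on every container host, which is the precondition above.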
Full example:
https://github.com/lyft/flinkk8soperator/blob/bb8834d69e8621d636ef2085fdc167a9d2c2bfa3/examples/beam-python/src/beam_example/pipeline.py#L16-L17

Thomas

On Wed, Jun 3, 2020 at 5:48 PM Xinyu Liu <[email protected]> wrote:

> Hi, folks,
>
> I am running an experiment with a simple "hello world" Python pipeline
> on a remote Spark cluster on Hadoop. So far I have run the
> SparkJobServerDriver on the YARN application master and managed to submit
> a Python pipeline to it. SparkPipelineRunner was able to run the portable
> pipeline and spawn some containers to run it. On the containers
> themselves, I don't see sdk_worker.py getting executed, so for
> ExecutableStage the code throws gRPC I/O exceptions. I am wondering
> whether there is a way for the Spark runner to run the Python worker in
> the containers of the YARN cluster? I don't see any existing code for it,
> and it seems the ports allocated for the bundle factory are also
> arbitrary. Any thoughts?
>
> Thanks,
> Xinyu
