Thanks for the information. So it looks like we can't easily run portable
pipelines on a Dataproc cluster at the moment.
> you can set --output_executable_path to create a jar that you can then
> submit to yarn via spark-submit.

I tried to create a jar, but I ran into a problem. I left an error message
in a comment on https://issues.apache.org/jira/browse/BEAM-8970.

On Wed, Jun 24, 2020 at 1:25 AM Kyle Weaver <kcwea...@google.com> wrote:

> > So hopefully setting --spark-master-url to be yarn will work too.
>
> This is not supported.
>
> On Tue, Jun 23, 2020 at 2:58 PM Xinyu Liu <xinyuliu...@gmail.com> wrote:
>
>> I am doing some prototyping on this too. I used the spark-submit script
>> instead of the REST API. In my simple setup, I ran
>> SparkJobServerDriver.main() directly in the AM as a Spark job, which
>> submits the Python job to the default Spark master URL pointing to
>> "local". I also used --files in the spark-submit script to upload the
>> Python packages and the boot script. On the Python side, I used the
>> following pipeline options for submission (thanks to Thomas):
>>
>> pipeline_options = PipelineOptions([
>>     "--runner=PortableRunner",
>>     "--job_endpoint=your-job-server:8099",
>>     "--environment_type=PROCESS",
>>     "--environment_config={\"command\": \"./boot\"}"])
>>
>> I used my own boot script for customized Python packaging. With this
>> setup I was able to get a simple hello-world program running. I haven't
>> tried to run the job server separately from the AM yet, so hopefully
>> setting --spark-master-url to yarn will work too.
>>
>> Thanks,
>> Xinyu
>>
>> On Tue, Jun 23, 2020 at 12:18 PM Kyle Weaver <kcwea...@google.com> wrote:
>>
>>> Hi Kamil, there is a JIRA for this:
>>> https://issues.apache.org/jira/browse/BEAM-8970 It's theoretically
>>> possible but remains untested as far as I know :)
>>>
>>> As I indicated in a comment, you can set --output_executable_path to
>>> create a jar that you can then submit to yarn via spark-submit.
>>> If you can get this working, I'd additionally like to script the jar
>>> submission in Python to save users the extra step.
>>>
>>> Thanks,
>>> Kyle
>>>
>>> On Tue, Jun 23, 2020 at 9:16 AM Kamil Wasilewski <
>>> kamil.wasilew...@polidea.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm trying to run a Beam pipeline using Spark on YARN. My pipeline is
>>>> written in Python, so I need to use a portable runner. Does anybody
>>>> know how I should configure the job server parameters, especially
>>>> --spark-master-url? Is there anything else I need to be aware of with
>>>> such a setup?
>>>>
>>>> If it makes a difference, I use Google Dataproc.
>>>>
>>>> Best,
>>>> Kamil
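As an aside, the pipeline options in Xinyu's message are easier to get right
if the environment_config JSON is built with json.dumps instead of
hand-escaped quotes. A minimal sketch (the job server address is just the
placeholder from the thread, not a real endpoint):

```python
import json

# Placeholder job server endpoint; replace with your own host:port.
JOB_ENDPOINT = "your-job-server:8099"

# Building the JSON with json.dumps avoids quote-escaping mistakes in
# the environment_config value.
environment_config = json.dumps({"command": "./boot"})

pipeline_args = [
    "--runner=PortableRunner",
    "--job_endpoint=" + JOB_ENDPOINT,
    "--environment_type=PROCESS",
    "--environment_config=" + environment_config,
]

print(pipeline_args)
```

These args would then be passed to PipelineOptions(pipeline_args) exactly as
in the snippet Xinyu posted.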
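And a rough sketch of the jar route Kyle describes: adding
--output_executable_path to the same options should make the runner write a
self-contained jar instead of submitting the job, which could then be handed
to YARN via spark-submit. The output path and the spark-submit invocation in
the comment below are illustrative assumptions, not something verified on
Dataproc:

```python
# Illustrative output path; any writable location should work.
jar_path = "/tmp/beam-pipeline.jar"

pipeline_args = [
    "--runner=PortableRunner",
    "--environment_type=PROCESS",
    '--environment_config={"command": "./boot"}',
    # Instead of running the job, write an executable jar to this path.
    "--output_executable_path=" + jar_path,
]

# The resulting jar would then be submitted roughly as:
#   spark-submit --master yarn /tmp/beam-pipeline.jar
print(pipeline_args)
```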