I am doing some prototyping on this too. I used the spark-submit script
instead of the REST API. In my simple setup, I ran
SparkJobServerDriver.main() directly in the AM as a Spark job, which
submits the Python job to the default Spark master URL pointing to
"local". I also used --files in the spark-submit invocation to upload
the Python packages and the boot script (a rough sketch of that
invocation follows the options below). On the Python side, I used the
following pipeline options for submission (thanks to Thomas):

    pipeline_options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=your-job-server:8099",
        "--environment_type=PROCESS",
        "--environment_config={\"command\": \"./boot\"}",
    ])

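For reference, the spark-submit side looked roughly like the following
(the jar name and --files paths are placeholders for my own artifacts;
running the driver class with --master yarn --deploy-mode cluster is
what puts the job server in the AM):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class org.apache.beam.runners.spark.SparkJobServerDriver \
      --files python_packages.zip,boot \
      beam-runners-spark-job-server.jar
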
I used my own boot script for customized Python packaging. With this
setup I was able to get a simple hello-world program running (a minimal
sketch follows below). I haven't tried running the job server separately
from the AM yet, so hopefully setting --spark-master-url to yarn will
work too.
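
For completeness, the hello-world program itself was roughly the
following (the job endpoint and ./boot path are placeholders for my
setup):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    pipeline_options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=your-job-server:8099",
        "--environment_type=PROCESS",
        "--environment_config={\"command\": \"./boot\"}",
    ])

    # A trivial pipeline, just to verify the SDK harness starts via ./boot.
    with beam.Pipeline(options=pipeline_options) as p:
        (p
         | beam.Create(["hello", "world"])
         | beam.Map(print))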

Thanks,
Xinyu

On Tue, Jun 23, 2020 at 12:18 PM Kyle Weaver <kcwea...@google.com> wrote:

> Hi Kamil, there is a JIRA for this:
> https://issues.apache.org/jira/browse/BEAM-8970 It's theoretically
> possible but remains untested as far as I know :)
>
> As I indicated in a comment, you can set --output_executable_path to
> create a jar that you can then submit to yarn via spark-submit.
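>
> Roughly, and untested (the jar path is a placeholder):
>
>     from apache_beam.options.pipeline_options import PipelineOptions
>
>     options = PipelineOptions([
>         "--runner=PortableRunner",
>         "--job_endpoint=your-job-server:8099",
>         "--output_executable_path=/tmp/beam-pipeline.jar",
>     ])
>     # Running a pipeline with these options should write the jar instead
>     # of executing the job; then, something like:
>     #   spark-submit --master yarn --deploy-mode cluster /tmp/beam-pipeline.jar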
>
> If you can get this working, I'd additionally like to script the jar
> submission in Python to save users the extra step.
>
> Thanks,
> Kyle
>
> On Tue, Jun 23, 2020 at 9:16 AM Kamil Wasilewski <
> kamil.wasilew...@polidea.com> wrote:
>
>> Hi all,
>>
>> I'm trying to run a Beam pipeline using Spark on YARN. My pipeline is
>> written in Python, so I need to use a portable runner. Does anybody know
>> how I should configure job server parameters, especially
> >> --spark-master-url? Is there anything else I need to be aware of while
> >> using such a setup?
>>
>> If it makes a difference, I use Google Dataproc.
>>
>> Best,
>> Kamil
>>
>
