> Could the error message suggest switching to FlinkRunner (and/or other
runners that start a job server for you)? Then it seems like the breakage
would only be a minor annoyance.

Definitely.

On Wed, Apr 29, 2020 at 2:49 PM Brian Hulette <[email protected]> wrote:

> Could the error message suggest switching to FlinkRunner (and/or other
> runners that start a job server for you)? Then it seems like the breakage
> would only be a minor annoyance.
>
> Brian
>
> On Wed, Apr 29, 2020 at 11:32 AM Kyle Weaver <[email protected]> wrote:
>
>> Hi all,
>>
>> Currently, when running a pipeline that has the options
>> runner=PortableRunner and job_endpoint unset, the Python SDK spins up a
>> Dockerized Flink job server [1]. This is problematic because the
>> PortableRunner can be used by any portable runner. So for example, a Spark
>> runner user was recently baffled when their job ran successfully but
>> printed a bunch of Flink log messages.
>>
>> There are not too many uses of this default behavior to my knowledge, at
>> least within Beam itself. The only example I could find was in the
>> portableWordCount tests, which is mostly the same as
>> portableWordCountFlinkRunner tests [2]. The default behavior is entirely
>> superseded by the FlinkRunner class, which provides better encapsulation.
>>
>> I also noticed that DockerizedJobServer is only used by [3]. In
>> FlinkRunner, we pull the job server from Maven if necessary and call Java
>> directly. In general, I think there are already quite enough knobs in the
>> portability framework, so we should remove it unless there is reason to
>> prefer running the job server with Docker instead of calling Java directly.
>>
>> There are a couple options:
>>
>> A) Remove the default behavior and require job_endpoint to always be set
>> when using PortableRunner. This would be a breaking change.
>> B) Keep the current behavior, but warn when the user sets
>> runner=PortableRunner without job_endpoint. This is easy to miss, but it's
>> better than nothing.
>>
>> What do you think?
>>
>> [1]
>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
>> [2]
>> https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
>> [3]
>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163
>>
>

Reply via email to