Thanks for bringing this to the list. Some comments below, though it
would be good to get additional feedback beyond those that have been
participating on the PR, if any. I'd like to see this issue resolved
before 2.17 as changing the public API once it's released will be
harder.

On Mon, Oct 28, 2019 at 6:36 AM Maximilian Michels <m...@apache.org> wrote:
>
> Hi,
>
> Robert and Kyle have been doing great work to simplify submitting
> portable pipelines with the Flink Runner. Part of this is having a
> Python "FlinkRunner" which handles bringing up a Beam job server and
> submitting the pipeline directly via the Flink REST API. One building
> block is the creation of "executable Jars" which contain the
> materialized / translated Flink pipeline and do not require the Beam job
> server or the Python driver anymore.
>
> While unifying a newly introduced option "flink_master_url" with the
> pre-existing "flink_master" [1][2], some questions came up about Flink's
> execution modes. (The two options are meant to do the same thing:
> provide the address of the Flink master to hand-over the translated
> pipeline.)
>
> Historically, Flink had a proprietary protocol for submitting pipelines,
> running on port 9091. This has since been replaced with a REST protocol
> at port 8081. To this date, this has implications how you submit
> programs, e.g. the Flink client libraries expects the address to be of
> form "host:port", without a protocol scheme. On the other hand, external
> Rest libraries typically expect a protocol scheme.
>
> But this is only half of the fun. There are also special addresses for
> "flink_master" that influence submission of the pipeline. If you specify
> "[local]" as the address, the pipeline won't be submitted but executed
> in a local in-process Flink cluster. If you specify "[auto]" and you use
> the CLI tool that comes bundled with Flink, then the master address will
> be loaded from the Flink config, including any configuration like SSL.
> If none is found, then it falls back to "[local]".
>
> This is a bit odd, and after a discussion with Robert and Thomas in [1],
> we figured that this needs to be changed:
>
> 1. Make the master address a URL. Add "http://"; to "flink_master" in
> Python if no scheme is specified. Similarly, remove any "http://"; in
> Java, since the Java rest client does not expect a scheme. In case of
> "http_s_://", we have a special treatment to load the SSL settings from
> the Flink config.

One concern with this is that just supplying host:port is the existing
behavior, so we can't start requiring the http://. The question
becomes whether we should allow it, or infer http[s] (e.g. by trying
out https first). One question I have is if there are other
authentication parameters that may be required for speaking to a flink
endpoint that we should be aware of (that would normally be buried in
the config file).

> 2. Deprecate the "[auto]" and "[local]" values. It should be sufficient
> to have either a non-empty address string or an empty one. The empty
> string would either mean local execution or, in the context of the Flink
> CLI tool, loading the master address from the config. The non-empty
> string would be interpreted as a cluster address.

I do like being explicit with something like [local] rather than
treating the empty string in a magical way. Perhaps we could have
[config] be an alias for [auto] indicating to read the config to get
the parameter (if any). The tricky bit it seems is that if running as
CLI (where, as I understand it, the jar is then executed on the
cluster) one does not want to run it truly locally but on that
cluster.

For non-CLI, the two options for default are to always run locally
(the least surprising to me, but it'd be good to get feedback from
others) or to read the config file and submit it to the value there,
if any.

> Any opinions on this?
>
>
> Thanks,
> Max
>
>
> [1] https://github.com/apache/beam/pull/9803
> [2] https://github.com/apache/beam/pull/9844

Reply via email to