Filed https://issues.apache.org/jira/browse/BEAM-8507 for the issue I mentioned.
On Mon, Oct 28, 2019 at 4:12 PM Kyle Weaver <kcwea...@google.com> wrote: > > I'd like to see this issue resolved before 2.17 as changing the public > API once it's released will be harder. > > +1. In particular, I misunderstood that [auto] is not supported by > `FlinkUberJarJobServer`. Since [auto] is now the default, it's broken for > Python 3.6+. > > requests.exceptions.InvalidURL: Failed to parse: [auto]/v1/config > > We definitely should fix that, if nothing else. > > > One concern with this is that just supplying host:port is the existing > > behavior, so we can't start requiring the http://. > > The user shouldn't have to specify a protocol for Python, I think it's > preferable and reasonable to handle that for them in order to maintain > existing behavior and align with Java SDK. > > > 2. Deprecate the "[auto]" and "[local]" values. It should be sufficient > > to have either a non-empty address string or an empty one. The empty > > string would either mean local execution or, in the context of the Flink > > CLI tool, loading the master address from the config. The non-empty > > string would be interpreted as a cluster address. > > Looks like we also have a [collection] configuration value [1]. > > If we're using [auto] as the default, I don't think this really makes so > much of a difference (as long as we're supporting and documenting these > properly, of course). I'm not sure there's a compelling reason to change > this? > > > always run locally (the least surprising to me > > I agree a local cluster should remain the default, whether that is > achieved through [local] or [auto] or some new mechanism such as the above. > > [1] > https://github.com/apache/beam/blob/2f1b56ccc506054e40afe4793a8b556e872e1865/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java#L80 > > On Mon, Oct 28, 2019 at 6:35 AM Maximilian Michels <m...@apache.org> wrote: > >> Hi, >> >> Robert and Kyle have been doing great work to simplify submitting >> portable pipelines with the Flink Runner. Part of this is having a >> Python "FlinkRunner" which handles bringing up a Beam job server and >> submitting the pipeline directly via the Flink REST API. One building >> block is the creation of "executable Jars" which contain the >> materialized / translated Flink pipeline and do not require the Beam job >> server or the Python driver anymore. >> >> While unifying a newly introduced option "flink_master_url" with the >> pre-existing "flink_master" [1][2], some questions came up about Flink's >> execution modes. (The two options are meant to do the same thing: >> provide the address of the Flink master to hand-over the translated >> pipeline.) >> >> Historically, Flink had a proprietary protocol for submitting pipelines, >> running on port 9091. This has since been replaced with a REST protocol >> at port 8081. To this date, this has implications how you submit >> programs, e.g. the Flink client libraries expects the address to be of >> form "host:port", without a protocol scheme. On the other hand, external >> Rest libraries typically expect a protocol scheme. >> >> But this is only half of the fun. There are also special addresses for >> "flink_master" that influence submission of the pipeline. If you specify >> "[local]" as the address, the pipeline won't be submitted but executed >> in a local in-process Flink cluster. If you specify "[auto]" and you use >> the CLI tool that comes bundled with Flink, then the master address will >> be loaded from the Flink config, including any configuration like SSL. >> If none is found, then it falls back to "[local]". >> >> This is a bit odd, and after a discussion with Robert and Thomas in [1], >> we figured that this needs to be changed: >> >> 1. Make the master address a URL. Add "http://" to "flink_master" in >> Python if no scheme is specified. Similarly, remove any "http://" in >> Java, since the Java rest client does not expect a scheme. In case of >> "http_s_://", we have a special treatment to load the SSL settings from >> the Flink config. >> >> 2. Deprecate the "[auto]" and "[local]" values. It should be sufficient >> to have either a non-empty address string or an empty one. The empty >> string would either mean local execution or, in the context of the Flink >> CLI tool, loading the master address from the config. The non-empty >> string would be interpreted as a cluster address. >> >> >> Any opinions on this? >> >> >> Thanks, >> Max >> >> >> [1] https://github.com/apache/beam/pull/9803 >> [2] https://github.com/apache/beam/pull/9844 >> >