Updated the Flink section. To run a basic Python wordcount (sent to you in a separate thread, but repeating here too for others to play with):
Step 1: Run once to build a container: "./gradlew -p sdks/python/container docker" Step 2: ./gradlew :beam-runners-flink_2.11-job-server:runShadow - this starts up a local Flink portable JobService endpoint on localhost:8099 Step 3: run things using PortableRunner pointed at this endpoint - see e.g. https://github.com/apache/beam/pull/5824/files On Thu, Jun 28, 2018 at 1:37 AM Thomas Weise <t...@apache.org> wrote: > Ankur/Eugene, > > When you have a chance, please also update the Flink section of: > https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0 > > Thanks! > > On Thu, Jun 28, 2018 at 10:29 AM Thomas Weise <t...@apache.org> wrote: > >> The command to run the job server appears to be: ./gradlew -p >> runners/flink/job-server runShadow >> >> Can you please provide the equivalent of the super basic Python example >> from the prototype: >> >> >> https://github.com/bsidhom/beam/blob/hacking-job-server/sdks/python/flink-example.py >> >> Looks as if the Python side runner changed: >> >> Traceback (most recent call last): >> File "flink-example.py", line 7, in <module> >> from apache_beam.runners.portability import universal_local_runner >> ImportError: cannot import name universal_local_runner >> >> Thanks, >> Thomas >> >> >> On Wed, Jun 27, 2018 at 9:34 PM Eugene Kirpichov <kirpic...@google.com> >> wrote: >> >>> Hi! >>> >>> Those instructions are not current and I think should be discarded as >>> they referred to a particular effort that is over - +Ankur Goenka >>> <goe...@google.com> is, I believe, working on the remaining finishing >>> touches for running from a clean clone of Beam master and documenting how >>> to do that; could you help Thomas so we can start looking at what the >>> streaming runner is missing? >>> >>> We'll need to document this in a more prominent place. When we get to a >>> state where we can run Python WordCount from master, we'll need to document >>> it somewhere on the main portability page and/or the getting started guide; >>> when we can run something more serious, e.g. Tensorflow pipelines, that >>> will be worth a Beam blog post and worth documenting in the TFX >>> documentation. >>> >>> On Wed, Jun 27, 2018 at 5:35 AM Thomas Weise <t...@apache.org> wrote: >>> >>>> Hi Eugene, >>>> >>>> The basic streaming translation is already in place from the prototype, >>>> though I have not verified it on the master branch yet. >>>> >>>> Are the user instructions for the portable Flink runner at >>>> https://s.apache.org/beam-portability-team-doc current? >>>> >>>> (I don't have a dependency on SDF since we are going to use custom >>>> native Flink sources/sinks at this time.) >>>> >>>> Thanks, >>>> Thomas >>>> >>>> >>>> On Tue, Jun 26, 2018 at 2:13 AM Eugene Kirpichov <kirpic...@google.com> >>>> wrote: >>>> >>>>> Hi! >>>>> >>>>> Wanted to let you know that I've just merged the PR that adds >>>>> checkpointable SDF support to the portable reference runner (ULR) and the >>>>> Java SDK harness: >>>>> >>>>> https://github.com/apache/beam/pull/5566 >>>>> >>>>> So now we have a reference implementation of SDF support in a portable >>>>> runner, and a reference implementation of SDF support in a portable SDK >>>>> harness. >>>>> From here on, we need to replicate this support in other portable >>>>> runners and other harnesses. The obvious targets are Flink and Python >>>>> respectively. >>>>> >>>>> Chamikara was going to work on the Python harness. +Thomas Weise >>>>> <t...@apache.org> Would you be interested in the Flink portable >>>>> streaming runner side? It is of course blocked by having the rest of that >>>>> runner working in streaming mode though (the batch mode is practically >>>>> done >>>>> - will send you a separate note about the status of that). >>>>> >>>>> On Fri, Mar 23, 2018 at 12:20 PM Eugene Kirpichov < >>>>> kirpic...@google.com> wrote: >>>>> >>>>>> Luke is right - unbounded sources should go through SDF. I am >>>>>> currently working on adding such support to Fn API. >>>>>> The relevant document is s.apache.org/beam-breaking-fusion (note: it >>>>>> focuses on a much more general case, but also considers in detail the >>>>>> specific case of running unbounded sources on Fn API), and the first >>>>>> related PR is https://github.com/apache/beam/pull/4743 . >>>>>> >>>>>> Ways you can help speed up this effort: >>>>>> - Make necessary changes to Apex runner per se to support regular >>>>>> SDFs in streaming (without portability). They will likely largely carry >>>>>> over to portable world. I recall that the Apex runner had some level of >>>>>> support of SDFs, but didn't pass the ValidatesRunner tests yet. >>>>>> - (general to Beam, not Apex-related per se) Implement the >>>>>> translation of Read.from(UnboundedSource) via impulse, which will require >>>>>> implementing an SDF that reads from a given UnboundedSource (taking the >>>>>> UnboundedSource as an element). This should be fairly straightforward and >>>>>> will allow all portable runners to take advantage of existing >>>>>> UnboundedSource's. >>>>>> >>>>>> >>>>>> On Fri, Mar 23, 2018 at 3:08 PM Lukasz Cwik <lc...@google.com> wrote: >>>>>> >>>>>>> Using impulse is a precursor for both bounded and unbounded SDF. >>>>>>> >>>>>>> This JIRA represents the work that would be to add support for >>>>>>> unbounded SDF using portability APIs: >>>>>>> https://issues.apache.org/jira/browse/BEAM-2939 >>>>>>> >>>>>>> >>>>>>> On Fri, Mar 23, 2018 at 11:46 AM Thomas Weise <t...@apache.org> >>>>>>> wrote: >>>>>>> >>>>>>>> So for streaming, we will need the Impulse translation for bounded >>>>>>>> input, identical with batch, and then in addition to that support for >>>>>>>> SDF? >>>>>>>> >>>>>>>> Any pointers what's involved in adding the SDF support? Is it >>>>>>>> runner specific? Does the ULR cover it? >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Mar 23, 2018 at 11:26 AM, Lukasz Cwik <lc...@google.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> All "sources" in portability will use splittable DoFns for >>>>>>>>> execution. >>>>>>>>> >>>>>>>>> Specifically, runners will need to be able to checkpoint unbounded >>>>>>>>> sources to get a minimum viable pipeline working. >>>>>>>>> For bounded pipelines, a DoFn can read the contents of a bounded >>>>>>>>> source. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Mar 23, 2018 at 10:52 AM Thomas Weise <t...@apache.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I'm looking at the portable pipeline translation for streaming. I >>>>>>>>>> understand that for batch pipelines, it is sufficient to translate >>>>>>>>>> Impulse. >>>>>>>>>> >>>>>>>>>> What is the intended path to support unbounded sources? >>>>>>>>>> >>>>>>>>>> The goal here is to get a minimum translation working that will >>>>>>>>>> allow streaming wordcount execution. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Thomas >>>>>>>>>> >>>>>>>>>> >>>>>>>>