Re: Unbounded source translation for portable pipelines

Thomas Weise Thu, 28 Jun 2018 01:30:01 -0700

The command to run the job server appears to be: ./gradlew -p
runners/flink/job-server runShadow


Can you please provide the equivalent of the super basic Python example
from the prototype:

https://github.com/bsidhom/beam/blob/hacking-job-server/sdks/python/flink-example.py

Looks as if the Python side runner changed:

Traceback (most recent call last):
  File "flink-example.py", line 7, in <module>
    from apache_beam.runners.portability import universal_local_runner
ImportError: cannot import name universal_local_runner

Thanks,
Thomas


On Wed, Jun 27, 2018 at 9:34 PM Eugene Kirpichov <[email protected]>
wrote:

> Hi!
>
> Those instructions are not current and I think should be discarded as they
> referred to a particular effort that is over - +Ankur Goenka
> <[email protected]> is, I believe, working on the remaining finishing
> touches for running from a clean clone of Beam master and documenting how
> to do that; could you help Thomas so we can start looking at what the
> streaming runner is missing?
>
> We'll need to document this in a more prominent place. When we get to a
> state where we can run Python WordCount from master, we'll need to document
> it somewhere on the main portability page and/or the getting started guide;
> when we can run something more serious, e.g. Tensorflow pipelines, that
> will be worth a Beam blog post and worth documenting in the TFX
> documentation.
>
> On Wed, Jun 27, 2018 at 5:35 AM Thomas Weise <[email protected]> wrote:
>
>> Hi Eugene,
>>
>> The basic streaming translation is already in place from the prototype,
>> though I have not verified it on the master branch yet.
>>
>> Are the user instructions for the portable Flink runner at
>> https://s.apache.org/beam-portability-team-doc current?
>>
>> (I don't have a dependency on SDF since we are going to use custom native
>> Flink sources/sinks at this time.)
>>
>> Thanks,
>> Thomas
>>
>>
>> On Tue, Jun 26, 2018 at 2:13 AM Eugene Kirpichov <[email protected]>
>> wrote:
>>
>>> Hi!
>>>
>>> Wanted to let you know that I've just merged the PR that adds
>>> checkpointable SDF support to the portable reference runner (ULR) and the
>>> Java SDK harness:
>>>
>>> https://github.com/apache/beam/pull/5566
>>>
>>> So now we have a reference implementation of SDF support in a portable
>>> runner, and a reference implementation of SDF support in a portable SDK
>>> harness.
>>> From here on, we need to replicate this support in other portable
>>> runners and other harnesses. The obvious targets are Flink and Python
>>> respectively.
>>>
>>> Chamikara was going to work on the Python harness. +Thomas Weise
>>> <[email protected]> Would you be interested in the Flink portable
>>> streaming runner side? It is of course blocked by having the rest of that
>>> runner working in streaming mode though (the batch mode is practically done
>>> - will send you a separate note about the status of that).
>>>
>>> On Fri, Mar 23, 2018 at 12:20 PM Eugene Kirpichov <[email protected]>
>>> wrote:
>>>
>>>> Luke is right - unbounded sources should go through SDF. I am currently
>>>> working on adding such support to Fn API.
>>>> The relevant document is s.apache.org/beam-breaking-fusion (note: it
>>>> focuses on a much more general case, but also considers in detail the
>>>> specific case of running unbounded sources on Fn API), and the first
>>>> related PR is https://github.com/apache/beam/pull/4743 .
>>>>
>>>> Ways you can help speed up this effort:
>>>> - Make necessary changes to Apex runner per se to support regular SDFs
>>>> in streaming (without portability). They will likely largely carry over to
>>>> portable world. I recall that the Apex runner had some level of support of
>>>> SDFs, but didn't pass the ValidatesRunner tests yet.
>>>> - (general to Beam, not Apex-related per se) Implement the translation
>>>> of Read.from(UnboundedSource) via impulse, which will require implementing
>>>> an SDF that reads from a given UnboundedSource (taking the UnboundedSource
>>>> as an element). This should be fairly straightforward and will allow all
>>>> portable runners to take advantage of existing UnboundedSource's.
>>>>
>>>>
>>>> On Fri, Mar 23, 2018 at 3:08 PM Lukasz Cwik <[email protected]> wrote:
>>>>
>>>>> Using impulse is a precursor for both bounded and unbounded SDF.
>>>>>
>>>>> This JIRA represents the work that would be to add support for
>>>>> unbounded SDF using portability APIs:
>>>>> https://issues.apache.org/jira/browse/BEAM-2939
>>>>>
>>>>>
>>>>> On Fri, Mar 23, 2018 at 11:46 AM Thomas Weise <[email protected]> wrote:
>>>>>
>>>>>> So for streaming, we will need the Impulse translation for bounded
>>>>>> input, identical with batch, and then in addition to that support for 
>>>>>> SDF?
>>>>>>
>>>>>> Any pointers what's involved in adding the SDF support? Is it runner
>>>>>> specific? Does the ULR cover it?
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 23, 2018 at 11:26 AM, Lukasz Cwik <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> All "sources" in portability will use splittable DoFns for execution.
>>>>>>>
>>>>>>> Specifically, runners will need to be able to checkpoint unbounded
>>>>>>> sources to get a minimum viable pipeline working.
>>>>>>> For bounded pipelines, a DoFn can read the contents of a bounded
>>>>>>> source.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 23, 2018 at 10:52 AM Thomas Weise <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm looking at the portable pipeline translation for streaming. I
>>>>>>>> understand that for batch pipelines, it is sufficient to translate 
>>>>>>>> Impulse.
>>>>>>>>
>>>>>>>> What is the intended path to support unbounded sources?
>>>>>>>>
>>>>>>>> The goal here is to get a minimum translation working that will
>>>>>>>> allow streaming wordcount execution.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>>
>>>>>>

Re: Unbounded source translation for portable pipelines

Reply via email to