Hi all, I'll leave another 3 days for design
<https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit?usp=sharing>
review.
Then we can have a vote session if there is no objection.

Thanks!

On Fri, Aug 9, 2019 at 12:14 PM Ning Kang <[email protected]> wrote:

> Thanks Ahmet for the introduction!
>
> I've composed a design overview
> <https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit?usp=sharing>
> describing changes we are making to components around interactive runner.
> I'll share the document in our email thread too.
>
> The truth is since interactive runner is not yet a recognized runner as
> part of the Beam SDK (and it's fundamentally a wrapper around direct
> runner), we are not touching any Beam SDK components.
> We'll not change any behavior of existing Beam SDK and we'll try our best
> to keep it that way in the future.
>
> In the meantime, I'll work on other components orthogonal to Beam such as
> Pipeline Display and Data Visualization I mentioned in the design overview.
>
> If you have any questions, please feel free to contact me through this
> email address!
>
> Thanks!
>
> Regards,
> Ning.
>
> On Wed, Aug 7, 2019 at 5:01 PM Ahmet Altay <[email protected]> wrote:
>
>> Ning, thank you for the heads up.
>>
>> All, this is a proposed work for improving interactive Beam experience.
>> As mentioned in Ning's email, new concepts are being introduced. And in
>> addition iBeam as a name is used as a new reference. I hope that bringing
>> the discussion to the mailing list will give it the additional
>> visibility and more people could share their feedback.
>>
>> (cc'ing a few folks that might be interested +Robert Bradshaw
>> <[email protected]> +Valentyn Tymofieiev <[email protected]> +Sindy
>> Li <[email protected]> +Brian Hulette <[email protected]> )
>>
>> Ahmet
>>
>>
>> On Wed, Aug 7, 2019 at 12:36 PM Ning Kang <[email protected]> wrote:
>>
>>> To whom may concern,
>>>
>>> This is Ning from Google. We are currently making efforts to leverage an
>>> interactive runner under python beam sdk.
>>>
>>> There is already an interactive Beam (iBeam for short) runner with
>>> jupyter notebook in the repo
>>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive>
>>> .
>>> Following the instructions on that page, one can set up an interactive
>>> environment to develop and execute Beam pipeline interactively.
>>>
>>> However, there are many issues with existing iBeam. One issue is that it
>>> uses a concept of leaf PCollection to cache and materialize intermediate
>>> PCollection. If the user wants to reuse/introspect a non-leaf PCollection,
>>> the interactive runner will run into errors.
>>>
>>> Our initial effort will be fixing the existing issues. And we also want
>>> to make iBeam easy to use. Since iBeam uses the same model Beam uses, there
>>> isn't really any difference for users between creating a pipeline with
>>> interactive runner and other runners.
>>> So we want to minimize the interfaces a user needs to learn while giving
>>> the user some capability to interact with the interactive environment.
>>>
>>> See this initial PR <https://github.com/apache/beam/pull/9278>, the
>>> interactive_beam module will provide mainly 4 interfaces:
>>>
>>>    - For advanced users who define pipeline outside __main__, let them
>>>    tell current interactive environment where they define their pipeline:
>>>    watch()
>>>       - This is very useful for tests where pipeline can be defined in
>>>       test methods.
>>>       - If the user simply creates pipeline in a Jupyter notebook or a
>>>       plain Python script, they don't have to know/use this feature at all.
>>>    - Let users create an interactive pipeline: create_pipeline()
>>>       - invoking create_pipeline(), the user gets a Pipeline object
>>>       that works as any other Pipeline object created from 
>>> apache_beam.Pipeline()
>>>       - However, the pipeline object p, when invoking p.run(), does
>>>       some extra interactive magic.
>>>       - We'll support interactive execution for DirectRunner at this
>>>       moment.
>>>    - Let users run the interactive pipeline as a normal pipeline:
>>>    run_pipeline()
>>>       - In an interactive environment, a user only needs to add and
>>>       execute 1 line of code run_pipeline(pipeline) to execute any existing
>>>       interactive pipeline object as normal pipeline in any selected 
>>> platform.
>>>       - We'll probably support Dataflow only. Other implementations can
>>>       be added though.
>>>    - Let users introspect any intermediate PCollection they have
>>>    handler to: visualize()
>>>       - If a user ever writes pcoll = p | "Some Transform" >>
>>>       some_transform() ..., they can visualize(pcoll) once the pipeline p is
>>>       executed.
>>>       - p can be batch or streaming
>>>       - The visualization will be some plot graph of data for the given
>>>       PCollection as if it's materialized. If the PCollection is unbounded, 
>>> the
>>>       graph is dynamic.
>>>
>>> The PR will implement 1 and 2.
>>>
>>> We'll use https://issues.apache.org/jira/browse/BEAM-7923 as the top
>>> level JIRA and add blocking JIRAs as development goes.
>>>
>>> External Beam users will not worry about any of the underlying
>>> implementation details.
>>> Except the 4 interfaces above, they learn and write normal Beam code and
>>> can execute the pipeline immediately when they are done with prototyping.
>>>
>>> Ning.
>>>
>>

Reply via email to