Hi all,

I'll leave another 3 days for review of the design <https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit?usp=sharing>. Then we can hold a vote if there are no objections.
Thanks!

On Fri, Aug 9, 2019 at 12:14 PM Ning Kang <[email protected]> wrote:

> Thanks Ahmet for the introduction!
>
> I've composed a design overview
> <https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit?usp=sharing>
> describing the changes we are making to components around the interactive
> runner. I'll share the document in our email thread too.
>
> Since the interactive runner is not yet a recognized runner in the Beam SDK
> (it is fundamentally a wrapper around the direct runner), we are not
> touching any Beam SDK components.
> We will not change any behavior of the existing Beam SDK, and we'll try our
> best to keep it that way in the future.
>
> In the meantime, I'll work on other components orthogonal to Beam, such as
> the Pipeline Display and Data Visualization I mentioned in the design
> overview.
>
> If you have any questions, please feel free to contact me through this
> email address!
>
> Thanks!
>
> Regards,
> Ning.
>
> On Wed, Aug 7, 2019 at 5:01 PM Ahmet Altay <[email protected]> wrote:
>
>> Ning, thank you for the heads up.
>>
>> All, this is proposed work for improving the interactive Beam experience.
>> As mentioned in Ning's email, new concepts are being introduced. In
>> addition, iBeam is being used as a new name. I hope that bringing the
>> discussion to the mailing list will give it additional visibility so that
>> more people can share their feedback.
>>
>> (cc'ing a few folks who might be interested: +Robert Bradshaw
>> <[email protected]> +Valentyn Tymofieiev <[email protected]> +Sindy
>> Li <[email protected]> +Brian Hulette <[email protected]>)
>>
>> Ahmet
>>
>>
>> On Wed, Aug 7, 2019 at 12:36 PM Ning Kang <[email protected]> wrote:
>>
>>> To whom it may concern,
>>>
>>> This is Ning from Google. We are currently making efforts to leverage an
>>> interactive runner in the Python Beam SDK.
>>>
>>> There is already an interactive Beam (iBeam for short) runner with
>>> Jupyter notebook support in the repo
>>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive>.
>>> Following the instructions on that page, one can set up an interactive
>>> environment to develop and execute Beam pipelines interactively.
>>>
>>> However, there are many issues with the existing iBeam. One issue is that
>>> it uses the concept of a leaf PCollection to cache and materialize
>>> intermediate PCollections. If the user wants to reuse or introspect a
>>> non-leaf PCollection, the interactive runner runs into errors.
>>>
>>> Our initial effort will be fixing the existing issues. We also want to
>>> make iBeam easy to use. Since iBeam uses the same model Beam uses, there
>>> isn't really any difference for users between creating a pipeline with
>>> the interactive runner and with other runners.
>>> So we want to minimize the interfaces a user needs to learn while giving
>>> the user some capability to interact with the interactive environment.
>>>
>>> See this initial PR <https://github.com/apache/beam/pull/9278>. The
>>> interactive_beam module will provide 4 main interfaces:
>>>
>>>    - For advanced users who define pipelines outside __main__, let them
>>>    tell the current interactive environment where they define their
>>>    pipeline: watch()
>>>       - This is very useful for tests, where pipelines can be defined in
>>>       test methods.
>>>       - If the user simply creates a pipeline in a Jupyter notebook or a
>>>       plain Python script, they don't have to know or use this feature
>>>       at all.
>>>    - Let users create an interactive pipeline: create_pipeline()
>>>       - By invoking create_pipeline(), the user gets a Pipeline object
>>>       that works like any other Pipeline object created from
>>>       apache_beam.Pipeline()
>>>       - However, the pipeline object p, when invoking p.run(), does
>>>       some extra interactive magic.
>>>       - We'll support interactive execution only for DirectRunner at
>>>       this moment.
>>>    - Let users run the interactive pipeline as a normal pipeline:
>>>    run_pipeline()
>>>       - In an interactive environment, a user only needs to add and
>>>       execute 1 line of code, run_pipeline(pipeline), to execute any
>>>       existing interactive pipeline object as a normal pipeline on any
>>>       selected platform.
>>>       - We'll probably support Dataflow only. Other implementations can
>>>       be added though.
>>>    - Let users introspect any intermediate PCollection they have a
>>>    handle to: visualize()
>>>       - If a user ever writes
>>>       pcoll = p | "Some Transform" >> some_transform() ...,
>>>       they can visualize(pcoll) once the pipeline p is executed.
>>>       - p can be batch or streaming.
>>>       - The visualization will be a plot of the data for the given
>>>       PCollection as if it's materialized. If the PCollection is
>>>       unbounded, the graph is dynamic.
>>>
>>> The PR will implement 1 and 2.
>>>
>>> We'll use https://issues.apache.org/jira/browse/BEAM-7923 as the
>>> top-level JIRA and add blocking JIRAs as development goes.
>>>
>>> External Beam users will not need to worry about any of the underlying
>>> implementation details.
>>> Apart from the 4 interfaces above, they learn and write normal Beam code
>>> and can execute the pipeline immediately when they are done with
>>> prototyping.
>>>
>>> Ning.
>>>
>>
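
For readers new to the thread, the leaf-PCollection issue mentioned in the original message can be pictured with a toy model. This is only a sketch under assumptions: ToyPCollection, run_and_cache and introspect are hypothetical names made up for illustration, and none of this is the Beam or interactive_beam API or the actual iBeam caching code. It only shows why caching leaf outputs alone leaves an intermediate collection unavailable for introspection, while caching every collection the user can reference does not.

```python
# Toy model only: every name here is hypothetical and is NOT part of
# apache_beam or the proposed interactive_beam module.


class ToyPCollection:
    """A node in a linear/branching toy pipeline graph."""

    def __init__(self, name, producer=None, fn=None, data=None):
        self.name = name
        self.producer = producer  # upstream node, or None for a source
        self.fn = fn              # element-wise function applied to upstream
        self.data = data          # elements, only used by a source node
        self.consumers = []       # downstream nodes
        if producer is not None:
            producer.consumers.append(self)

    def apply(self, name, fn):
        """Create a downstream node by mapping fn over this node."""
        return ToyPCollection(name, producer=self, fn=fn)

    def compute(self):
        """Recompute this node's elements from the source."""
        if self.producer is None:
            return list(self.data)
        return [self.fn(x) for x in self.producer.compute()]


def all_nodes(source):
    """Return every node reachable from the source."""
    nodes, stack = [], [source]
    while stack:
        node = stack.pop()
        nodes.append(node)
        stack.extend(node.consumers)
    return nodes


def run_and_cache(source, leaf_only):
    """Run the toy pipeline and cache results by node name.

    With leaf_only=True, only nodes without consumers are cached.
    """
    cache = {}
    for node in all_nodes(source):
        if leaf_only and node.consumers:
            continue
        cache[node.name] = node.compute()
    return cache


def introspect(cache, pcoll):
    """Look up cached elements; fail if the node was never cached."""
    if pcoll.name not in cache:
        raise KeyError("no cached result for %r" % pcoll.name)
    return cache[pcoll.name]


numbers = ToyPCollection("numbers", data=[1, 2, 3])
squares = numbers.apply("squares", lambda x: x * x)    # intermediate node
plus_one = squares.apply("plus_one", lambda x: x + 1)  # leaf node

leaf_cache = run_and_cache(numbers, leaf_only=True)
full_cache = run_and_cache(numbers, leaf_only=False)
```

With leaf_cache, introspect(leaf_cache, squares) raises KeyError because squares has a consumer and was skipped, whereas the same call against full_cache succeeds.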
