Ning Kang created BEAM-7760:
-------------------------------
Summary: Interactive Beam Caching PCollections bound to user
defined vars in notebook
Key: BEAM-7760
URL: https://issues.apache.org/jira/browse/BEAM-7760
Project: Beam
Issue Type: New Feature
Components: examples-python
Reporter: Ning Kang
Cache only PCollections bound to user defined variables in a pipeline when
running pipeline with interactive runner in jupyter notebooks.
[Interactive
Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
has been caching and using caches of "leaf" PCollections for interactive
execution in jupyter notebooks.
The interactive execution is currently supported so that when appending new
transforms to existing pipeline for a new run, executed part of the pipeline
doesn't need to be re-executed.
A PCollection is "leaf" when it is never used as input in any PTransform in the
pipeline.
The problem with building caches and pipeline to execute around "leaf" is that
when a PCollection is consumed by a sink with no output, the pipeline to
execute built will miss the subgraph generating and consuming that PCollection.
An example, "ReadFromPubSub --> WirteToPubSub" will result in an empty pipeline.
Caching around PCollections bound to user defined variables and replacing
transforms with source and sink of caches could resolve the pipeline to execute
properly under the interactive execution scenario. Also, cached PCollection now
can trace back to user code and can be used for user data visualization if user
wants to do it.
E.g.,
{code:java}
// ...
p = beam.Pipeline(interactive_runner.InteractiveRunner(),
options=pipeline_options)
messages = p | "Read" >> beam.io.ReadFromPubSub(subscription='...')
messages | "Write" >> beam.io.WriteToPubSub(topic_path)
result = p.run()
// ...
visualize(messages){code}
The interactive runner automatically figures out that PCollection created by
{code:java}
p | "Read" >> beam.io.ReadFromPubSub(subscription='...'){code}
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)