KevinGG commented on a change in pull request #15490:
URL: https://github.com/apache/beam/pull/15490#discussion_r708465106
##########
File path: sdks/python/apache_beam/runners/interactive/sql/beam_sql_magics.py
##########
@@ -110,15 +126,15 @@ def beam_sql(self, line: str, cell: str) -> Union[None, PValue]:
return
register_coder_for_schema(pcoll.element_type)
- # TODO(BEAM-10708): implicitly execute the pipeline and write output into
- # cache.
- return apply_sql(cell, line, found)
+ output_name, output = apply_sql(cell, line, found)
+ cache_output(output_name, output)
Review comment:
All the magics are one-shot operations. The user can re-execute the cell
containing the magic to take in the most recent view.
The cache here serves as the medium to materialize the output PCollection's
data for introspection.
In the notebook examples we publish, we should notify the user that pipelines
diverge after the magic, since the output PCollection is no longer part of
their user pipeline. The ideal usage of the magic is collecting as many
sources as possible, writing a pipeline in SQL using all the sources,
introspecting the query's output, and then, if needed, sinking the output
PCollection somewhere.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]