[GitHub] [beam] KevinGG commented on a change in pull request #15490: [BEAM-10708] Introspect beam_sql output

GitBox Tue, 14 Sep 2021 09:58:48 -0700


KevinGG commented on a change in pull request #15490:
URL: https://github.com/apache/beam/pull/15490#discussion_r708465106




##########
File path: sdks/python/apache_beam/runners/interactive/sql/beam_sql_magics.py
##########
@@ -110,15 +126,15 @@ def beam_sql(self, line: str, cell: str) -> Union[None, PValue]:
         return
       register_coder_for_schema(pcoll.element_type)
 
-    # TODO(BEAM-10708): implicitly execute the pipeline and write output into
-    # cache.
-    return apply_sql(cell, line, found)
+    output_name, output = apply_sql(cell, line, found)
+    cache_output(output_name, output)

Review comment:
   All the magics are one-shot: to take in the most recent view, the user
   can re-execute the cell containing the magic.
   The cache here serves as the medium that materializes the output
   PCollection's data for introspection.
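A toy, stdlib-only model of the one-shot semantics described above may help. It assumes the magic computes its output once, stores a snapshot in a cache keyed by output name, and returns that snapshot; later changes to the sources are only picked up when the cell is re-executed. Every name below (`SOURCES`, `CACHE`, `apply_sql`, `cache_output`, `run_magic`) is a hypothetical stand-in chosen for illustration, not the Beam interactive API.

```python
# Toy model of a one-shot magic backed by a cache.
# All names are hypothetical stand-ins, NOT the Beam interactive API.
from typing import Dict, List, Tuple

# Hypothetical source data the user has defined in the notebook.
SOURCES: Dict[str, List[int]] = {"nums": [1, 2, 3]}

# Hypothetical cache that materializes each magic's output by name.
CACHE: Dict[str, List[int]] = {}


def apply_sql(output_name: str) -> Tuple[str, List[int]]:
    """Stand-in for running the query: doubles every element of `nums`."""
    return output_name, [x * 2 for x in SOURCES["nums"]]


def cache_output(output_name: str, output: List[int]) -> None:
    """Materialize a snapshot (a copy) of the output for introspection."""
    CACHE[output_name] = list(output)


def run_magic(output_name: str) -> List[int]:
    """One-shot: compute once, cache the result, return the cached snapshot."""
    name, output = apply_sql(output_name)
    cache_output(name, output)
    return CACHE[name]


first = run_magic("out")
SOURCES["nums"].append(4)  # the source changes after the magic ran
print(first)  # [2, 4, 6]: the earlier snapshot does not update itself
second = run_magic("out")  # re-executing the cell takes in the new view
print(second)  # [2, 4, 6, 8]
```

The copy in `cache_output` is what makes each run a fixed snapshot rather than a live view of the sources.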
   
   In the notebook examples we publish, we should notify users that the
   pipelines diverge after the magic, since the output PCollection is no
   longer part of their user pipeline. The ideal usage of the magic is to
   collect as many sources as possible, write a pipeline in SQL that uses all
   of them, introspect the query's output and then, if needed, sink the
   output PCollection somewhere.
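The recommended workflow (collect sources, express the pipeline as one SQL query over all of them, introspect the result, optionally sink it) can be illustrated with a stdlib analogy. This sketch swaps in Python's `sqlite3` purely as a stand-in for Beam SQL so it runs without a Beam installation; the table names, data and output file are hypothetical and nothing here reflects the `beam_sql` magic's actual API.

```python
# Stdlib analogy of the workflow, using sqlite3 as a stand-in for Beam SQL.
# Table names, data, and the output path are hypothetical.
import csv
import os
import sqlite3
import tempfile

# 1. Collect the sources (here: two in-memory tables).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (user_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (1, 5.0), (2, 7.5)]
)

# 2. Write the pipeline as a single SQL query over all the sources.
QUERY = """
SELECT u.name, SUM(o.amount) AS total
FROM users AS u JOIN orders AS o ON u.id = o.user_id
GROUP BY u.name
ORDER BY u.name
"""
rows = conn.execute(QUERY).fetchall()

# 3. Introspect the query's output.
for row in rows:
    print(row)  # ('ada', 15.0) then ('bob', 7.5)

# 4. If needed, sink the output somewhere (here: a CSV file).
out_path = os.path.join(tempfile.mkdtemp(), "totals.csv")
with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "total"])
    writer.writerows(rows)
conn.close()
```

Introspection happens before the sink step, mirroring the comment's point that sinking is optional and comes last.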
   
   




