[
https://issues.apache.org/jira/browse/BEAM-10708?focusedWorklogId=650698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-650698
]
ASF GitHub Bot logged work on BEAM-10708:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 14/Sep/21 16:58
Start Date: 14/Sep/21 16:58
Worklog Time Spent: 10m
Work Description: KevinGG commented on a change in pull request #15490:
URL: https://github.com/apache/beam/pull/15490#discussion_r708465106
##########
File path: sdks/python/apache_beam/runners/interactive/sql/beam_sql_magics.py
##########
@@ -110,15 +126,15 @@ def beam_sql(self, line: str, cell: str) -> Union[None, PValue]:
return
register_coder_for_schema(pcoll.element_type)
- # TODO(BEAM-10708): implicitly execute the pipeline and write output into
- # cache.
- return apply_sql(cell, line, found)
+ output_name, output = apply_sql(cell, line, found)
+ cache_output(output_name, output)
Review comment:
All the magics are one-shots. The user can re-execute the cell containing
the magic to pick up the most recent view.
The cache here serves as the medium for materializing the output
PCollection's data for introspection.
In the notebook examples we publish, we should warn users that the pipelines
diverge after the magic runs, since the output PCollection is no longer part
of their user pipeline. The ideal usage of the magic is to collect as many
sources as possible, write a pipeline in SQL that uses all the sources,
introspect the query's output, and then, if needed, sink the output
PCollection somewhere.
Issue Time Tracking
-------------------
Worklog Id: (was: 650698)
Time Spent: 35.5h (was: 35h 20m)
> InteractiveRunner cannot execute pipeline with cross-language transform
> -----------------------------------------------------------------------
>
> Key: BEAM-10708
> URL: https://issues.apache.org/jira/browse/BEAM-10708
> Project: Beam
> Issue Type: Bug
> Components: cross-language
> Reporter: Brian Hulette
> Assignee: Ning
> Priority: P2
> Time Spent: 35.5h
> Remaining Estimate: 0h
>
> The InteractiveRunner crashes when given a pipeline that includes a
> cross-language transform.
> Here's the example I tried to run in a Jupyter notebook:
> {code:python}
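> import apache_beam as beam
> from apache_beam.runners.interactive import interactive_beam
> from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
> from apache_beam.transforms.sql import SqlTransform
>
> p = beam.Pipeline(InteractiveRunner())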
> pc = (p | SqlTransform("""SELECT
> CAST(1 AS INT) AS `id`,
> CAST('foo' AS VARCHAR) AS `str`,
> CAST(3.14 AS DOUBLE) AS `flt`"""))
> df = interactive_beam.collect(pc)
> {code}
> The problem occurs when
> [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66]
> creates a copy of the pipeline by [writing it to proto and reading it
> back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120].
> Reading it back fails because some of the pipeline is not written in Python.