[
https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408457#comment-17408457
]
Ning commented on BEAM-10708:
-----------------------------
*For streaming, a summary of the current status*:
For the 2 DirectRunner implementations:
FnApiRunner
* Supports SqlTransform
* Does not support TestStream
* Does not support NativeSources such as ReadFromPubSub
BundleBasedDirectRunner
* Supports TestStream and ReadFromPubSub
* Does not support SqlTransform
SqlTransform is not supported by DirectRunner for streaming pipelines anyway.
I'm going to make FnApiRunner support TestStream so that it:
* Supports SqlTransform
* Supports TestStream/NativeSources such as ReadFromPubSub (locally,
InteractiveRunner uses TestStream to replace all ReadFromPubSubs)
* Is not going to be deprecated and is xLang friendly.
*The other topic*:
The workaround usage for SQL can be demonstrated as:
{code:python}
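import typing

import apache_beam as beam
import names  # third-party package used to generate sample full names
from apache_beam.runners.interactive import interactive_runner as ir

p = beam.Pipeline(ir.InteractiveRunner())

class Person(typing.NamedTuple):
    id: int
    name: str

# Explicit labels keep the two Create steps distinct within one pipeline.
persons = (p | 'CreatePersons' >> beam.Create(range(10))
           | beam.Map(lambda x: Person(id=x, name=names.get_full_name()))
           .with_output_types(Person))
persons_2 = (p | 'CreatePersons2' >> beam.Create(range(5, 15))
             | beam.Map(lambda x: Person(id=x, name=names.get_full_name()))
             .with_output_types(Person))

# In a separate notebook cell. persons_id_conflict is the output variable
# name in the __main__ module.
%%beam_sql persons_id_conflict
SELECT * FROM persons JOIN persons_2 USING (id)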
{code}
The output is a PCollection. Note that the user does not need to explicitly
apply a SqlTransform or build the input PCollection dict.
Under the hood, InteractiveRunner is unchanged: it still uses round trips to
duplicate pipelines without external transforms.
For SQL usage, the beam_sql IPython magic implicitly builds a pipeline with
the appropriate SqlTransform and inputs.
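To make the "implicit inputs" idea concrete, here is a minimal sketch of how table names in a query could be matched against variables in a namespace such as __main__. The helper name find_table_inputs and the regex-based lookup are assumptions made for illustration only; this is not the beam_sql implementation, and plain objects stand in for PCollections.

```python
import re


def find_table_inputs(query, namespace):
    """Map table names that follow FROM/JOIN in `query` to objects in `namespace`.

    Hypothetical helper for illustration; not part of Apache Beam.
    """
    # Collect identifiers that directly follow a FROM or JOIN keyword.
    candidates = re.findall(
        r'\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)', query, flags=re.IGNORECASE)
    inputs = {}
    for name in candidates:
        # Keep only names that are actually defined in the namespace.
        if name in namespace:
            inputs[name] = namespace[name]
    return inputs


# Stand-ins for PCollections defined in the notebook's __main__ module.
namespace = {'persons': object(), 'persons_2': object(), 'unrelated': 42}
query = 'SELECT * FROM persons JOIN persons_2 USING (id)'
inputs = find_table_inputs(query, namespace)
print(sorted(inputs))
```

The resulting dict, keyed by table name, has the same shape as the input PCollection dict that a user would otherwise build by hand before applying a SqlTransform.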
> InteractiveRunner cannot execute pipeline with cross-language transform
> -----------------------------------------------------------------------
>
> Key: BEAM-10708
> URL: https://issues.apache.org/jira/browse/BEAM-10708
> Project: Beam
> Issue Type: Bug
> Components: cross-language
> Reporter: Brian Hulette
> Assignee: Ning
> Priority: P2
> Time Spent: 30h 20m
> Remaining Estimate: 0h
>
> The InteractiveRunner crashes when given a pipeline that includes a
> cross-language transform.
> Here's the example I tried to run in a jupyter notebook:
> {code:python}
> p = beam.Pipeline(InteractiveRunner())
> pc = (p | SqlTransform("""SELECT
> CAST(1 AS INT) AS `id`,
> CAST('foo' AS VARCHAR) AS `str`,
> CAST(3.14 AS DOUBLE) AS `flt`"""))
> df = interactive_beam.collect(pc)
> {code}
> The problem occurs when
> [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66]
> creates a copy of the pipeline by [writing it to proto and reading it
> back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120].
> Reading it back fails because some of the pipeline is not written in Python.