[ 
https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408457#comment-17408457
 ] 

Ning commented on BEAM-10708:
-----------------------------

*For streaming,  a summary of the current status*:

For the 2 DirectRunner implementations:
FnApiRunner
* Support SqlTransform
* Does not support TestStream
* Does not support NativeSources such as ReadFromPubSub

BundleBasedDirectRunner
* Support TestStream and ReadFromPubSub
* Does not support SqlTransform

SqlTransform is not supported by DirectRunner for streaming pipelines anyway.

I'm going to make FnApiRunner support TestStream so that
* Support SqlTransform
* Support TestStream/NativeSources such as ReadFromPubSub (locally an 
InteractiveRunner uses TestStream to replace all ReadFromPubSubs)
* Not to be deprecated and xLang friendly.


*The other topic*:

The workaround usage for SQL can be demonstrated as:


{code:python}
p = beam.Pipeline(ir.InteractiveRunner())

class Person(typing.NamedTuple):
    id: int
    name: str

persons = (p | beam.Create(range(10))
           | beam.Map(lambda x: Person(id=x, 
name=names.get_full_name())).with_output_types(Person))

persons_2 = (p | beam.Create(range(5,15))
              | beam.Map(lambda x: Person(id=x, 
name=names.get_full_name())).with_output_types(Person)) 

%%beam_sql persons_id_conflict  # This is the output variable name in the 
__main__ module.
SELECT * FROM persons JOIN persons_2 USING (id)
{code}

The output is a PCollection. Note there is no need from the user to explicitly 
apply a SqlTransform nor build the input PCollection dict.

Under the hood, InteractiveRunner has no change (still uses roundtrips to 
duplicate pipelines without external transforms).
For SQL usages, the beam_sql ipython magic takes care of implicitly building a 
pipeline with appropriate SqlTransform and inputs.





> InteractiveRunner cannot execute pipeline with cross-language transform
> -----------------------------------------------------------------------
>
>                 Key: BEAM-10708
>                 URL: https://issues.apache.org/jira/browse/BEAM-10708
>             Project: Beam
>          Issue Type: Bug
>          Components: cross-language
>            Reporter: Brian Hulette
>            Assignee: Ning
>            Priority: P2
>          Time Spent: 30h 20m
>  Remaining Estimate: 0h
>
> The InteractiveRunner crashes when given a pipeline that includes a 
> cross-language transform.
> Here's the example I tried to run in a jupyter notebook:
> {code:python}
> p = beam.Pipeline(InteractiveRunner())
> pc = (p | SqlTransform("""SELECT
>             CAST(1 AS INT) AS `id`,
>             CAST('foo' AS VARCHAR) AS `str`,
>             CAST(3.14  AS DOUBLE) AS `flt`"""))
> df = interactive_beam.collect(pc)
> {code}
> The problem occurs when 
> [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66]
>  creates a copy of the pipeline by [writing it to proto and reading it 
> back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120].
>  Reading it back fails because some of the pipeline is not written in Python.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to