[
https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409032#comment-17409032
]
Ning commented on BEAM-10708:
-----------------------------
Thanks for asking. Switching to "FnApiRunner" makes it feasible to deprecate
the runner API roundtrip.
Our roundtrip *does nothing but make copies of pipelines*.
{color:#505F79}Copying is useful (it avoids corrupting the __main__ module
in the REPL env) in the interactive scenario, where the user applies
transforms and then inspects the output one step at a time. It is not needed
where the user creates pipelines and then executes them one by one, or in any
other non-interactive use case. Deep copying pipelines is not a feature
because it's not needed.{color}
To deprecate it, instead of making copies of pipelines, we make copies of
runner api protos.
Theoretically, the runner API proto is the SDK-and-runner-independent
definition of a Beam pipeline. Every runner implementation should be able to
accept it for execution.
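
As a rough illustration of why copying the proto sidesteps the problem, here is a toy model in plain Python. It is not Beam code: `TransformSpec`, `PipelineSpec`, `PYTHON_KNOWN_URNS`, and both helper functions are hypothetical names made up for this sketch. It assumes only that a roundtrip has to rebuild every transform as a Python object, while a proto copy never interprets the transforms.

```python
import copy
from dataclasses import dataclass, field


# Toy stand-ins for a runner API proto: plain data keyed by URN.
# Every name in this sketch is hypothetical and is not part of Beam.
@dataclass
class TransformSpec:
    urn: str
    payload: bytes = b""


@dataclass
class PipelineSpec:
    transforms: list = field(default_factory=list)


# URNs that the hypothetical Python SDK can rebuild as Python objects.
PYTHON_KNOWN_URNS = {"toy:python:create", "toy:python:map"}


def rebuild_in_python(spec):
    """Mimic a roundtrip: re-create each transform as a Python object."""
    rebuilt = []
    for transform in spec.transforms:
        if transform.urn not in PYTHON_KNOWN_URNS:
            raise ValueError(
                f"cannot rebuild non-Python transform: {transform.urn}")
        rebuilt.append((transform.urn, transform.payload))
    return rebuilt


def copy_spec(spec):
    """Copy the serialized definition without interpreting any transform."""
    return copy.deepcopy(spec)


pipeline = PipelineSpec([
    TransformSpec("toy:python:create", b"[1, 2, 3]"),
    TransformSpec("toy:java:sql", b"SELECT 1"),  # foreign transform
])

# Copying the definition works because nothing is interpreted.
duplicate = copy_spec(pipeline)

# The roundtrip fails as soon as it meets the foreign transform.
try:
    rebuild_in_python(pipeline)
    roundtrip_ok = True
except ValueError:
    roundtrip_ok = False
```

In this model the copy is independent of the original, so changing one leaves the other untouched, and that independence is all the interactive scenario needs from the roundtrip.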
For DirectRunner, the right approach is to use its FnApiRunner implementation,
which accepts a runner API proto through "run_via_runner_api".
I hope DataflowRunner could also support "run_via_runner_api" to truly support
the mixing and matching of SDKs and runners envisioned
[here|https://docs.google.com/document/d/1XYzb1Fnt2sam7u2MsGFaZp-2qSIGxUn66VLer-bcXAk/edit#heading=h.p6lvszfbmyj6].
But to productionize a pipeline from notebooks to Dataflow, we could use
other workarounds. Making copies of pipelines is not needed.
> InteractiveRunner cannot execute pipeline with cross-language transform
> -----------------------------------------------------------------------
>
> Key: BEAM-10708
> URL: https://issues.apache.org/jira/browse/BEAM-10708
> Project: Beam
> Issue Type: Bug
> Components: cross-language
> Reporter: Brian Hulette
> Assignee: Ning
> Priority: P2
> Time Spent: 30h 50m
> Remaining Estimate: 0h
>
> The InteractiveRunner crashes when given a pipeline that includes a
> cross-language transform.
> Here's the example I tried to run in a jupyter notebook:
> {code:python}
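> import apache_beam as beam
> from apache_beam.runners.interactive import interactive_beam
> from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
> from apache_beam.transforms.sql import SqlTransform
>
> p = beam.Pipeline(InteractiveRunner())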
> pc = (p | SqlTransform("""SELECT
> CAST(1 AS INT) AS `id`,
> CAST('foo' AS VARCHAR) AS `str`,
> CAST(3.14 AS DOUBLE) AS `flt`"""))
> df = interactive_beam.collect(pc)
> {code}
> The problem occurs when
> [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66]
> creates a copy of the pipeline by [writing it to proto and reading it
> back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120].
> Reading it back fails because some of the pipeline is not written in Python.