[
https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890855#comment-15890855
]
Eugene Kirpichov commented on BEAM-849:
---------------------------------------
I disagree that "unbounded pipelines" can't finish successfully.
- Dataflow runner supports draining of pipelines, which leads to successful
termination.
- It is possible to run a pipeline like Create.of(1, 2, 3) + ParDo(do nothing)
using a streaming runner, and it should terminate rather than hang.
- One might ask "why run such a pipeline with a streaming runner", but it makes
a lot more sense if the ParDo is splittable. E.g. Create.of(filename) +
ParDo(tail file) + ParDo(process records) could use the low-latency
capabilities of a streaming runner, but successfully terminate when the file is
somehow "finalized". As a more mundane example - tests in
https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java
should pass in streaming runners as well as batch runners.
- "Unbounded pipeline" is in general not a Beam concept - we should have a
batch/streaming-agnostic meaning of "finished" in "waitUntilFinished". I
propose the one that Dataflow runner uses for deciding when drain is completed:
"all watermarks have progressed to infinity".
> Redesign PipelineResult API
> ---------------------------
>
> Key: BEAM-849
> URL: https://issues.apache.org/jira/browse/BEAM-849
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Pei He
>
> Current state:
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses
> waitUntilFinish() and cancel().
> However, there are additional work around PipelineResult:
> need clearly defined contract and verification across all runners
> need to revisit how to handle metrics/aggregators
> need to be able to get logs
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)