[
https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891137#comment-15891137
]
Eugene Kirpichov commented on BEAM-849:
---------------------------------------
On your first paragraph: yes, I'm not saying that waitUntilFinish should
trigger the termination - I'm just saying that such a pipeline must terminate
regardless of whether it is a "streaming runner" or not (and this is currently
not the case, e.g., for the Dataflow runner); so if somebody calls
waitUntilFinish, this call should wait for that termination and complete as
well. I guess, though, at this point we're discussing "when should pipelines
terminate" rather than "what should the API be for detecting that".
In the example I gave, the "end" is not necessarily known ahead of execution.
E.g. imagine a use case where we continually tail the file and stream data from
it until the file is marked with a read-only attribute. It might get marked
soon, tomorrow, or never at all - then we should keep running and processing
new records as they arrive; but when it's marked read-only, the pipeline should
terminate.
Unbounded collections are part of the SDK, but unbounded pipelines are not. I
guess one could introduce terminology that an unbounded pipeline is a pipeline
that has at least one unbounded collection?... but again, it seems like the
only use case for that would be when a runner that only supports bounded
collections validates whether a pipeline being submitted satisfies this.
Ideally I would like to stop using terminology such as "batch" and "streaming"
altogether, except when referring to a particular runner ("batch Dataflow
runner", "streaming Spark runner").
> Redesign PipelineResult API
> ---------------------------
>
> Key: BEAM-849
> URL: https://issues.apache.org/jira/browse/BEAM-849
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Pei He
>
> Current state:
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses
> waitUntilFinish() and cancel().
> However, there are additional work around PipelineResult:
> need clearly defined contract and verification across all runners
> need to revisit how to handle metrics/aggregators
> need to be able to get logs
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)