[ 
https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891137#comment-15891137
 ] 

Eugene Kirpichov commented on BEAM-849:
---------------------------------------

On your first paragraph: yes, I'm not saying that waitUntilFinish should 
trigger the termination. I'm saying that such a pipeline must terminate 
regardless of whether it runs on a "streaming runner" or not (this is currently 
not the case for, e.g., the Dataflow runner); so if somebody calls 
waitUntilFinish, the call should wait for that termination and then return. 
I guess, though, at this point we're discussing "when should pipelines 
terminate" rather than "what should the API be for detecting that".

In the example I gave, the "end" is not necessarily known ahead of execution. 
E.g. imagine a use case where we continually tail a file and stream data from 
it until the file is marked with a read-only attribute. It might get marked 
soon, tomorrow, or never at all. Until it is marked, we should keep running 
and processing new records as they arrive; once it is marked read-only, the 
pipeline should terminate.
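
A rough sketch of that use case in plain Java (not Beam code) may make the 
point clearer. The class name TailUntilSealed, the tail method, and the 
READ_ONLY predicate are all hypothetical names made up for illustration; the 
end-of-input check is passed in as a predicate so that the loop itself does not 
care how "the end" is detected.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;
import java.util.function.Predicate;

final class TailUntilSealed {
    // Hypothetical end-of-input marker: the file is no longer writable.
    // Caveat: a privileged user (e.g. root) may still see it as writable.
    static final Predicate<Path> READ_ONLY = p -> !Files.isWritable(p);

    /**
     * Emits every line of the file to the sink, polling for new data until
     * the sealed predicate reports true. Returns the number of lines emitted.
     * Never returns if the file is never sealed.
     */
    static long tail(Path file, Predicate<Path> sealed, long pollMillis,
                     Consumer<String> sink)
            throws IOException, InterruptedException {
        long emitted = 0;
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            while (true) {
                // Sample the marker BEFORE draining, so records written just
                // before sealing are still read on this pass.
                boolean done = sealed.test(file);
                String line;
                // readLine() decodes bytes as Latin-1 and may return a partial
                // last line; acceptable for a sketch.
                while ((line = in.readLine()) != null) {
                    sink.accept(line);
                    emitted++;
                }
                if (done) {
                    return emitted; // input is complete: terminate
                }
                Thread.sleep(pollMillis); // not sealed yet: wait for more
            }
        }
    }
}
```

A caller would run something like TailUntilSealed.tail(path, 
TailUntilSealed.READ_ONLY, 1000, System.out::println). Nothing in the loop 
knows in advance whether it is "bounded": it simply returns when the input 
says it is done, and a call that waits on it returns at the same moment.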

Unbounded collections are part of the SDK, but unbounded pipelines are not. I 
guess one could introduce terminology that an unbounded pipeline is a pipeline 
that has at least one unbounded collection?... but again, it seems like the 
only use case for that would be when a runner that only supports bounded 
collections validates that a submitted pipeline contains no unbounded ones.
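
To illustrate how little that terminology would buy us, here is a minimal 
sketch of the validation in plain Java. The types BoundednessCheck and 
CollectionInfo are hypothetical stand-ins, not SDK classes; the IsBounded enum 
is only loosely modeled on the bounded/unbounded flag that the SDK attaches to 
each collection.

```java
import java.util.List;

final class BoundednessCheck {
    // Per-collection flag, loosely modeled on the SDK's notion.
    enum IsBounded { BOUNDED, UNBOUNDED }

    // Hypothetical stand-in for one collection in a pipeline graph.
    static final class CollectionInfo {
        final String name;
        final IsBounded isBounded;

        CollectionInfo(String name, IsBounded isBounded) {
            this.name = name;
            this.isBounded = isBounded;
        }
    }

    // "Unbounded pipeline" under the proposed terminology: a pipeline with
    // at least one unbounded collection.
    static boolean isUnboundedPipeline(List<CollectionInfo> collections) {
        return collections.stream()
                .anyMatch(c -> c.isBounded == IsBounded.UNBOUNDED);
    }

    // What a bounded-only runner would do at submission time: reject the
    // pipeline and name the first offending collection.
    static void validateForBoundedOnlyRunner(List<CollectionInfo> collections) {
        for (CollectionInfo c : collections) {
            if (c.isBounded == IsBounded.UNBOUNDED) {
                throw new UnsupportedOperationException(
                        "Unbounded collection not supported: " + c.name);
            }
        }
    }
}
```

Note that the check is entirely about collections; the pipeline-level notion 
is just an anyMatch over them, which is why I doubt it needs its own term.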

Ideally I would like to stop using terminology such as "batch" and "streaming" 
altogether, except when referring to a particular runner ("batch Dataflow 
runner", "streaming Spark runner").

> Redesign PipelineResult API
> ---------------------------
>
>                 Key: BEAM-849
>                 URL: https://issues.apache.org/jira/browse/BEAM-849
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Pei He
>
> Current state: 
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses 
> waitUntilFinish() and cancel(). 
> However, there is additional work around PipelineResult: 
> need a clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
