[
https://issues.apache.org/jira/browse/BEAM-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863473#comment-16863473
]
Luke Cwik edited comment on BEAM-6902 at 6/13/19 9:25 PM:
----------------------------------------------------------
https://s.apache.org/beam-finalizing-bundles describes what finalization means
in the context of portability, but the same concepts apply here.
In the Beam model, checkpoint finalization always happens after the output has
been committed durably. Finalization always occurs within the process that
requested it; hence, if that process crashes, the finalization may never
happen, which usually leads to duplication of data produced by the source.
Also, runners are supposed to finalize as soon as reasonably possible, but
there are no strict time guarantees.
If you would like, feel free to clarify the documentation you pointed to and
send a PR.
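To illustrate the ordering described above, here is a minimal sketch in plain
Java. It is a toy model, not Beam's API: the class FinalizationSketch, the
AckingSource type, and the run(...) parameters are all hypothetical names
invented for this example. It assumes a source that redelivers any records
that were never acknowledged. Output is committed first, finalization
(acknowledgement) happens second, and a crash between the two steps causes the
same records to be read and written again.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types exist in Beam.
class FinalizationSketch {

    /** Hypothetical source that redelivers everything after the last acknowledged offset. */
    static final class AckingSource {
        final List<String> records;
        int ackedOffset = 0; // advanced only when a checkpoint is finalized

        AckingSource(List<String> records) {
            this.records = records;
        }
    }

    /**
     * Processes the records in bundles and returns what ended up in the sink.
     *
     * @param crashOnAttempt index of the bundle attempt whose process "crashes"
     *                       after the output is committed but before
     *                       finalization runs; use -1 for no crash
     */
    static List<String> run(List<String> records, int bundleSize, int crashOnAttempt) {
        AckingSource source = new AckingSource(records);
        List<String> sink = new ArrayList<>();
        int attempt = 0;

        while (source.ackedOffset < records.size()) {
            int start = source.ackedOffset;
            int end = Math.min(start + bundleSize, records.size());

            // Step 1: the bundle's output is committed durably.
            sink.addAll(records.subList(start, end));

            // Step 2: finalization runs afterwards, in the same process.
            // If that process crashed, finalization never happens and the
            // source redelivers the same records on the next attempt.
            boolean crashed = (attempt == crashOnAttempt);
            if (!crashed) {
                source.ackedOffset = end;
            }
            attempt++;
        }
        return sink;
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c", "d");
        System.out.println("no crash: " + run(input, 2, -1));
        System.out.println("crash on first bundle: " + run(input, 2, 0));
    }
}
```

With the input a, b, c, d and a bundle size of 2, the run without a crash
leaves a, b, c, d in the sink, while the run that crashes on the first attempt
leaves a, b, a, b, c, d. No record is lost, but some are duplicated, which is
the at-least-once behaviour described in the comment.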
> Beam model contract for finalization of CheckpointMark's
> ------------------------------------------------------------
>
> Key: BEAM-6902
> URL: https://issues.apache.org/jira/browse/BEAM-6902
> Project: Beam
> Issue Type: Improvement
> Components: beam-model, io-java-kafka, runner-core, runner-dataflow,
> sdk-java-core
> Reporter: Mark Norkin
> Priority: Major
> Labels: documentation
> Attachments: BEAM-6902 diagrams.pdf,
> beam-kafka-io-commit-model-examples-master.zip
>
>
> Question: What is the contract in the Beam model for when checkpoint marks
> are finalized, if there is one?
> I'm working on a pipeline that reads messages from Kafka using KafkaIO, and
> I'm looking at the _commitOffsetsInFinalize()_ option and the
> KafkaCheckpointMark class.
> I want to achieve at-least-once message delivery semantics and want to be
> sure that offsets are committed to Kafka only after the messages have been
> written to some sink.
> Looking at the interface of
> [CheckpointMark|https://beam.apache.org/releases/javadoc/2.9.0/org/apache/beam/sdk/io/UnboundedSource.CheckpointMark.html]
> it's not clear when finalization should be expected to happen.
> Is it runner dependent? What should I expect when executing on
> _DataflowRunner_?
> And reading the KafkaIO.Read javadoc on _commitOffsetsInFinalize_
> _[https://beam.apache.org/releases/javadoc/2.9.0/org/apache/beam/sdk/io/kafka/KafkaIO.Read.html#commitOffsetsInFinalize--]_
>
> also doesn't clarify things for me, particularly the phrase
> {quote}But it does not provide *_hard processing guarantees_*
> {quote}
> What exactly are hard processing guarantees?
> Could I please ask for a documentation improvement regarding
> _CheckpointMark_ and _commitOffsetsInFinalize_?
>