[ 
https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959390#comment-15959390
 ] 

Kenneth Knowles commented on BEAM-1723:
---------------------------------------

The caches do need to be fault-tolerant or you'll get dupes.

It is simplest to have no configuration, but hard to say. I think there could 
be some discussion here. The deduplication window is really about the potential 
for re-delivery of a message, not like allowed lateness at all.

For example, in {{PubsubIO}} duplicates occur when output is committed but 
{{finalizeCheckpoint}} does not succeed at ACKing all messages. Then Pubsub 
will redeliver the message.

> FlinkRunner should deduplicate when an UnboundedSource requires Deduping
> ------------------------------------------------------------------------
>
>                 Key: BEAM-1723
>                 URL: https://issues.apache.org/jira/browse/BEAM-1723
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Thomas Groh
>            Assignee: Jingsong Lee
>
> UnboundedSource implementations can require deduping, and the FlinkRunner 
> currently logs a warning that this is not supported.
> https://github.com/apache/beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java#L139



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to