[
https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959390#comment-15959390
]
Kenneth Knowles commented on BEAM-1723:
---------------------------------------
The caches do need to be fault-tolerant or you'll get dupes.
Having no configuration would be simplest, but it is hard to say whether that
is workable. I think this deserves some discussion. The deduplication window is
really about the potential for re-delivery of a message; it is not like allowed
lateness at all.
For example, in {{PubsubIO}} duplicates occur when output is committed but
{{finalizeCheckpoint}} does not succeed at ACKing all messages. Pubsub will
then redeliver the messages that were not ACKed.
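To make the two points above concrete, here is a minimal sketch of a record-ID
cache with a re-delivery window, plus snapshot/restore so the cache can survive
a failure. It is only an illustration: {{RecordIdDeduplicator}} and all of its
methods are hypothetical names, not Beam or Flink APIs, and a real runner would
keep this in checkpointed state rather than a plain in-memory map. The sketch
also assumes timestamps passed in are non-decreasing.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: drops records whose ID was already seen within a re-delivery window. */
class RecordIdDeduplicator {
  private final long windowMillis;
  // Insertion-ordered map of record ID -> time the ID was first seen.
  private final LinkedHashMap<String, Long> seen = new LinkedHashMap<>();

  RecordIdDeduplicator(long windowMillis) {
    this.windowMillis = windowMillis;
  }

  /** Returns true if the record should be emitted, false if it is a duplicate within the window. */
  boolean shouldEmit(String recordId, long nowMillis) {
    expire(nowMillis);
    if (seen.containsKey(recordId)) {
      return false;
    }
    seen.put(recordId, nowMillis);
    return true;
  }

  /** Drops IDs older than the window; relies on non-decreasing timestamps (oldest entries first). */
  private void expire(long nowMillis) {
    Iterator<Map.Entry<String, Long>> it = seen.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (nowMillis - entry.getValue() < windowMillis) {
        break;
      }
      it.remove();
    }
  }

  /** Copy of the cache contents, standing in for what would be written to a checkpoint. */
  Map<String, Long> snapshot() {
    return new LinkedHashMap<>(seen);
  }

  /** Rebuilds a deduplicator from a snapshot, standing in for recovery after a failure. */
  static RecordIdDeduplicator restore(long windowMillis, Map<String, Long> snapshot) {
    RecordIdDeduplicator restored = new RecordIdDeduplicator(windowMillis);
    restored.seen.putAll(snapshot);
    return restored;
  }
}
```

If the cache is rebuilt empty after a failure instead of restored from a
snapshot, a redelivered ID is emitted a second time, which is the dupe case
described above. The window only needs to cover how long the source may
redeliver a message, independent of any allowed lateness setting.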
> FlinkRunner should deduplicate when an UnboundedSource requires Deduping
> ------------------------------------------------------------------------
>
> Key: BEAM-1723
> URL: https://issues.apache.org/jira/browse/BEAM-1723
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Reporter: Thomas Groh
> Assignee: Jingsong Lee
>
> UnboundedSource implementations can require deduping, and the FlinkRunner
> currently logs a warning that this is not supported.
> https://github.com/apache/beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java#L139
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)