[
https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959226#comment-15959226
]
Jingsong Lee commented on BEAM-1723:
------------------------------------
I see {{CachedIdDeduplicator}} in the direct runner. It uses a {{LoadingCache}} to
dedup, with expireAfterAccess set to 10 minutes and maximumSize set to 100_000. Do
these two values need to be parameterized?
Do these caches need to be snapshotted in the Flink runner (for fault tolerance)?
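To make both questions concrete, here is a minimal sketch of what I have in mind. It is not the direct runner's {{CachedIdDeduplicator}} and it does not use Guava's {{LoadingCache}}; it is a plain-JDK stand-in built on an access-ordered {{LinkedHashMap}}. The class name {{BoundedIdDeduplicator}}, the String ids, and the {{snapshot}}/{{restore}} methods are all hypothetical, chosen only for illustration. The two limits are constructor parameters, and the state can be copied out and rebuilt, which is roughly what a checkpoint in the Flink runner would have to do.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: not Beam's CachedIdDeduplicator, not Guava's LoadingCache.
final class BoundedIdDeduplicator {
    private final long expireAfterAccessMillis;
    private final int maximumSize;
    // Access-ordered map: iteration runs from least to most recently touched id.
    private final LinkedHashMap<String, Long> lastSeenMillis =
        new LinkedHashMap<>(16, 0.75f, true);

    BoundedIdDeduplicator(Duration expireAfterAccess, int maximumSize) {
        this.expireAfterAccessMillis = expireAfterAccess.toMillis();
        this.maximumSize = maximumSize;
    }

    // Returns true only if the id is not currently remembered.
    // Assumes nowMillis never decreases between calls.
    synchronized boolean shouldOutput(String id, long nowMillis) {
        // Drop ids that have not been touched within the expiry window.
        Iterator<Map.Entry<String, Long>> entries = lastSeenMillis.entrySet().iterator();
        while (entries.hasNext()) {
            if (nowMillis - entries.next().getValue() < expireAfterAccessMillis) {
                break; // every later entry was touched more recently
            }
            entries.remove();
        }
        boolean firstSighting = lastSeenMillis.put(id, nowMillis) == null;
        // Enforce the size bound by evicting the least recently touched ids.
        Iterator<String> ids = lastSeenMillis.keySet().iterator();
        while (lastSeenMillis.size() > maximumSize) {
            ids.next();
            ids.remove();
        }
        return firstSighting;
    }

    // Copy of the state that a runner could write into a checkpoint.
    synchronized Map<String, Long> snapshot() {
        return new LinkedHashMap<>(lastSeenMillis);
    }

    // Rebuilds a deduplicator from a snapshot; the limits are passed in again.
    static BoundedIdDeduplicator restore(
            Map<String, Long> snapshot, Duration expireAfterAccess, int maximumSize) {
        BoundedIdDeduplicator restored =
            new BoundedIdDeduplicator(expireAfterAccess, maximumSize);
        restored.lastSeenMillis.putAll(snapshot);
        return restored;
    }
}
```

With the direct runner's current values this would be constructed as {{new BoundedIdDeduplicator(Duration.ofMinutes(10), 100_000)}}. Without something like {{snapshot}}/{{restore}}, a restart would forget every id seen so far, and records replayed by the source after recovery would be emitted a second time.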
> FlinkRunner should deduplicate when an UnboundedSource requires Deduping
> ------------------------------------------------------------------------
>
> Key: BEAM-1723
> URL: https://issues.apache.org/jira/browse/BEAM-1723
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Reporter: Thomas Groh
>
> UnboundedSource implementations can require deduping, and the FlinkRunner
> currently logs a warning that this is not supported.
> https://github.com/apache/beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java#L139