[
https://issues.apache.org/jira/browse/METRON-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671721#comment-15671721
]
ASF GitHub Bot commented on METRON-569:
---------------------------------------
Github user cestella commented on the issue:
https://github.com/apache/incubator-metron/pull/359
It occurs to me that an enrichment may be taking longer than the timeout
that Storm uses to wait for the ack. If that is the case, I could see
duplicated data.
Imagine the following situation, with the Storm timeout `x` and an
enrichment taking `x + 1`. The enrichment would finish and send the enriched
data from the enrichment adapter to the join bolt, but Storm would have
already triggered a replay. The completed enrichment would still trigger the
join and the joined message would be emitted, and the replay would then
produce another copy of the message.
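To make the timing concrete, here is a rough sketch of that scenario. It is a toy model and not Storm code: `emitted_copies` and `max_replays` are names made up for illustration, and it assumes that work already in flight is never cancelled by a replay.

```python
def emitted_copies(message_timeout, enrichment_duration, max_replays=1):
    """Model one input message and return (attempt, finish_time) per joined
    message emitted downstream.

    Assumptions (illustrative only): attempt n starts at n * message_timeout,
    every attempt's enrichment eventually finishes and triggers the join, and
    a replay happens whenever the enrichment outlasts the timeout.
    """
    emitted = []
    attempt = 0
    while True:
        start = attempt * message_timeout
        finish = start + enrichment_duration
        # The join fires when the enrichment completes, even if the tuple
        # has already been failed and replayed in the meantime.
        emitted.append((attempt, finish))
        timed_out = enrichment_duration > message_timeout
        if not timed_out or attempt >= max_replays:
            break
        attempt += 1
    return emitted


# Timeout x = 30, enrichment takes x + 1 = 31: the original and the replay
# both reach the join, so two copies are emitted.
copies = emitted_copies(message_timeout=30, enrichment_duration=31)
print(len(copies))
```

With an enrichment that finishes inside the timeout, the same function returns a single entry, which is the behaviour you would want.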
In this case, I'd suggest either ensuring that your enrichment's maximum
duration is capped below
http://storm.apache.org/releases/current/javadocs/org/apache/storm/Config.html#TOPOLOGY_MESSAGE_TIMEOUT_SECS
or raising the message timeout in Storm so that it is higher than the longest
time an enrichment can take.
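As a rough illustration of the first option, the sketch below caps a single enrichment call at a deadline shorter than the message timeout. It is not Metron or Storm code: `enrich_with_cap`, `cap_secs` and `message_timeout_secs` are hypothetical names, and the default of 30 seconds is only an assumed value for the topology's message timeout.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def enrich_with_cap(enrich, message, executor, cap_secs, message_timeout_secs=30):
    """Run enrich(message) on the executor, giving up after cap_secs.

    Returns the enriched message, or None if the cap was hit. The cap must be
    strictly below the (assumed) message timeout, otherwise a replay could
    still be triggered before we give up.
    """
    if cap_secs >= message_timeout_secs:
        raise ValueError("enrichment cap must be below the message timeout")
    future = executor.submit(enrich, message)
    try:
        return future.result(timeout=cap_secs)
    except FutureTimeout:
        # cancel() cannot stop a call that is already running; we simply
        # stop waiting and discard whatever it produces later.
        future.cancel()
        return None
```

Returning `None` here stands in for "pass the message along unenriched"; what the right fallback is depends on the enrichment.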
> Enrichment topology duplicates messages
> ---------------------------------------
>
> Key: METRON-569
> URL: https://issues.apache.org/jira/browse/METRON-569
> Project: Metron
> Issue Type: Bug
> Reporter: Domenic Puzio
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> When running the 'enrichment' topology, I get duplicate messages being
> indexed. For example, I put 100 messages into the 'enrichment' Kafka queue
> and I get 175 messages onto the 'indexing' Kafka queue. This happens when I
> am running the 'enrichment' topology with one or more enrichment bolts.
> This is an acking issue within the JoinBolt class. When a message does not
> "complete" the join (like when it is the first message in a pair of messages
> to get joined) it does not get acked. This means that this message will get
> replayed through Storm, causing message duplication further down the road and
> tons of additional overhead. Adding the correct acking resolves this problem.
> I will add the PR for this shortly.
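The acking behaviour described in the issue can be sketched as follows. This is a toy model and not the actual JoinBolt code: `JoinSketch`, `expected_parts`, and the `emit` and `ack` callbacks are all hypothetical names standing in for the real bolt and its output collector.

```python
class JoinSketch:
    """Buffer partial messages per key and emit once all parts have arrived."""

    def __init__(self, expected_parts, emit, ack):
        self.expected_parts = set(expected_parts)
        self.emit = emit      # called with (key, merged_message)
        self.ack = ack        # called with (key, part_name)
        self.pending = {}     # key -> {part_name: payload}

    def execute(self, key, part_name, payload):
        parts = self.pending.setdefault(key, {})
        parts[part_name] = payload
        if set(parts) == self.expected_parts:
            merged = {}
            for part in parts.values():
                merged.update(part)
            self.emit(key, merged)
            del self.pending[key]
        # Ack every incoming tuple, including the ones that did not
        # complete the join, so that none of them time out and get replayed.
        self.ack((key, part_name))
```

One trade-off to keep in mind with this shape: a partial message that has been acked but is still sitting in `pending` will not be replayed if the buffer is lost.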
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)