[
https://issues.apache.org/jira/browse/METRON-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671672#comment-15671672
]
ASF GitHub Bot commented on METRON-569:
---------------------------------------
Github user DomenicPuzio commented on the issue:
https://github.com/apache/incubator-metron/pull/359
@cestella, I made the change to the PR name; thanks for the tip there.
I completely agree that we don't want to miss out on catching failures due
to an enrichment source (like MySQL) or in the enrichment infrastructure.
However, after we have put a tuple into the JoinBolt's cache, isn't it already
past the point where these pitfalls could occur? If there is a failure in an
enrichment, then Storm will time out while waiting for an ack that never takes
place, so this message will be re-sent; but if the message is already in the
cache, isn't its journey complete?
My thought with placing the ack there was (1) at this stage of the
topology, that tuple has been correctly processed, and (2) for simplicity's
sake so that we wouldn't have to modify `streamMessageMap`. I do agree that
acking everything after the join has taken place also makes a lot of sense, and
I can work on that if you would like.
We saw the duplicate data while running this in our development environment
in EC2. Perhaps this is due to different ack timeout settings in the Storm
topologies? We have reproduced the duplication many times on our end.
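To make the two ack placements discussed above concrete, here is a minimal sketch in plain Java. It is a toy model and an assumption throughout: `JoinAckSketch`, `AckPolicy`, and `onTuple` are hypothetical names, tuples are reduced to numeric ids, and nothing here is Metron's `JoinBolt` or the Storm API. `ON_CACHE` acks a tuple as soon as it is cached; `AFTER_JOIN` holds the ids and acks them together once every stream for a message has arrived.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a join step; not Metron's JoinBolt and not the Storm API. */
public class JoinAckSketch {
    /** Where the join step acks its input tuples (hypothetical names). */
    public enum AckPolicy { ON_CACHE, AFTER_JOIN }

    private final AckPolicy policy;
    private final int streamsPerMessage;
    // message key -> ids of cached tuples that have not been acked yet
    private final Map<String, List<Long>> pending = new HashMap<>();
    // message key -> number of tuples received so far for that key
    private final Map<String, Integer> seen = new HashMap<>();

    public JoinAckSketch(AckPolicy policy, int streamsPerMessage) {
        this.policy = policy;
        this.streamsPerMessage = streamsPerMessage;
    }

    /** Caches one tuple and returns the tuple ids that should be acked right now. */
    public List<Long> onTuple(String key, long tupleId) {
        int count = seen.merge(key, 1, Integer::sum);
        List<Long> toAck = new ArrayList<>();
        if (policy == AckPolicy.ON_CACHE) {
            // Ack immediately: the tuple is considered done once it is cached.
            toAck.add(tupleId);
        } else {
            // Hold the id until the join for this key completes.
            pending.computeIfAbsent(key, k -> new ArrayList<>()).add(tupleId);
        }
        if (count == streamsPerMessage) {
            // Join complete: forget the key and release any held ids.
            seen.remove(key);
            List<Long> held = pending.remove(key);
            if (held != null) {
                toAck.addAll(held);
            }
        }
        return toAck;
    }

    public static void main(String[] args) {
        JoinAckSketch onCache = new JoinAckSketch(AckPolicy.ON_CACHE, 2);
        System.out.println(onCache.onTuple("msg-1", 1L));   // first tuple acked at once
        System.out.println(onCache.onTuple("msg-1", 2L));   // second tuple acked at once

        JoinAckSketch afterJoin = new JoinAckSketch(AckPolicy.AFTER_JOIN, 2);
        System.out.println(afterJoin.onTuple("msg-1", 1L)); // nothing acked yet
        System.out.println(afterJoin.onTuple("msg-1", 2L)); // both acked together
    }
}
```

The trade-off shows up in the extra `pending` map: acking after the join means the join step has to remember every input tuple per message, which is the kind of bookkeeping change the ack-on-cache placement avoids.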
> Enrichment topology duplicates messages
> ---------------------------------------
>
> Key: METRON-569
> URL: https://issues.apache.org/jira/browse/METRON-569
> Project: Metron
> Issue Type: Bug
> Reporter: Domenic Puzio
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> When running the 'enrichment' topology, I get duplicate messages being
> indexed. For example, I put 100 messages into the 'enrichment' Kafka queue
> and I get 175 messages onto the 'indexing' Kafka queue. This happens when I
> am running the 'enrichment' topology with one or more enrichment bolts.
> This is an acking issue within the JoinBolt class. When a message does not
> "complete" the join (like when it is the first message in a pair of messages
> to get joined) it does not get acked. This means that this message will get
> replayed through Storm, causing message duplication further down the road and
> tons of additional overhead. Adding the correct acking resolves this problem.
> I will add the PR for this shortly.
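
The replay mechanism described in the issue can be illustrated with a small plain-Java model. Everything in it is an assumption made for illustration: `ReplaySketch` and `emittedCount` are hypothetical names, each message is assumed to split into exactly two tuples, and `maxReplays` is an artificial cap that keeps the model finite. It does not use the Storm API and does not try to reproduce the observed figure of 175, which would depend on timeouts and cache behaviour in a real topology.

```java
/** Toy model of at-least-once replay; not the Storm API and not Metron code. */
public class ReplaySketch {
    /**
     * Counts how many joined messages reach the downstream queue.
     *
     * @param messages        number of messages put on the input queue
     * @param ackFirstArrival whether the join acks the tuple that arrives first,
     *                        i.e. the one that does not complete the join
     * @param maxReplays      hypothetical cap on re-sends per message
     */
    public static int emittedCount(int messages, boolean ackFirstArrival, int maxReplays) {
        int emitted = 0;
        for (int m = 0; m < messages; m++) {
            int attempts = 0;
            boolean fullyAcked = false;
            // The source re-sends a message until all of its tuples were acked
            // or the replay cap is reached.
            while (!fullyAcked && attempts <= maxReplays) {
                attempts++;
                boolean firstAcked = ackFirstArrival; // first tuple is only cached
                boolean secondAcked = true;           // second tuple completes the join
                emitted++;                            // joined message goes downstream
                fullyAcked = firstAcked && secondAcked;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println("missing ack: " + emittedCount(100, false, 1));
        System.out.println("with ack:    " + emittedCount(100, true, 1));
    }
}
```

In this model a missing ack on the first-arriving tuple makes every attempt look failed to the source even though the join itself succeeded, so each replay emits another copy downstream.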