[ 
https://issues.apache.org/jira/browse/METRON-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671850#comment-15671850
 ] 

Nick Allen commented on METRON-569:
-----------------------------------

It makes sense to me that this problem occurs when an enrichment takes longer 
than the Storm timeout.  Bravo for teasing that out.

The join bolt has a cache containing the original messages that it is expecting 
enrichments for.  This cache is invalidated after some period of time.  If the 
cache invalidation time is less than the Storm timeout, then in the scenario 
described, the 'late' enrichment message would reach the join bolt, but the 
join bolt would have already forgotten about the original message.  The join 
bolt would then correctly ignore the 'late' enrichment message.
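To make the timing argument concrete, here is a minimal Python sketch of a join cache with time-based invalidation. It is not Metron's JoinBolt. The class `JoinCache`, its methods, and the injectable `clock` are hypothetical names chosen for illustration, and the real bolt receives the original message as a tuple on its own stream rather than through a `put_original` call. What it shows is the scenario described above: once an entry outlives `ttl_secs`, a 'late' enrichment for it finds nothing in the cache and is ignored.

```python
import time
from typing import Callable, Dict, Optional, Set


class JoinCache:
    """Hypothetical join cache: holds each original message until every expected
    enrichment arrives, or until the entry is older than ttl_secs."""

    def __init__(self, expected: Set[str], ttl_secs: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.expected = set(expected)  # enrichment names to wait for
        self.ttl_secs = ttl_secs       # cache invalidation time
        self.clock = clock
        self._entries: Dict[str, dict] = {}

    def _evict_expired(self) -> None:
        now = self.clock()
        stale = [k for k, e in self._entries.items()
                 if now - e["created"] > self.ttl_secs]
        for key in stale:
            del self._entries[key]

    def put_original(self, key: str, message: dict) -> None:
        self._evict_expired()
        self._entries[key] = {"created": self.clock(), "message": message, "parts": {}}

    def add_enrichment(self, key: str, name: str, data: dict) -> Optional[dict]:
        """Return the joined message once all enrichments are in, else None.
        None is also returned for a 'late' enrichment whose original was evicted."""
        self._evict_expired()
        entry = self._entries.get(key)
        if entry is None:
            return None  # original already forgotten: ignore the late enrichment
        entry["parts"][name] = data
        if not set(entry["parts"]).issuperset(self.expected):
            return None  # still waiting for other enrichments
        del self._entries[key]
        joined = dict(entry["message"])
        for part in entry["parts"].values():
            joined.update(part)
        return joined
```

Passing a fake clock (for example `lambda: now[0]` over a one-element list) makes eviction deterministic, which is easier to reason about than sleeping past a real timeout.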

To guard against misconfiguration (a cache invalidation time that is not 
shorter than the Storm timeout), we could set the cache invalidation time to 
be some fraction of the Storm timeout.
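One way to rule out that misconfiguration is to compute the cache invalidation time from the topology configuration instead of setting it independently. The sketch below assumes the Storm timeout in question is `topology.message.timeout.secs` (a real Storm setting, 30 seconds by default). The helper name `join_cache_ttl_secs` and the default fraction of 0.5 are arbitrary choices for illustration, not Metron configuration.

```python
# Real Storm setting: how long a tuple tree may stay un-acked before it is failed.
STORM_MESSAGE_TIMEOUT_KEY = "topology.message.timeout.secs"
STORM_DEFAULT_TIMEOUT_SECS = 30  # Storm's default for that setting


def join_cache_ttl_secs(topology_conf: dict, fraction: float = 0.5) -> float:
    """Hypothetical helper: derive the join cache invalidation time as a fraction
    of the Storm message timeout, so it is always strictly shorter than it."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    timeout = float(topology_conf.get(STORM_MESSAGE_TIMEOUT_KEY,
                                      STORM_DEFAULT_TIMEOUT_SECS))
    return timeout * fraction
```

For example, a topology configured with a 60-second message timeout would get a 30-second cache invalidation time. Rejecting fractions of 1 or more keeps the invariant (cache invalidation strictly shorter than the Storm timeout) from being silently broken by a config change.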



> Enrichment topology duplicates messages
> ---------------------------------------
>
>                 Key: METRON-569
>                 URL: https://issues.apache.org/jira/browse/METRON-569
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Domenic Puzio
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When running the 'enrichment' topology, I get duplicate messages being 
> indexed. For example, I put 100 messages into the 'enrichment' Kafka queue 
> and I get 175 messages onto the 'indexing' Kafka queue. This happens when I 
> am running the 'enrichment' topology with one or more enrichment bolts.
> This is an acking issue within the JoinBolt class. When a message does not 
> "complete" the join (like when it is the first message in a pair of messages 
> to get joined) it does not get acked. This means that this message will get 
> replayed through Storm, causing message duplication further down the road and 
> tons of additional overhead. Adding the correct acking resolves this problem.
> I will add the PR for this shortly.
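The acking problem can be illustrated with a toy model. In Storm, a tuple tree that is not fully acked within the message timeout is failed, and a reliable spout may replay the source message, which then gets joined and indexed again. The Python sketch below is a hypothetical stand-in, not Storm's API or the actual Metron patch: `FakeTuple`, `FakeCollector`, and `SketchJoinBolt` are illustrative names, and `FakeCollector` only loosely mirrors the `emit`/`ack` shape of Storm's `OutputCollector`. One possible fix, assumed here, is to anchor the joined output on every contributing tuple and then ack all of them. Setting `ack_all_parts=False` reproduces the reported behavior, where the tuple that arrived first is never acked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class FakeTuple:
    """Stand-in for a Storm tuple: message key, source stream, and field values."""
    key: str
    stream: str
    values: Tuple[Tuple[str, str], ...]


@dataclass
class FakeCollector:
    """Records emits and acks instead of sending them anywhere."""
    emitted: List[Tuple[List[FakeTuple], dict]] = field(default_factory=list)
    acked: List[FakeTuple] = field(default_factory=list)

    def emit(self, anchors: List[FakeTuple], values: dict) -> None:
        self.emitted.append((anchors, values))

    def ack(self, tup: FakeTuple) -> None:
        self.acked.append(tup)


class SketchJoinBolt:
    def __init__(self, expected_streams: Set[str], collector: FakeCollector,
                 ack_all_parts: bool = True) -> None:
        self.expected = set(expected_streams)
        self.collector = collector
        self.ack_all_parts = ack_all_parts  # False reproduces the reported bug
        self.pending: Dict[str, List[FakeTuple]] = {}

    def execute(self, tup: FakeTuple) -> None:
        parts = self.pending.setdefault(tup.key, [])
        parts.append(tup)
        if not {p.stream for p in parts}.issuperset(self.expected):
            return  # join incomplete: hold the tuple until its partners arrive
        del self.pending[tup.key]
        joined: dict = {}
        for p in parts:
            joined.update(dict(p.values))
        # Anchor the joined output on every input tuple that contributed to it.
        self.collector.emit(parts, joined)
        if self.ack_all_parts:
            for p in parts:
                self.collector.ack(p)
        else:
            self.collector.ack(tup)  # bug: earlier parts are never acked
```

Running both variants over the same paired inputs emits the same number of joined messages, but in the buggy variant the first tuple of every pair stays un-acked; in a real topology each of those would eventually time out and could trigger a replay. Acking at join completion (rather than as soon as a tuple is cached) keeps the cached tuple's tree open until its data has actually been forwarded, at the cost of still depending on the join finishing before the Storm timeout, which is the interaction discussed in the comment above.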



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
