Here are a couple of solutions. Don't drop the tuple; always ack it.

1) Use an exactly-once topology such as Trident.
2) If you want to stay with a regular Storm topology, which guarantees at-least-once (>=1) processing:
   A) Override the Kafka spout and handle fail() differently, or
   B) Deduplicate at your bolt: keep a unique message id for every message you put into Kafka, and store those ids in a Redis store. If a tuple's message id already exists (i.e. it was processed already), just ack it; otherwise process it and then ack it. This way you achieve effectively exactly-once processing with a regular Storm topology, without Trident (see the sketch below).

For the message id: you can use the existing Kafka message id, or create a unique GUID for each message before first putting it into Kafka and reuse that same id. For the Redis store: keep a small-footprint store with a TTL of 4 hours on each message id.

I prefer the 2nd option.
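A minimal sketch of the dedup bolt from option B, assuming the backtype.storm package names from this thread's Storm 0.x era (org.apache.storm on 1.x and later) and a Jedis 2.x Redis client; the "messageId" field name, the "seen:" key prefix, and the Redis endpoint are illustrative, not from the original post:

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
import redis.clients.jedis.Jedis;
import java.util.Map;

public class DedupBolt extends BaseRichBolt {
    private static final int TTL_SECONDS = 4 * 60 * 60; // 4-hour footprint, as suggested above
    private transient OutputCollector collector;
    private transient Jedis jedis;

    @Override
    @SuppressWarnings("rawtypes")
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.jedis = new Jedis("localhost", 6379); // assumed Redis endpoint
    }

    @Override
    public void execute(Tuple tuple) {
        String messageId = tuple.getStringByField("messageId"); // assumed field carrying the id
        if (jedis.exists("seen:" + messageId)) {
            collector.ack(tuple); // processed already: just ack, no reprocessing
            return;
        }
        process(tuple);                                    // your actual business logic
        jedis.setex("seen:" + messageId, TTL_SECONDS, "1"); // mark as processed with a 4h TTL
        collector.ack(tuple);
    }

    private void process(Tuple tuple) { /* ... */ }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}

Marking the id only after process() succeeds keeps the at-least-once guarantee if a worker crashes mid-tuple; if you run many parallel executors and need to close the small race between exists() and setex(), a single atomic SET with NX and EX would do both steps in one call.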
Let me know if you have any questions and I will be glad to assist.

Thanks,
Venkat

> On Sep 12, 2015, at 7:56 AM, Nathan Leung <[email protected]> wrote:
>
> Don't fail the tuple, just drop it (don't emit). Btw the user list is
> better for this type of question.
>
> On Sep 12, 2015 7:43 AM, "Sachin Pasalkar" <[email protected]>
> wrote:
>
>> Hi,
>>
>> As far as I know, Storm follows an "at least once" model, meaning it
>> makes sure every tuple gets fully processed at least once. So my
>> question is: if I receive some unexpected data, certain bolts in my
>> topology will keep failing on it. The spout will get the failure
>> notification from the acker thread and will resend the tuple. However,
>> since I know it is always going to fail, is there any way I can ask the
>> spout to stop replaying the tuple after X attempts?
>>
>> Thanks,
>> Sachin
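For the "stop after X attempts" part of the original question (and option A above), a minimal sketch of the bookkeeping a custom spout could do inside its ack()/fail() callbacks; this is plain Java with no Storm dependency, and MAX_RETRIES and all names here are illustrative:

import java.util.HashMap;
import java.util.Map;

public class BoundedRetryTracker {
    private static final int MAX_RETRIES = 3; // "X" in the question

    // fail counts per message id; in a real topology consider bounding its size
    private final Map<Object, Integer> failCounts = new HashMap<>();

    /** Call from the spout's fail(msgId); returns true if the tuple should be replayed. */
    public boolean shouldReplay(Object msgId) {
        int attempts = failCounts.merge(msgId, 1, Integer::sum);
        if (attempts >= MAX_RETRIES) {
            failCounts.remove(msgId); // give up: drop the message instead of replaying it
            return false;
        }
        return true;
    }

    /** Call from the spout's ack(msgId) so finished tuples stop being tracked. */
    public void onAck(Object msgId) {
        failCounts.remove(msgId);
    }
}

When shouldReplay() returns false, the spout simply does not re-emit the message, which is Nathan's "drop it" behaviour applied only after X genuine attempts.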
