Here are a couple of solutions. Don't drop the tuple; always ack it.

1) Use an exactly-once topology such as Trident.
2) If you want to stay with a regular Storm topology, which guarantees at-least-once (>=1) processing:
   A) Override the Kafka spout and handle fail() differently, or
   B) Deduplicate at your bolt: keep a unique message id for every message you put into Kafka, and store those ids in a Redis store. If a tuple's message id already exists (i.e. it was processed already), just ack it; otherwise process it and then ack it. This way you achieve effectively exactly-once processing with a regular Storm topology, without Trident (see the sketch below).

For the message id: you can use the existing Kafka message id, or create a unique GUID for each message before first putting it into Kafka and reuse that same id. For the Redis store: keep a small-footprint store with a TTL of 4 hours on each message id.

I prefer the 2nd option.
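A minimal sketch of the dedup bolt from option B, assuming the backtype.storm package names from this thread's Storm 0.x era (org.apache.storm on 1.x and later) and a Jedis 2.x Redis client; the "messageId" field name, the "seen:" key prefix, and the Redis endpoint are illustrative, not from the original post:

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
import redis.clients.jedis.Jedis;
import java.util.Map;

public class DedupBolt extends BaseRichBolt {
    private static final int TTL_SECONDS = 4 * 60 * 60; // 4-hour footprint, as suggested above
    private transient OutputCollector collector;
    private transient Jedis jedis;

    @Override
    @SuppressWarnings("rawtypes")
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.jedis = new Jedis("localhost", 6379); // assumed Redis endpoint
    }

    @Override
    public void execute(Tuple tuple) {
        String messageId = tuple.getStringByField("messageId"); // assumed field carrying the id
        if (jedis.exists("seen:" + messageId)) {
            collector.ack(tuple); // processed already: just ack, no reprocessing
            return;
        }
        process(tuple);                                    // your actual business logic
        jedis.setex("seen:" + messageId, TTL_SECONDS, "1"); // mark as processed with a 4h TTL
        collector.ack(tuple);
    }

    private void process(Tuple tuple) { /* ... */ }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}

Marking the id only after process() succeeds keeps the at-least-once guarantee if a worker crashes mid-tuple; if you run many parallel executors and need to close the small race between exists() and setex(), a single atomic SET with NX and EX would do both steps in one call.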
Let me know if you have any questions and I will be glad to assist.

Thanks,
Venkat

> On Sep 12, 2015, at 7:56 AM, Nathan Leung <[email protected]> wrote:
>
> Don't fail the tuple, just drop it (don't emit). Btw the user list is
> better for this type of question.
>
> On Sep 12, 2015 7:43 AM, "Sachin Pasalkar" <[email protected]>
> wrote:
>
>> Hi,
>>
>> As far as I know, Storm follows an "at least once" model, meaning it
>> makes sure every tuple gets fully processed at least once. So my
>> question is: if I receive some unexpected data, certain bolts in my
>> topology will keep failing on it. The spout will get the failure
>> notification from the acker thread and will resend the tuple. However,
>> since I know it is always going to fail, is there any way I can ask the
>> spout to stop replaying the tuple after X attempts?
>>
>> Thanks,
>> Sachin
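For the "stop after X attempts" part of the original question (and option A above), a minimal sketch of the bookkeeping a custom spout could do inside its ack()/fail() callbacks; this is plain Java with no Storm dependency, and MAX_RETRIES and all names here are illustrative:

import java.util.HashMap;
import java.util.Map;

public class BoundedRetryTracker {
    private static final int MAX_RETRIES = 3; // "X" in the question

    // fail counts per message id; in a real topology consider bounding its size
    private final Map<Object, Integer> failCounts = new HashMap<>();

    /** Call from the spout's fail(msgId); returns true if the tuple should be replayed. */
    public boolean shouldReplay(Object msgId) {
        int attempts = failCounts.merge(msgId, 1, Integer::sum);
        if (attempts >= MAX_RETRIES) {
            failCounts.remove(msgId); // give up: drop the message instead of replaying it
            return false;
        }
        return true;
    }

    /** Call from the spout's ack(msgId) so finished tuples stop being tracked. */
    public void onAck(Object msgId) {
        failCounts.remove(msgId);
    }
}

When shouldReplay() returns false, the spout simply does not re-emit the message, which is Nathan's "drop it" behaviour applied only after X genuine attempts.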
