Hi,

I am using a storm-kafka-client jar that I built from source, since I need support for Kafka 0.10.1.1. I am testing how my system behaves if one of the workers is killed while my topology is running.

Configuration used: a topology with the storm-kafka-client spout and one bolt that processes the messages. The topology runs on two workers (say W1 and W2); the spout runs on W1 and the bolt on W2. The tuple timeout is 30 seconds. For testing purposes, storm-kafka-client's maxUncommittedOffsets is set to 40 and Storm's maxSpoutPending is set to 20.
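Roughly, the setup looks like this (a minimal sketch; the broker address, topic name, and class name are placeholders, and the exact builder API may differ slightly depending on which storm-kafka-client snapshot you build):

import org.apache.storm.Config;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;

public class SpoutConfigSketch {
    public static void main(String[] args) {
        // Spout-side limit: at most 40 polled offsets may be pending commit at once.
        KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("kafka-broker:9092", "my-topic") // placeholders
                        .setMaxUncommittedOffsets(40)
                        .build();
        KafkaSpout<String, String> spout = new KafkaSpout<>(spoutConfig);

        // Topology-side limits.
        Config conf = new Config();
        conf.setNumWorkers(2);           // spout ends up on W1, bolt on W2
        conf.setMessageTimeoutSecs(30);  // tuple timeout of 30 seconds
        conf.setMaxSpoutPending(20);     // half of maxUncommittedOffsets
    }
}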
As the topology starts, the spout reads messages from Kafka and forwards them to the bolt. While the bolt is processing tuples, I kill W2 on purpose to simulate a crash. While W2 is restarting, the spout keeps reading messages from Kafka and emitting them, so the tuples that were in flight on W2 are lost (e.g. tuples 2-5). Once W2 is back up, the following tuples (6 onwards) reach the bolt and are acknowledged. Before the lost tuples are failed back to the spout, the spout reaches maxUncommittedOffsets and keeps logging that it has hit the maximum uncommitted limit and will not read any new messages from Kafka. From that point on, even when Storm fails the lost tuples, the spout never replays them, and it also never reads new messages from Kafka. This is a big issue for us because the topology stays stuck until I restart it.

Another issue: although I set maxSpoutPending to half the value of maxUncommittedOffsets, storm-kafka-client polled a number of records equal to maxUncommittedOffsets instead of maxSpoutPending.

Can anyone please help me with this?

PS: While debugging we found that once the number of uncommitted offsets becomes greater than or equal to maxUncommittedOffsets, the spout stops polling Kafka, and it is only in the poll path that it asks the retry service, via doSeekRetriableTopicPartitions, whether there are any messages to replay; see the sketch below.
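Roughly, the gating we observed looks like this (a simplified paraphrase of what we saw while debugging, not the actual storm-kafka-client source; names are illustrative):

public class PollGateSketch {
    private final int maxUncommittedOffsets = 40;
    private int numUncommittedOffsets; // grows as tuples are emitted but not yet committed
    private boolean waitingToEmit;     // records already fetched but not yet emitted

    // If this returns false, the spout neither polls Kafka for new records nor
    // reaches the code path that seeks back to the offsets of failed tuples
    // (doSeekRetriableTopicPartitions), so the failed tuples are never replayed.
    boolean shouldPollKafka() {
        return !waitingToEmit && numUncommittedOffsets < maxUncommittedOffsets;
    }
}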
Regards,
Punit Tiwan