I want to gather some thoughts on a suggestion to provide dead-letter
functionality common to all spouts/bolts.
Currently, if any spout / bolt reports a failure, it is retried by the
For a single bolt failure in a large DAG, this retry logic can cause
several perfectly successful components to replay, and yet the Tuple could
fail at exactly the same bolt on retry.
This is usually fine (if the failure was temporary, say due to a network
glitch), but sometimes the message is bad enough that it should not be
retried but at the same time important enough that its failure should not
Example: An ElasticSearch-bolt receiving bytes from a Kafka-Spout.
Most of the time, it is able to deserialize the bytes correctly, but
sometimes a badly formatted message fails to deserialize. In such cases,
the Kafka-Spout should not retry, nor should the ES-bolt report a success. It
should, however, be reported to the user somehow that a badly serialized
message entered the stream.
For cases like a temporary network glitch, the Tuple should be retried.
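To make the distinction concrete, here is a minimal sketch of how a bolt might sort failures into "retry" and "dead-letter". It does not use Storm's API; FailureClassifier, Outcome and Indexer are hypothetical names invented for illustration, and strict UTF-8 decoding stands in for whatever deserialization the real bolt does.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Storm: decides what should happen to a tuple.
class FailureClassifier {
    enum Outcome { ACK, RETRY, DEAD_LETTER }

    // Stand-in for the downstream write (e.g. an ElasticSearch index call).
    interface Indexer {
        void index(String doc) throws IOException;
    }

    // Strict UTF-8 decoding: malformed bytes raise an exception instead of
    // being silently replaced.
    static String decode(byte[] payload) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(payload))
                .toString();
    }

    static Outcome process(byte[] payload, Indexer indexer) {
        String doc;
        try {
            doc = decode(payload);
        } catch (CharacterCodingException e) {
            // The same bytes will fail the same way on every replay.
            return Outcome.DEAD_LETTER;
        }
        try {
            indexer.index(doc);
            return Outcome.ACK;
        } catch (IOException e) {
            // Assumed to be transient (e.g. a network glitch), so replay is useful.
            return Outcome.RETRY;
        }
    }
}
```

The two try blocks are kept separate on purpose: CharacterCodingException is a subclass of IOException, so a single combined catch could not tell the permanent failure from the temporary one.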
So the proposal is to implement a dead-letter topic as follows:
1) Add a new method *failWithoutRetry(Tuple, Exception)* in the collector.
Bolts will begin using it once it's available for use.
2) Provide the ability to *configure a dead-letter data-store in the spout* for
failed messages reported by #1 above.
The configurable data-store should support Kafka, Solr and Redis to
begin with (plus the option to implement one's own and drop a jar file
in the classpath).
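A rough sketch of what the two pieces could look like together. None of this exists in Storm today: DeadLetterStore, InMemoryStore and SketchCollector are hypothetical names, tuples are plain Objects rather than Storm Tuples, and for brevity the store is handed to the collector directly instead of being configured on the spout.

```java
import java.util.ArrayList;
import java.util.List;

class DeadLetterSketch {
    // Hypothetical pluggable store; Kafka, Solr or Redis back ends
    // (or a user-supplied jar) would implement this interface.
    interface DeadLetterStore {
        void store(Object tuple, Exception cause);
    }

    // In-memory stand-in used only for this sketch.
    static class InMemoryStore implements DeadLetterStore {
        final List<String> entries = new ArrayList<>();

        @Override
        public void store(Object tuple, Exception cause) {
            entries.add(tuple + " | " + cause.getMessage());
        }
    }

    // Hypothetical collector; this is NOT Storm's OutputCollector.
    static class SketchCollector {
        private final DeadLetterStore deadLetters;
        final List<Object> replayQueue = new ArrayList<>();

        SketchCollector(DeadLetterStore deadLetters) {
            this.deadLetters = deadLetters;
        }

        // Existing behaviour: the tuple is scheduled for replay.
        void fail(Object tuple) {
            replayQueue.add(tuple);
        }

        // Proposed behaviour: keep the whole tuple and its cause, never replay.
        void failWithoutRetry(Object tuple, Exception cause) {
            deadLetters.store(tuple, cause);
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        SketchCollector collector = new SketchCollector(store);
        collector.fail("tuple-1");
        collector.failWithoutRetry("tuple-2", new IllegalArgumentException("bad bytes"));
        System.out.println("to replay: " + collector.replayQueue);
        System.out.println("dead letters: " + store.entries);
    }
}
```

Keeping the store behind a one-method interface is what would let users drop in their own implementation, as suggested in #2.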
Such a feature should benefit all the spouts because:
1) Topologies will not block while replaying the same doomed-to-fail tuples.
2) Users can set alerts on dead-letters and easily find actual problems
in their topologies, rather than analyzing all failed tuples only to find that
they failed because of a temporary network glitch.
3) Since the entire Tuple is put into the dead-letter, all the data is
available for retrying after fixing the topology code.
Please share your thoughts if you think it can benefit storm in a generic