Github user roshannaik commented on the issue:
https://github.com/apache/storm/pull/1693
@arunmahadevan
It's OK to keep retrying errors that are considered retry-worthy. The
non-retry-worthy ones are those we must fail fast and move out of the way, as
they will cause a jam and prevent good tuples from flowing. There is also no
good way to recover from them.
One approach to do this...
For **non-retry-worthy** error conditions (like bad data), any processing
element in the pipeline can throw a specific exception. The runtime can then
handle it by sending the tuple to a configurable dead letter queue. Ideally
this needs a new kind of Fail() notification in the spout to avoid re-emit.
Alternatively, we can send the spout an ACK instead of a FAIL to avoid retry.
The DeadLetterQ bolt's metrics will capture these failures. Best not to have
each spout/bolt deal with the dead letter queue explicitly ... that will
complicate the topology definition, as every spout/bolt would need to be
configured and wired up to do this.
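A rough sketch of the idea, to make it concrete. Everything here is hypothetical and not existing Storm API: `NonRetriableException`, `Outcome`, and `DeadLetterRouter` are made-up names, and the in-memory list stands in for a real dead letter queue. It assumes the "ACK instead of FAIL" variant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical marker exception a processing element throws for bad data.
class NonRetriableException extends RuntimeException {
    NonRetriableException(String message) { super(message); }
}

// Hypothetical result that the runtime would report back to the spout.
enum Outcome { ACK, FAIL }

// Hypothetical stand-in for the runtime wrapper around a processing element.
class DeadLetterRouter<T> {
    // In-memory placeholder for a configurable dead letter queue.
    private final List<T> deadLetters = new ArrayList<>();

    Outcome process(T tuple, Consumer<T> element) {
        try {
            element.accept(tuple);
            return Outcome.ACK;
        } catch (NonRetriableException e) {
            // Bad tuple: park it and ACK so the spout does not re-emit it.
            deadLetters.add(tuple);
            return Outcome.ACK;
        } catch (RuntimeException e) {
            // Anything else is treated as retry-worthy: the normal FAIL path.
            return Outcome.FAIL;
        }
    }

    List<T> deadLetters() { return deadLetters; }
}
```

With this shape the individual spouts and bolts only throw the exception; none of them needs to know where the dead letter queue lives.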
For **retry-worthy errors** (timeouts, destination unavailable, etc)...
perhaps no special treatment is required as the existing retry mechanism will
kick in.
**However**, in today's core API there is one pain point for spout writers.
Each spout needs to implement the logic to track in-flight tuples and attempt
retry on fail(). The implementation is moderately complicated, as ACKs/Fails
can come in any order. All the spouts have to do the same thing but end up
doing it slightly differently. Some have retry limits, some don't. This retry
logic should ideally be lifted out of the spout and handled in the API. This
new API is a good opportunity to fix this issue.
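As an illustration of what could be lifted out, here is a minimal sketch of a shared tracker. `RetryTracker` and its methods are hypothetical names, not part of Storm, and it assumes each tuple has a unique message id and a single retry limit applies to all tuples.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper that the API could own instead of every spout.
class RetryTracker<K, V> {
    // Payload plus the number of failures seen so far for one message id.
    private static final class Inflight<P> {
        final P payload;
        int failures;
        Inflight(P payload) { this.payload = payload; }
    }

    private final Map<K, Inflight<V>> inflight = new HashMap<>();
    private final int maxRetries;

    RetryTracker(int maxRetries) { this.maxRetries = maxRetries; }

    // Record a tuple when the spout first emits it.
    void emitted(K id, V payload) {
        inflight.putIfAbsent(id, new Inflight<>(payload));
    }

    // ACKs may arrive in any order; lookup is by id, so order does not matter.
    void ack(K id) {
        inflight.remove(id);
    }

    // Returns the payload to re-emit, or empty once the retry limit is
    // exhausted (or the id is unknown), in which case the tuple is dropped.
    Optional<V> fail(K id) {
        Inflight<V> entry = inflight.get(id);
        if (entry == null) {
            return Optional.empty();
        }
        if (entry.failures >= maxRetries) {
            inflight.remove(id);
            return Optional.empty();
        }
        entry.failures++;
        return Optional.of(entry.payload);
    }

    int inflightCount() { return inflight.size(); }
}
```

A spout would then call `emitted` from its emit path and delegate `ack()`/`fail()` to the tracker, re-emitting whatever `fail` returns.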