[ https://issues.apache.org/jira/browse/STORM-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155843#comment-14155843 ]
ASF GitHub Bot commented on STORM-495:
--------------------------------------
Github user ptgoetz commented on a diff in the pull request:
https://github.com/apache/storm/pull/254#discussion_r18316629
--- Diff: external/storm-kafka/src/jvm/storm/kafka/SpoutConfig.java ---
@@ -26,8 +26,19 @@
public Integer zkPort = null;
public String zkRoot = null;
public String id = null;
+
+ // setting for how often to save the current kafka offset to ZooKeeper
public long stateUpdateIntervalMs = 2000;
+ // Exponential back-off retry settings. These are used when retrying messages after a bolt
+ // calls OutputCollector.fail().
+ //
+ // Note: be sure to set backtype.storm.Config.MESSAGE_TIMEOUT_SECS appropriately to prevent
+ // resubmitting the message while still retrying.
--- End diff --
Can we do some sanity checking in the `KafkaSpout.open()` method? At that
point you should be able to compare the `SpoutConfig` with the
`backtype.storm.Config` to see whether `MESSAGE_TIMEOUT_SECS` is set appropriately.
If it is not, we should throw an `IllegalStateException` (with a good
message explaining the problem), or at the very least log loud errors.
My fear (and, I suspect, @d2r's) is that a misconfiguration could be very
difficult for a user to debug.
I would also turn these comments into Javadoc comments.
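The sanity check suggested above could look roughly like the following standalone sketch, which also documents the retry settings with Javadoc. The diff is cut off before the new fields appear, so the class `RetryConfigCheck`, the field names `retryInitialDelayMs`, `retryDelayMultiplier` and `retryDelayMaxMs`, and their defaults are hypothetical. The validation rule (the largest single retry delay must stay below the topology message timeout) is likewise an assumption; the right rule depends on how the spout tracks pending offsets. The timeout is read from the config map under `topology.message.timeout.secs`, the key behind `backtype.storm.Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS`, whose Storm default is 30 seconds.

```java
import java.util.Map;

/**
 * Hypothetical stand-in for the retry settings added to SpoutConfig.
 * Field names and defaults are illustrative assumptions, not taken from the PR.
 */
class RetryConfigCheck {

    /** Delay in milliseconds before the first retry of a failed message. */
    public long retryInitialDelayMs = 0;

    /** Factor by which the delay grows after each further failure; must be at least 1.0. */
    public double retryDelayMultiplier = 1.0;

    /** Upper bound in milliseconds on any single retry delay. */
    public long retryDelayMaxMs = 10_000;

    /** The key behind backtype.storm.Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS. */
    static final String MESSAGE_TIMEOUT_KEY = "topology.message.timeout.secs";

    /** Storm's default message timeout in seconds, used when the key is absent. */
    static final long DEFAULT_TIMEOUT_SECS = 30;

    /**
     * Meant to be called from KafkaSpout.open() with the topology config map.
     * Fails fast with a descriptive IllegalStateException instead of letting a
     * misconfiguration surface later as hard-to-explain replays.
     */
    void validate(Map<String, Object> stormConf) {
        if (retryInitialDelayMs < 0 || retryDelayMaxMs < 0) {
            throw new IllegalStateException("Retry delays must be non-negative, got retryInitialDelayMs="
                    + retryInitialDelayMs + ", retryDelayMaxMs=" + retryDelayMaxMs);
        }
        if (retryDelayMultiplier < 1.0) {
            throw new IllegalStateException(
                    "retryDelayMultiplier must be >= 1.0, got " + retryDelayMultiplier);
        }
        // Storm may deliver this value as an Integer or a Long, so go through Number.
        Object raw = stormConf.get(MESSAGE_TIMEOUT_KEY);
        long timeoutSecs = (raw == null) ? DEFAULT_TIMEOUT_SECS : ((Number) raw).longValue();
        long timeoutMs = timeoutSecs * 1000L;
        if (retryDelayMaxMs >= timeoutMs) {
            throw new IllegalStateException(String.format(
                    "retryDelayMaxMs (%d ms) is not below %s (%d s). A message could still be "
                    + "waiting for its retry when Storm times it out and replays it. "
                    + "Lower retryDelayMaxMs or raise %s.",
                    retryDelayMaxMs, MESSAGE_TIMEOUT_KEY, timeoutSecs, MESSAGE_TIMEOUT_KEY));
        }
    }
}
```

In `KafkaSpout.open(Map conf, TopologyContext context, SpoutOutputCollector collector)` the spout would run the equivalent of `validate(conf)` first, so a bad combination fails the topology at startup with an explanatory message rather than misbehaving at runtime.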
> Add delayed retries to KafkaSpout
> ---------------------------------
>
> Key: STORM-495
> URL: https://issues.apache.org/jira/browse/STORM-495
> Project: Apache Storm
> Issue Type: Improvement
> Affects Versions: 0.9.3
> Environment: all environments
> Reporter: Rick Kilgore
> Priority: Minor
> Labels: kafka, retry
>
> If a tuple in the topology originates from the KafkaSpout in the
> external/storm-kafka sources, and a bolt in the topology indicates a
> failure by calling fail() on its OutputCollector, the KafkaSpout will
> immediately retry the message.
> We wish to use this failure and retry behavior in our ingestion system
> whenever we experience a recoverable error from a downstream system, such as
> a 500 or 503 error from a service we depend on. But with the current
> KafkaSpout behavior, doing so results in a tight loop where we retry several
> times over a few seconds and then give up. I want to be able to delay retries
> to give the downstream service time to recover. Ideally, I would like
> configurable exponential-backoff retries.
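The delayed, exponential-backoff retry requested above could be sketched as follows. This is a minimal illustration under assumptions, not the implementation from the pull request: the class `BackoffRetryQueue`, its parameters, and the idea that the spout's `fail()` calls `failed()` while its `nextTuple()` polls `nextDue()` are all hypothetical. The delay before the n-th retry of an offset is `initialDelayMs * multiplier^(n-1)`, capped at `maxDelayMs`, and failed offsets wait in a priority queue ordered by due time, so a failing message no longer re-enters the topology in a tight loop.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/**
 * Hypothetical holding area for failed Kafka offsets that should be retried
 * only after an exponentially growing delay.
 */
class BackoffRetryQueue {

    /** A failed offset together with the earliest time it may be re-emitted. */
    private static final class Entry {
        final long offset;
        final long dueAtMs;

        Entry(long offset, long dueAtMs) {
            this.offset = offset;
            this.dueAtMs = dueAtMs;
        }
    }

    private final long initialDelayMs;
    private final double multiplier;
    private final long maxDelayMs;

    /** Failed offsets, earliest due time first. */
    private final PriorityQueue<Entry> queue =
            new PriorityQueue<Entry>((a, b) -> Long.compare(a.dueAtMs, b.dueAtMs));

    /** How many times each offset has failed so far. */
    private final Map<Long, Integer> failCounts = new HashMap<>();

    BackoffRetryQueue(long initialDelayMs, double multiplier, long maxDelayMs) {
        this.initialDelayMs = initialDelayMs;
        this.multiplier = multiplier;
        this.maxDelayMs = maxDelayMs;
    }

    /** Delay before retry number {@code attempt} (1-based), capped at maxDelayMs. */
    long delayFor(int attempt) {
        double delay = initialDelayMs * Math.pow(multiplier, attempt - 1);
        return (long) Math.min(delay, (double) maxDelayMs);
    }

    /** Records a failure (e.g. from the spout's fail()) and schedules the retry. */
    void failed(long offset, long nowMs) {
        int attempt = failCounts.merge(offset, 1, Integer::sum);
        queue.add(new Entry(offset, nowMs + delayFor(attempt)));
    }

    /** Returns an offset whose delay has elapsed, or null if none is due yet. */
    Long nextDue(long nowMs) {
        Entry head = queue.peek();
        if (head == null || head.dueAtMs > nowMs) {
            return null;
        }
        queue.poll();
        return head.offset;
    }

    /** Forgets an offset's failure history once it has been acked. */
    void acked(long offset) {
        failCounts.remove(offset);
    }
}
```

Passing the current time in explicitly, rather than calling `System.currentTimeMillis()` inside the class, keeps the scheduling logic easy to test; a spout would simply pass `System.currentTimeMillis()` from `fail()` and `nextTuple()`.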
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)