[jira] [Updated] (STORM-154) Provide more information to spout "fail" method

Rick Kellogg (JIRA) Thu, 08 Oct 2015 17:14:41 -0700

     [ 
https://issues.apache.org/jira/browse/STORM-154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Rick Kellogg updated STORM-154:
-------------------------------
    Component/s: storm-core

> Provide more information to spout "fail" method
> -----------------------------------------------
>
>                 Key: STORM-154
>                 URL: https://issues.apache.org/jira/browse/STORM-154
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-core
>            Reporter: James Xu
>
> https://github.com/nathanmarz/storm/issues/39
> It might be helpful to distinguish between unexpected errors (when they can 
> be caught) and timeouts.
> ----------
> conflagrator: +1 on this. I wrote a class extending OutputCollector with the 
> following wrapper functions:
> public class VerboseOutputCollector extends OutputCollector {
>     public void fail(Tuple tuple) {}
>     public void fail(Tuple tuple, String message) {}
>     public void fail(Tuple tuple, Exception e) {}
>     public void fail(Tuple tuple, Exception e, String message) {}
> }
> Each function generates output containing the class and line number of the
> "fail" call, plus the message or Exception if provided. It's very handy for
> log analysis.
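
The wrapper above is only listed as signatures. A minimal sketch of the caller-location part, assuming plain Java stack inspection and no Storm classes, could look like this. FailLogger and describeFail are hypothetical names for illustration, not part of Storm or of conflagrator's code.

```java
final class FailLogger {
    /**
     * Builds a log line naming the class and line number of the code that
     * called this method, plus an optional message and exception.
     */
    static String describeFail(String message, Exception e) {
        // Frame 0 is describeFail itself (where the Throwable is created);
        // frame 1 is its direct caller.
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        StringBuilder sb = new StringBuilder("fail at ")
                .append(caller.getClassName())
                .append(':')
                .append(caller.getLineNumber());
        if (message != null) {
            sb.append(" message=").append(message);
        }
        if (e != null) {
            sb.append(" exception=").append(e);
        }
        return sb.toString();
    }
}
```

If this were called from inside a collector subclass's fail overloads, frame 1 would be the collector itself, so the index would need to skip that extra frame to reach the bolt code.
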
> ----------
> dmoore247: +1
> With 0.8.1 on a local cluster I've spent many hours tracking down failures, 
> going through executor.clj code, turning on full logging, adding TaskHooks, 
> playing with timeout parameters, adding exception handling, etc.
> As an aside, the SpoutFail....latencyMs value was always null in my tests
> on the LocalCluster.
> Still, all I know is that the message failed, but not why (Timeout?). 
> Based on playing with the timeout parameters, I deduce that the failures were 
> caused by timeouts.
> Where in Storm does it decide that a timeout has been exceeded and the Tuple
> should be failed? Knowing that, I could at least add a debug message to Storm.
> Many thanks.
> ----------
> ruleb: +1
> Had the same situation: I searched a whole day before concluding that a Trident
> topology regularly dropped complete batches of tuples because they timed out
> while queued up at a busy bolt.
> Having a small "tuple timeout reached" message in the logs at info level would
> save many developer days.
> Many thanks.
> ----------
> thecoop: This would be very helpful for determining why tuples are failing,
> rather than just seeing an arbitrary number in the UI. Even a log entry at
> info or warn level saying that a tuple failed, with some information on why
> it failed, would do.
> ----------
> brianantonelli: +1
> Would be great to get more information about what caused the spout to fail.
> I'm also seeing that the latency is always null.
> ----------
> revans2: It is fairly simple to extend the spout to indicate if a tuple failed
> because of a timeout or if it failed because of something else, but it is 
> much harder to determine what that something else was. The fail API on all 
> output collectors does not have anything that could be used to map it to a 
> reason. We would have to extend the API and decide what the failure reason 
> should look like. Perhaps a free form string, but that is really horrible if 
> you want to aggregate the failures in metrics. Also, we would want to limit
> the size of the string so as not to overwhelm the acker bolts.
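
To illustrate the trade-off revans2 describes, here is a rough sketch of a typed failure reason that aggregates cleanly in metrics, with any free-form detail capped in length. None of this is Storm API: FailureReasons, FailReason, classify, boundedDetail and the 64-character cap are assumptions made up for this example. The elapsed-time check is only a spout-side heuristic for guessing that a failure was a timeout.

```java
import java.util.EnumMap;
import java.util.Map;

final class FailureReasons {
    /** Hypothetical reason categories; a fixed set is easy to count per reason. */
    enum FailReason { TIMEOUT, EXPLICIT_FAIL }

    /** Assumed cap so a detail string stays small when passed between components. */
    static final int MAX_DETAIL_CHARS = 64;

    /**
     * Heuristic: if at least the configured message timeout elapsed between
     * emit and fail, treat the failure as a timeout.
     */
    static FailReason classify(long elapsedMs, long timeoutMs) {
        return elapsedMs >= timeoutMs ? FailReason.TIMEOUT : FailReason.EXPLICIT_FAIL;
    }

    /** Truncates free-form detail to the assumed cap; null becomes empty. */
    static String boundedDetail(String detail) {
        if (detail == null) {
            return "";
        }
        return detail.length() > MAX_DETAIL_CHARS
                ? detail.substring(0, MAX_DETAIL_CHARS)
                : detail;
    }

    private final Map<FailReason, Long> counts = new EnumMap<>(FailReason.class);

    /** Increments the per-reason failure counter. */
    void record(FailReason reason) {
        counts.merge(reason, 1L, Long::sum);
    }

    long count(FailReason reason) {
        return counts.getOrDefault(reason, 0L);
    }
}
```

A spout could remember the emit time per message ID and, in its fail callback, call classify with the elapsed time and the topology's message timeout before recording the result.
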



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
