[ https://issues.apache.org/jira/browse/STORM-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rick Kellogg updated STORM-135:
-------------------------------
Component/s: storm-core
> Tracer bullets
> --------------
>
> Key: STORM-135
> URL: https://issues.apache.org/jira/browse/STORM-135
> Project: Apache Storm
> Issue Type: New Feature
> Components: storm-core
> Reporter: James Xu
>
> https://github.com/nathanmarz/storm/issues/146
> Debugging the flow of tuples through a Storm topology can be pretty tedious.
> One might have to do lots of logging and watch many log files, or do other
> kinds of instrumentation. It would be great to include a system to select
> certain tuples for tracing, and track the progress of those tuples through
> the topology.
> Here is a use case:
> Suppose one were to do stats aggregation using Storm. Some things I might
> want to ensure are:
> Is the aggregation and flush happening in a timely way?
> Are there hotspots?
> Are there unexpected latencies? Are some bolts taking a long time?
> To answer the above questions, I might select a random sample of tuples, or
> maybe a random sample of a specific subset of tuples. The tuples to be traced
> could be tagged with a special attribute.
> I would want to track the following events:
> Spout emit - send (task id, spout name, timestamp)
> For each bolt:
> When a traced tuple arrives and execute() is called: (task id, bolt name,
> timestamp)
> When a tuple is emitted that is anchored on the tuple that arrived: (task id,
> bolt name, timestamp)
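>
> As a minimal sketch, the kind of per-event record those fields suggest might
> look like the following (the class and field names are illustrative, not an
> existing Storm API):
>
>     // Hypothetical value class for one trace event; not part of Storm itself.
>     public class TraceEvent implements java.io.Serializable {
>         public enum Kind { SPOUT_EMIT, BOLT_EXECUTE, BOLT_EMIT }
>
>         public final Kind kind;
>         public final int taskId;          // task that observed the event
>         public final String componentId;  // spout or bolt name
>         public final long timestampMs;    // wall-clock time of the event
>
>         public TraceEvent(Kind kind, int taskId, String componentId, long timestampMs) {
>             this.kind = kind;
>             this.taskId = taskId;
>             this.componentId = componentId;
>             this.timestampMs = timestampMs;
>         }
>
>         @Override
>         public String toString() {
>             return kind + "(" + componentId + ", task=" + taskId + ", t=" + timestampMs + ")";
>         }
>     }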
> Here is what I can do with the data from above (assuming one can correlate
> tuples emitted with incoming tuples, based on the anchor):
> For the aggregation bolt, I can look at the distribution of (emit timestamp -
> incoming timestamp) and see if it makes sense.
> I can graph the life of one tuple, look at spout/bolt vs timestamp graph, and
> visually see how much time is being spent in each bolt, as well as how much
> time is spent in the Storm infrastructure / ZMQ.
> This data can be overlaid for multiple tuples to get a sense of the timing
> distribution for the topology.
> Using the task ID information, one can do a cool overlay graph that traces
> the distribution of a number of tuples over a topology. One can use that to
> see whether field groupings are working, whether tuples are unevenly
> distributed, etc.
> For now I may start implementing this idea in the scala-storm DSL.
> ----------
> tdunning: I actually think that, if possible, unanchored tuples should also
> be traced.
> One simple implementation would be to add some information to each tuple to
> indicate the tracing status of the tuple.
> As each tuple arrives, the tracing status would be inspected. If set, a
> tracing wrapper for the collector would be used in place of the actual
> collector for that tuple. This would make tracing of all resulting tuples
> possible, not just the anchored tuples.
> It would also be very useful to have a traceMessage() method on the collector
> that could be used by the bolt to record a trace message if tracing is on.
> It would also make sense to have a method that turns tracing on or off for a
> collector. This might need to return a new tracing collector in order to
> allow collectors to be immutable.
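>
> A rough sketch of that wrapper idea, by composition around Storm's
> OutputCollector (the TracingCollector class and the traceMessage() method are
> hypothetical, not existing API; a fuller version would cover the remaining
> emit/ack/fail overloads):
>
>     import java.util.List;
>
>     import backtype.storm.task.OutputCollector;
>     import backtype.storm.tuple.Tuple;
>
>     // Hypothetical wrapper used in place of the real collector while a traced
>     // tuple is being processed, so every resulting emit gets recorded.
>     public class TracingCollector {
>         private final OutputCollector delegate;
>         private final String componentId;
>
>         public TracingCollector(OutputCollector delegate, String componentId) {
>             this.delegate = delegate;
>             this.componentId = componentId;
>         }
>
>         public List<Integer> emit(Tuple anchor, List<Object> values) {
>             traceMessage("emit anchored on " + anchor + ": " + values);
>             return delegate.emit(anchor, values);
>         }
>
>         public void ack(Tuple input) {
>             traceMessage("ack " + input);
>             delegate.ack(input);
>         }
>
>         // The traceMessage() idea from the comment above: record a message only
>         // while tracing is active for the current tuple.
>         public void traceMessage(String message) {
>             System.out.println("TRACE [" + componentId + "] " + message);
>         }
>     }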
> The tracing information that I see as useful includes:
> a) possibly a trace level similar to the logging level used by log4j and
> other logging packages
> b) a trace id so that multiple traces can be simultaneously active. This
> could be generated when tracing is turned on. It would be nice to have a
> way to supply an external id that could be correlated with outside entities
> like a user-id.
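>
> A small sketch of what such per-tuple trace metadata might look like (names
> invented for illustration):
>
>     // Hypothetical metadata attached to a traced tuple: a level plus an id that
>     // can also carry an externally supplied value (e.g. a user-id) for correlation.
>     public class TraceTag implements java.io.Serializable {
>         public enum Level { OFF, INFO, DEBUG, TRACE }
>
>         public final Level level;
>         public final String traceId;  // generated when tracing is turned on, or supplied externally
>
>         public TraceTag(Level level, String traceId) {
>             this.level = level;
>             this.traceId = traceId;
>         }
>
>         public static TraceTag newTrace(Level level) {
>             return new TraceTag(level, java.util.UUID.randomUUID().toString());
>         }
>     }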
> ----------
> velvia: +1 for adding tracing level to the tuple metadata.
> Nathan or others:
> I think this ticket should be split up into a couple parts:
> 1) A generic callback or hook mechanism for when tuples are emitted and when
> they arrive via execute() in bolts.
> 2) A specific callback for filtering and implementing tracer bullets
> 3) Additional metadata in the Tuple class to track tracing, and changes to
> allow it to be serialized
> Should this be split up into multiple issues?
> Also pointers to where in the code the three could be implemented would be
> awesome.
> Thanks!
> Evan
> ----------
> tdunning: With JIRA, sub-tasks would be a great idea. With Github's very
> basic issue tracker, probably not so much.
> ----------
> nathanmarz: FYI, I've added hooks into Storm for 0.7.1
> ----------
> tdunning: Can you provide a pointer or three to where the hooks are?
> ----------
> nathanmarz: I explained it here: #153 (comment)
> I'll have a wiki page about hooks once the feature is released.
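>
> For reference, a rough sketch of what a tracing hook built on that mechanism
> might look like. Class and field names follow the hook API as it later
> stabilized (backtype.storm.hooks), so details may not match 0.7.1 exactly:
>
>     import java.util.Map;
>
>     import backtype.storm.hooks.BaseTaskHook;
>     import backtype.storm.hooks.info.BoltExecuteInfo;
>     import backtype.storm.hooks.info.EmitInfo;
>     import backtype.storm.task.TopologyContext;
>
>     // Sketch: log every emit and every bolt execute seen by the task this hook
>     // is attached to. Deciding *which* tuples to trace would still need the
>     // tuple tagging discussed earlier in this thread.
>     public class TracerBulletHook extends BaseTaskHook {
>         private String componentId;
>         private int taskId;
>
>         @Override
>         public void prepare(Map conf, TopologyContext context) {
>             this.componentId = context.getThisComponentId();
>             this.taskId = context.getThisTaskId();
>         }
>
>         @Override
>         public void emit(EmitInfo info) {
>             System.out.println("TRACE emit " + componentId + " task=" + taskId
>                     + " stream=" + info.stream + " values=" + info.values
>                     + " t=" + System.currentTimeMillis());
>         }
>
>         @Override
>         public void boltExecute(BoltExecuteInfo info) {
>             System.out.println("TRACE execute " + componentId + " task=" + taskId
>                     + " tuple=" + info.tuple + " t=" + System.currentTimeMillis());
>         }
>     }
>
> Such a hook would presumably be registered either globally through the
> topology.auto.task.hooks config or per component via
> TopologyContext.addTaskHook() in prepare().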
> ----------
> mrflip: @thedatachef has implemented this. We'd like guidance on the
> implementation choices made; you'll see the pull request shortly.
> We targeted Trident, not core Storm. It's our primary use case, and we want to
> see values at each operation boundary (not each bolt); meanwhile hooks seem to
> give good-enough support for Storm.
> Trident Tuples have methods to set, unset and test whether the tuple is
> traceable. They become labeled as traceable by an assembly, which you can drop
> in anywhere in the topology. We have one such assembly that makes every nth
> tuple traceable.
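>
> One way such a labeling assembly might be expressed is as a pass-through
> Trident filter; note that setTraceable() is the method proposed in the pull
> request (name guessed here), not part of stock Trident:
>
>     import storm.trident.operation.BaseFilter;
>     import storm.trident.tuple.TridentTuple;
>
>     // Sketch of a "mark every nth tuple" assembly piece: keeps every tuple,
>     // labels one in n as traceable.
>     public class MarkEveryNth extends BaseFilter {
>         private final int n;
>         private long seen = 0;
>
>         public MarkEveryNth(int n) {
>             this.n = n;
>         }
>
>         @Override
>         public boolean isKeep(TridentTuple tuple) {
>             if (++seen % n == 0) {
>                 tuple.setTraceable(true);  // hypothetical API from the proposed change
>             }
>             return true;  // never drop tuples; only label them
>         }
>     }
>
> It could be dropped into a topology with something like
> stream.each(inputFields, new MarkEveryNth(100)).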
> All descendants of a traceable tuple are traceable. The framework doesn't
> ever unlabel things, even if a tuple is prolific -- it's easy enough to thin
> the herd with an assembly.
> When the collector emits a tuple, if the source tuple is traceable, it:
> 1) anoints the new tuple as traceable
> 2) records the current step in the trace history -- a tracer bullet carries the
> history of every stage it has passed through
> 3) writes an illustration of the trace history to the progress log. Since only a
> fraction of tuples are expected to be traceable, we feel efficiency matters less
> than keeping this output structured, verbose, and readable.
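>
> A simplified sketch of that emit-time bookkeeping (the TraceHistory type and
> its names are invented for illustration, not the actual pull request code):
>
>     import java.util.ArrayList;
>     import java.util.List;
>
>     // Hypothetical trace history carried by a tracer bullet: one entry per
>     // stage the tuple has passed through.
>     public class TraceHistory implements java.io.Serializable {
>         public static class Step implements java.io.Serializable {
>             public final String operation;
>             public final long timestampMs;
>
>             public Step(String operation, long timestampMs) {
>                 this.operation = operation;
>                 this.timestampMs = timestampMs;
>             }
>         }
>
>         private final List<Step> steps = new ArrayList<Step>();
>
>         // On emit of a traceable tuple: the child inherits the parent's history
>         // plus one new step; the full history can then be written to the log.
>         public TraceHistory extend(String operation) {
>             TraceHistory child = new TraceHistory();
>             child.steps.addAll(this.steps);
>             child.steps.add(new Step(operation, System.currentTimeMillis()));
>             return child;
>         }
>
>         public String illustrate() {
>             StringBuilder sb = new StringBuilder("tracer bullet:");
>             for (Step s : steps) {
>                 sb.append("\n  ").append(s.operation).append(" @ ").append(s.timestampMs);
>             }
>             return sb.toString();
>         }
>     }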
> We don't do anything to preserve traceability across an aggregation, mostly
> because we don't know what to uniformly do in that case.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)