[ https://issues.apache.org/jira/browse/STORM-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rick Kellogg updated STORM-135:
-------------------------------
Component/s: storm-core
> Tracer bullets
> --------------
>
> Key: STORM-135
> URL: https://issues.apache.org/jira/browse/STORM-135
> Project: Apache Storm
> Issue Type: New Feature
> Components: storm-core
> Reporter: James Xu
>
> https://github.com/nathanmarz/storm/issues/146
> Debugging the flow of tuples through a Storm topology can be pretty tedious.
> One might have to do lots of logging and watch many log files, or do other
> kinds of instrumentation. It would be great to include a system to select
> certain tuples for tracing, and track the progress of those tuples through
> the topology.
> Here is a use case:
> Suppose one were to do stats aggregation using Storm. Some things I might
> want to ensure are:
> Is the aggregation and flush happening in a timely way?
> Are there hotspots?
> Are there unexpected latencies? Are some bolts taking a long time?
> To answer the above questions, I might select a random sample of tuples, or
> maybe a random sample of a specific subset of tuples. The tuples to be traced
> could be tagged with a special attribute.
> I would want to track the following events:
> Spout emit - send (task id, spout name, timestamp)
> For each bolt:
> When a traced tuple arrives and execute() is called: (task id, bolt name,
> timestamp)
> When a tuple is emitted that is anchored on the tuple that arrived: (task id,
> bolt name, timestamp)
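>
> As a minimal sketch, the kind of per-event record those fields suggest might
> look like the following (the class and field names are illustrative, not an
> existing Storm API):
>
>     // Hypothetical value class for one trace event; not part of Storm itself.
>     public class TraceEvent implements java.io.Serializable {
>         public enum Kind { SPOUT_EMIT, BOLT_EXECUTE, BOLT_EMIT }
>
>         public final Kind kind;
>         public final int taskId;          // task that observed the event
>         public final String componentId;  // spout or bolt name
>         public final long timestampMs;    // wall-clock time of the event
>
>         public TraceEvent(Kind kind, int taskId, String componentId, long timestampMs) {
>             this.kind = kind;
>             this.taskId = taskId;
>             this.componentId = componentId;
>             this.timestampMs = timestampMs;
>         }
>
>         @Override
>         public String toString() {
>             return kind + "(" + componentId + ", task=" + taskId + ", t=" + timestampMs + ")";
>         }
>     }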
> Here is what I can do with the data from above (assuming one can correlate
> tuples emitted with incoming tuples, based on the anchor):
> For the aggregation bolt, I can look at the distribution of (emit timestamp -
> incoming timestamp) and see if it makes sense.
> I can graph the life of one tuple, look at spout/bolt vs timestamp graph, and
> visually see how much time is being spent in each bolt, as well as how much
> time is spent in the Storm infrastructure / ZMQ.
> This data can be overlaid for multiple tuples to get a sense of the timing
> distribution for the topology.
> Using the task ID information, one can do a cool overlay graph that traces
> the distribution of a number of tuples over a topology. One can use that to
> see whether field groupings are working, whether tuples are unevenly
> distributed, etc.
> For now I may start implementing this idea in the scala-storm DSL.
> ----------
> tdunning: I actually think that, if possible, unanchored tuples should also
> be traced.
> One simple implementation would be to add some information to each tuple to
> indicate the tracing status of the tuple.
> As each tuple arrives, the tracing status would be inspected. If set, a
> tracing wrapper for the collector would be used in place of the actual
> collector for that tuple. This would make tracing of all resulting tuples
> possible, not just the anchored tuples.
> It would also be very useful to have a traceMessage() method on the collector
> that could be used by the bolt to record a trace message if tracing is on.
> It would also make sense to have a method that turns tracing on or off for a
> collector. This might need to return a new tracing collector in order to
> allow collectors to be immutable.
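>
> A rough sketch of that wrapper idea, by composition around Storm's
> OutputCollector (the TracingCollector class and the traceMessage() method are
> hypothetical, not existing API; a fuller version would cover the remaining
> emit/ack/fail overloads):
>
>     import java.util.List;
>
>     import backtype.storm.task.OutputCollector;
>     import backtype.storm.tuple.Tuple;
>
>     // Hypothetical wrapper used in place of the real collector while a traced
>     // tuple is being processed, so every resulting emit gets recorded.
>     public class TracingCollector {
>         private final OutputCollector delegate;
>         private final String componentId;
>
>         public TracingCollector(OutputCollector delegate, String componentId) {
>             this.delegate = delegate;
>             this.componentId = componentId;
>         }
>
>         public List<Integer> emit(Tuple anchor, List<Object> values) {
>             traceMessage("emit anchored on " + anchor + ": " + values);
>             return delegate.emit(anchor, values);
>         }
>
>         public void ack(Tuple input) {
>             traceMessage("ack " + input);
>             delegate.ack(input);
>         }
>
>         // The traceMessage() idea from the comment above: record a message only
>         // while tracing is active for the current tuple.
>         public void traceMessage(String message) {
>             System.out.println("TRACE [" + componentId + "] " + message);
>         }
>     }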
> The tracing information that I see as useful includes:
> a) possibly a trace level similar to the logging level used by log4j and
> other logging packages
> b) a trace id so that multiple traces can be simultaneously active. This
> could be generated when tracing is turned on. It would be nice to have a
> way to supply an external id that could be correlated with outside entities
> like a user-id.
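>
> A small sketch of what such per-tuple trace metadata might look like (names
> invented for illustration):
>
>     // Hypothetical metadata attached to a traced tuple: a level plus an id that
>     // can also carry an externally supplied value (e.g. a user-id) for correlation.
>     public class TraceTag implements java.io.Serializable {
>         public enum Level { OFF, INFO, DEBUG, TRACE }
>
>         public final Level level;
>         public final String traceId;  // generated when tracing is turned on, or supplied externally
>
>         public TraceTag(Level level, String traceId) {
>             this.level = level;
>             this.traceId = traceId;
>         }
>
>         public static TraceTag newTrace(Level level) {
>             return new TraceTag(level, java.util.UUID.randomUUID().toString());
>         }
>     }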
> ----------
> velvia: +1 for adding tracing level to the tuple metadata.
> Nathan or others:
> I think this ticket should be split up into a couple parts:
> 1) A generic callback or hook mechanism for when tuples are emitted and when
> they arrive via execute() in bolts.
> 2) A specific callback for filtering and implementing tracer bullets
> 3) Additional metadata in the Tuple class to track tracing, and changes to
> allow it to be serialized
> Should this be split up into multiple issues?
> Also pointers to where in the code the three could be implemented would be
> awesome.
> Thanks!
> Evan
> ----------
> tdunning: With JIRA, sub-tasks would be a great idea. With Github's very
> basic issue tracker, probably not so much.
> ----------
> nathanmarz: FYI, I've added hooks into Storm for 0.7.1
> ----------
> tdunning: Can you provide a pointer or three to where the hooks are?
> ----------
> nathanmarz: I explained it here: #153 (comment)
> I'll have a wiki page about hooks once the feature is released.
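>
> For reference, a rough sketch of what a tracing hook built on that mechanism
> might look like. Class and field names follow the hook API as it later
> stabilized (backtype.storm.hooks), so details may not match 0.7.1 exactly:
>
>     import java.util.Map;
>
>     import backtype.storm.hooks.BaseTaskHook;
>     import backtype.storm.hooks.info.BoltExecuteInfo;
>     import backtype.storm.hooks.info.EmitInfo;
>     import backtype.storm.task.TopologyContext;
>
>     // Sketch: log every emit and every bolt execute seen by the task this hook
>     // is attached to. Deciding *which* tuples to trace would still need the
>     // tuple tagging discussed earlier in this thread.
>     public class TracerBulletHook extends BaseTaskHook {
>         private String componentId;
>         private int taskId;
>
>         @Override
>         public void prepare(Map conf, TopologyContext context) {
>             this.componentId = context.getThisComponentId();
>             this.taskId = context.getThisTaskId();
>         }
>
>         @Override
>         public void emit(EmitInfo info) {
>             System.out.println("TRACE emit " + componentId + " task=" + taskId
>                     + " stream=" + info.stream + " values=" + info.values
>                     + " t=" + System.currentTimeMillis());
>         }
>
>         @Override
>         public void boltExecute(BoltExecuteInfo info) {
>             System.out.println("TRACE execute " + componentId + " task=" + taskId
>                     + " tuple=" + info.tuple + " t=" + System.currentTimeMillis());
>         }
>     }
>
> Such a hook would presumably be registered either globally through the
> topology.auto.task.hooks config or per component via
> TopologyContext.addTaskHook() in prepare().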
> ----------
> mrflip: @thedatachef has implemented this. We'd like guidance on the
> implementation choices made; you'll see the pull request shortly.
> We targeted Trident, not core Storm. It's our primary use case, and we want to
> see values at each operation boundary (not each bolt); meanwhile hooks seem to
> give good-enough support for Storm.
> Trident Tuples have methods to set, unset and test whether the tuple is
> traceable. They become labeled as traceable by an assembly, which you can drop
> in anywhere in the topology. We have one such assembly that makes every nth
> tuple traceable.
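>
> One way such a labeling assembly might be expressed is as a pass-through
> Trident filter; note that setTraceable() is the method proposed in the pull
> request (name guessed here), not part of stock Trident:
>
>     import storm.trident.operation.BaseFilter;
>     import storm.trident.tuple.TridentTuple;
>
>     // Sketch of a "mark every nth tuple" assembly piece: keeps every tuple,
>     // labels one in n as traceable.
>     public class MarkEveryNth extends BaseFilter {
>         private final int n;
>         private long seen = 0;
>
>         public MarkEveryNth(int n) {
>             this.n = n;
>         }
>
>         @Override
>         public boolean isKeep(TridentTuple tuple) {
>             if (++seen % n == 0) {
>                 tuple.setTraceable(true);  // hypothetical API from the proposed change
>             }
>             return true;  // never drop tuples; only label them
>         }
>     }
>
> It could be dropped into a topology with something like
> stream.each(inputFields, new MarkEveryNth(100)).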
> All descendants of a traceable tuple are traceable. The framework doesn't
> ever unlabel things, even if a tuple is prolific -- it's easy enough to thin
> the herd with an assembly.
> When the collector emits a tuple, if the source tuple is traceable, it:
> 1) anoints the new tuple as traceable
> 2) records the current step in the trace history -- a tracer bullet carries the
> history of every stage it has passed through
> 3) writes an illustration of the trace history to the progress log. Since only a
> fraction of tuples are expected to be traceable, we feel efficiency matters less
> than keeping this output structured, verbose, and readable.
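>
> A simplified sketch of that emit-time bookkeeping (the TraceHistory type and
> its names are invented for illustration, not the actual pull request code):
>
>     import java.util.ArrayList;
>     import java.util.List;
>
>     // Hypothetical trace history carried by a tracer bullet: one entry per
>     // stage the tuple has passed through.
>     public class TraceHistory implements java.io.Serializable {
>         public static class Step implements java.io.Serializable {
>             public final String operation;
>             public final long timestampMs;
>
>             public Step(String operation, long timestampMs) {
>                 this.operation = operation;
>                 this.timestampMs = timestampMs;
>             }
>         }
>
>         private final List<Step> steps = new ArrayList<Step>();
>
>         // On emit of a traceable tuple: the child inherits the parent's history
>         // plus one new step; the full history can then be written to the log.
>         public TraceHistory extend(String operation) {
>             TraceHistory child = new TraceHistory();
>             child.steps.addAll(this.steps);
>             child.steps.add(new Step(operation, System.currentTimeMillis()));
>             return child;
>         }
>
>         public String illustrate() {
>             StringBuilder sb = new StringBuilder("tracer bullet:");
>             for (Step s : steps) {
>                 sb.append("\n  ").append(s.operation).append(" @ ").append(s.timestampMs);
>             }
>             return sb.toString();
>         }
>     }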
> We don't do anything to preserve traceability across an aggregation, mostly
> because we don't know what to uniformly do in that case.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)