[
https://issues.apache.org/jira/browse/CASSANDRA-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438475#comment-13438475
]
Jonathan Ellis commented on CASSANDRA-1123:
-------------------------------------------
Dug into this pretty hard today. Thanks for the huge effort, David!
The good news is I think we can simplify this a great deal. The bad news is
that it's a lot of code churn. Sorry about that! We probably could have used
a little more back-and-forth before fleshing it out so completely.
What I saw as I tried to add trace points was that the TraceEvent/Builder api
was fairly cumbersome. What I really wanted to do was instrument all the debug
logging that we've accumulated, most of which has the battle scars of being
added to solve a particular tricky situation.
So I started banging on the API and realized that I was ending up with
something that resembled a logging appender. So why not hook into our existing
logging api? Like Aaron's original idea, but without any copy and pasting; we
can just implement a log4j LoggingAppender and life is good.
This does mean we don't have payload maps anywhere but in the session
initialization, which gets special-cased, but I'm fine with that.
I've pushed a sketch of this approach to
https://github.com/jbellis/cassandra/tree/1123-4. I can't devote another day
to finish it, so I'm going to have to throw it back to David, but I've
considerately left the parts that need work in a non-compiling state so it's
clear where to start. :)
(The main drawback is we have to be careful about where we log on the write
path to avoid an infinite loop, but this was a problem with the old approach as
well. At least infinite loops will be fairly obvious and easy to fix.)
Some other points:
- I'm violently opposed to serializing thrift objects into the trace. This
just pushes the job of making them human-readable out to each consumer. Let's
solve this once at the server level instead. One approach is given (but not
completed) in my tree. Note that I'm okay with being *barely* human-readable;
the main use case we're concerned with is tracing queries interactively, in
which case we already know the parameters and the logging is a formality. I do
want it to be *possible* to reconstruct a problematic query detected by
probabilistic sampling, but it doesn't have to be easy. (That said, turning
hex back into cli or cqlsh is at least possible without writing a Thrift
deserializer first, so it has that much of an advantage over the initial patch.)
- I'm fine with giving up some structure in exchange for ease of use. A single
"activity" column, with some metadata, is enough.
- I'm also fine with giving up any or all of the pretty printer, summary by
request type, and tests, which I've made no effort to port. Would strongly
prefer getting a bare bones implementation finished, then adding more
functionality later. (I do note that the pretty printer probably makes more
sense to grow vertically rather than horizontally.)
- enabling tracing-by-probability should be in jmx rather than thrift. unsure
if max-sessions-to-trace is useful.
- I think you will like the improvements around DTPE and the Stage.
- Wiring up appenders at different levels (we'd want the file appender at INFO,
the tracing one at DEBUG) is kind of a bitch. Apologies in advance. Possibly
useful:
http://stackoverflow.com/questions/2154539/log4j-log-level-per-appender-for-a-single-logger,
http://stackoverflow.com/questions/751431/how-can-i-direct-log4j-output-so-that-different-log-levels-go-to-different-appen
Think that about covers it, but it's late and I could have missed something.
> Allow tracing query details
> ---------------------------
>
> Key: CASSANDRA-1123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1123
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: David Alves
> Fix For: 1.2.0
>
> Attachments: 1123-3.patch.gz, 1123.patch, 1123.patch
>
>
> In the spirit of CASSANDRA-511, it would be useful to tracing on queries to
> see where latency is coming from: how long did row cache lookup take? key
> search in the index? merging the data from the sstables? etc.
> The main difference vs setting debug logging is that debug logging is too big
> of a hammer; by turning on the flood of logging for everyone, you actually
> distort the information you're looking for. This would be something you
> could set per-query (or more likely per connection).
> We don't need to be as sophisticated as the techniques discussed in the
> following papers but they are interesting reading:
> http://research.google.com/pubs/pub36356.html
> http://www.usenix.org/events/osdi04/tech/full_papers/barham/barham_html/
> http://www.usenix.org/event/nsdi07/tech/fonseca.html
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira