Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/2794
to look at the new patch set (#2).
Change subject: KUDU-1410 (part 1). Add per-Trace counters
......................................................................
KUDU-1410 (part 1). Add per-Trace counters
This adds a map of counters to each Trace object and provides a nice macro to
increment a given counter on the current trace.
The goal here is to make it easy to collect per-request metrics without plumbing
structs through the call stack or new sets of thread-locals. This mechanism
also makes it easy to hook up generic pieces of code such as threadpools and
locking primitives where plumbing a metrics structure through the call stack
is quite impractical.
To keep things fast, the map is keyed by const char* pointers which use pointer
comparison rather than string comparison - this avoids having to do any hashing
on the increment path. We could probably improve this further (eg using a
different more-optimized map type) if we start seeing it on a profile.
One wrinkle about using 'const char*' pointers is that it means the keys have
to live forever. The class provides a 'string interning' implementation which
allows strings generated at runtime to be used. This is used in particular for
threadpool-specific counters, which we can bound as a relatively small set.
I also sprinkled various counters around the code that I thought might be
useful, including:
- counters for spinlock and mutex wait times
- counters for cpu-time, wall-time, and queue-time for each threadpool
- counters for disk reads, so it's easy to see if a request was
blocked on seeky disk accesses
- counters for various pieces of the tablet code which have been
responsible for slow RPCs in the past
The RPC layer is augmented slightly to log an INFO level message for
any request which takes longer than a second. This should be conservative
enough that we don't get a lot of spew, but still interesting to look
at when we have slowness.
A typical trace looks something like:
I0411 23:47:48.600158 23021 inbound_call.cc:230] Call
kudu.tserver.TabletServerService.Write from 127.0.0.1:41490 (request call id
2029711) took 1125ms.
Request Metrics:
{ "child_traces" : [ { "apply.queue_time_us" : 390,
"cfile_cache_hit" : 5,
"cfile_cache_hit_bytes" : 20510,
"num_ops" : 1,
"prepare.queue_time_us" : 8,
"prepare.run_cpu_time_us" : 20,
"prepare.run_wall_time_us" : 29
} ] }
(pretty-printed using jsonlint -f -- the actual log is compact)
Change-Id: I86001778976f01b62342f63f98b62962bc74212f
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/rpc/inbound_call.cc
M src/kudu/rpc/negotiation.cc
M src/kudu/rpc/rpc_context.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/transactions/transaction_driver.cc
M src/kudu/tablet/transactions/transaction_driver.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/mutex.cc
M src/kudu/util/spinlock_profiling-test.cc
M src/kudu/util/spinlock_profiling.cc
M src/kudu/util/stopwatch.h
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
M src/kudu/util/trace-test.cc
M src/kudu/util/trace.cc
M src/kudu/util/trace.h
A src/kudu/util/trace_metrics.cc
A src/kudu/util/trace_metrics.h
25 files changed, 349 insertions(+), 36 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/2794/2
--
To view, visit http://gerrit.cloudera.org:8080/2794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86001778976f01b62342f63f98b62962bc74212f
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>