[
https://issues.apache.org/jira/browse/KUDU-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon resolved KUDU-1410.
-------------------------------
Resolution: Fixed
Fix Version/s: 0.9.0
I'm going to call this done for now, though I may make some follow-on improvements.
What got implemented:
- Traces now keep a map of counters, which can be incremented with
TRACE_COUNTER_INCREMENT.
-- Various code paths, such as the block cache and log block manager, are now
instrumented, as well as generic code such as threadpools (wait times),
mutexes/spinlocks (contention), etc.
- RPCs that take longer than a second get their metrics logged (this might prove
too noisy, in which case we'll drop it).
- For each RPC method we keep a sample (taken no more than once per second) in
several latency buckets (0-10ms, 10-100ms, 100-1000ms, >1s), which are displayed on /rpcz.
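To make the two mechanisms above concrete, here is a minimal, self-contained C++ sketch of a per-trace counter map and a per-method latency-bucket sampler. It is not Kudu's actual code: the class names (TraceCounters, MethodSampler), the counter name used in the usage note, the choice to treat bucket upper bounds as exclusive, and the choice to apply the once-per-second limit per bucket are all assumptions made for illustration.

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for the per-trace counter map: named counters
// that instrumented code paths can bump while a request is in flight.
class TraceCounters {
 public:
  void Increment(const std::string& name, int64_t amount) {
    std::lock_guard<std::mutex> l(lock_);
    counters_[name] += amount;  // missing keys start at 0
  }

  // Returns a copy so callers can keep it after the trace is gone.
  std::map<std::string, int64_t> Snapshot() const {
    std::lock_guard<std::mutex> l(lock_);
    return counters_;
  }

 private:
  mutable std::mutex lock_;
  std::map<std::string, int64_t> counters_;
};

// Maps a duration to one of the four buckets listed above.
// Assumption: upper bounds are exclusive, so 10ms lands in 10-100ms.
inline int LatencyBucket(int64_t duration_ms) {
  if (duration_ms < 10) return 0;    // 0-10ms
  if (duration_ms < 100) return 1;   // 10-100ms
  if (duration_ms < 1000) return 2;  // 100-1000ms
  return 3;                          // >1s
}

struct RpcSample {
  int64_t duration_ms;
  std::map<std::string, int64_t> counters;
};

// Hypothetical per-RPC-method sampler: keeps the latest sample per bucket,
// replacing it at most once per second (assumed here to be per bucket).
class MethodSampler {
 public:
  // Returns true if this RPC was retained as the bucket's sample.
  bool MaybeSample(int64_t now_ms, int64_t duration_ms,
                   const TraceCounters& trace) {
    std::lock_guard<std::mutex> l(lock_);
    Slot& slot = slots_[LatencyBucket(duration_ms)];
    if (slot.has_sample && now_ms - slot.last_sample_ms < 1000) {
      return false;  // rate limit: sampled this bucket too recently
    }
    slot.sample = RpcSample{duration_ms, trace.Snapshot()};
    slot.last_sample_ms = now_ms;
    slot.has_sample = true;
    return true;
  }

 private:
  struct Slot {
    bool has_sample = false;
    int64_t last_sample_ms = 0;
    RpcSample sample;
  };

  std::mutex lock_;
  std::array<Slot, 4> slots_;
};
```

In this sketch, an instrumented code path would call something like trace.Increment("block_cache_miss", 1) (a made-up counter name), and the RPC layer would call MaybeSample() once when the call completes. Storing only the counter snapshot keeps each retained sample small.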
Some potential follow-up areas:
- keep more than one RPC sample per bucket; this is not too expensive if we're just
keeping the counters around, and it can make it relatively easy to correlate
perf issues
- allow propagating them back to the caller (useful e.g. for Impala to show these
stats in a query profile)
- adaptive latency buckets
- more instrumentation
> Improve diagnosability of performance problems
> ----------------------------------------------
>
> Key: KUDU-1410
> URL: https://issues.apache.org/jira/browse/KUDU-1410
> Project: Kudu
> Issue Type: Bug
> Components: supportability
> Affects Versions: 0.8.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.9.0
>
>
> Although Kudu has been relatively stable for most users, we are starting to
> see more and more questions about performance. In internal test clusters
> we're also struggling to understand performance issues or timeouts in some
> cases from logs only, and it can require gathering a daemon trace to see
> what's going on.
> This is an umbrella ticket for various improvements we can make so that
> performance is easier to understand.