[ 
https://issues.apache.org/jira/browse/KUDU-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-1410.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 0.9.0

I'm going to call this done for now, though I may make some follow-on improvements.

What got implemented:
- Traces now keep a map of counters, which can be incremented with 
TRACE_COUNTER_INCREMENT
-- various code paths, such as the block cache and log block manager, are now 
instrumented, as well as generic code such as threadpools (for wait times), 
mutexes/spinlocks (for contention), etc.
- RPCs that take longer than a second get their metrics logged (this might 
prove too noisy, in which case we'll drop it)
- For each RPC method, we keep a sample (no more than once per second) in 
several buckets (0-10ms, 10-100ms, 100-1000ms, >1sec), which are displayed on 
/rpcz
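
As a rough illustration of the two mechanisms above (a per-trace counter map, plus one retained sample per latency bucket per RPC method, refreshed no more than once per second), here is a minimal Python sketch. It is not Kudu's C++ implementation: `Trace`, `RpcSampler`, `bucket_for`, and the counter name are hypothetical, and treating each bucket as a half-open range is an assumption. Only the bucket ranges and the once-per-second limit come from the description above.

```python
import collections

# Bucket upper bounds in milliseconds, matching the ranges listed above.
# Assumption: each range is half-open, so exactly 10ms lands in "10-100ms".
BUCKETS = [(10, "0-10ms"), (100, "10-100ms"), (1000, "100-1000ms")]
OVERFLOW_BUCKET = ">1sec"


def bucket_for(duration_ms):
    """Return the label of the latency bucket that a duration falls into."""
    for upper_bound, label in BUCKETS:
        if duration_ms < upper_bound:
            return label
    return OVERFLOW_BUCKET


class Trace:
    """Hypothetical stand-in for a per-request trace with a counter map."""

    def __init__(self):
        self.counters = collections.Counter()

    def increment(self, name, amount=1):
        # Plays the role that TRACE_COUNTER_INCREMENT plays in the description.
        self.counters[name] += amount


class RpcSampler:
    """Keeps one sample per (method, bucket), replaced at most once per second."""

    def __init__(self, min_interval_s=1.0):
        self.min_interval_s = min_interval_s
        # (method, bucket) -> (timestamp_s, duration_ms, counters snapshot)
        self.samples = {}

    def maybe_sample(self, method, duration_ms, trace, now_s):
        key = (method, bucket_for(duration_ms))
        previous = self.samples.get(key)
        if previous is not None and now_s - previous[0] < self.min_interval_s:
            return False  # this bucket was sampled too recently
        self.samples[key] = (now_s, duration_ms, dict(trace.counters))
        return True


# Usage: the counter name "block_cache_miss" is made up for illustration.
trace = Trace()
trace.increment("block_cache_miss", 3)
sampler = RpcSampler()
sampler.maybe_sample("Write", 250.0, trace, now_s=0.0)  # stored
sampler.maybe_sample("Write", 300.0, trace, now_s=0.5)  # skipped, same bucket
```

Storing a snapshot of the counters with each sample is what would let a page like /rpcz show not only how long a sampled RPC took but also what it spent its time on.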

Some potential followup areas:
- keep more than one RPC sample per bucket; this is not too expensive if we're 
just keeping the counters around, and it can make it relatively easy to 
correlate perf issues
- allow propagating the counters back to the caller (useful, e.g., for Impala 
to show these stats in a query profile)
- adaptive latency buckets
- more instrumentation
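
The first follow-up idea, keeping more than one sample per bucket, could be sketched with a small bounded queue per bucket. This is only a guess at one possible shape, not a planned design: `MultiSampleBucket`, its `capacity` parameter, and the counter name are hypothetical.

```python
import collections


class MultiSampleBucket:
    """Hypothetical bucket that retains the most recent `capacity` samples."""

    def __init__(self, capacity=4):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self.samples = collections.deque(maxlen=capacity)

    def add(self, duration_ms, counters):
        # Copy the counters so later increments do not alter the stored sample.
        self.samples.append((duration_ms, dict(counters)))


# Usage: with capacity 3, adding five samples retains only the last three.
bucket = MultiSampleBucket(capacity=3)
for duration_ms in (120.0, 340.0, 560.0, 780.0, 910.0):
    bucket.add(duration_ms, {"lock_wait_us": int(duration_ms)})
```

Because each entry holds only a duration and a small counter map, memory stays bounded by the capacity times the number of buckets and methods.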




> Improve diagnosability of performance problems
> ----------------------------------------------
>
>                 Key: KUDU-1410
>                 URL: https://issues.apache.org/jira/browse/KUDU-1410
>             Project: Kudu
>          Issue Type: Bug
>          Components: supportability
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 0.9.0
>
>
> Although Kudu has been relatively stable for most users, we are starting to 
> see more and more questions about performance. In internal test clusters 
> we're also struggling to understand performance issues or timeouts in some 
> cases from logs only, and it can require gathering a daemon trace to see 
> what's going on.
> This is an umbrella ticket for various improvements we can make so that 
> performance is easier to understand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
