[jira] [Commented] (CASSANDRA-8341) Expose time spent in each thread pool

Chris Lohfink (JIRA) Wed, 19 Nov 2014 21:03:52 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219017#comment-14219017
 ]


Chris Lohfink commented on CASSANDRA-8341:
------------------------------------------

as reference point, I did a [simplistic 
benchmark|https://gist.github.com/clohfink/9ee94e3767b5d8170220] (on a 2014 
MBP) with JMH of a few scenarios from patch v1:

{code}
Benchmark        Mode  Samples      Score       Error   Units
baseline         avgt       15   8300.492 ±    92.987   ns/op
staticWrap       avgt       15   8438.268 ±   132.177   ns/op
threadlocal      avgt       15   8464.201 ±   161.554   ns/op
wrapped          avgt       15   8424.011 ±   134.407   ns/op
baseline       sample   253749   8197.106 ±    16.157   ns/op
staticWrap     sample   251910   8233.368 ±    14.841   ns/op
threadlocal    sample   244075   8540.737 ±   106.365   ns/op
wrapped        sample   247083   8443.597 ±    99.892   ns/op
baseline           ss       15  49466.667 ±  8039.477      ns
staticWrap         ss       15  44466.667 ±  7697.095      ns
threadlocal        ss       15  52266.667 ± 11679.329      ns
wrapped            ss       15  60000.000 ± 24000.352      ns
{code}

As a v2 I removed the meter, and switched to using currentmiilis instead of 
nanotime (~1/3 speed).  I switched the normal ThreadPool one to just do 
everything in overriden execute.  It wasn't as simple for the SEPExecutor.  
Since I didn't want to break the separation between the JMX SEP and the parent 
SEP by collecting metrics in the SEPWorker so I think keeping the threadlocal 
approach with a before/after execute callback is a little cleaner (match the 
ThreadExecutor api).  With [those 
changes|https://gist.github.com/clohfink/b51eb027c55008377d93]:

{code}
Benchmark        Mode  Samples       Score        Error   Units
baseline         avgt       15    8052.932 ±     92.517   ns/op
staticWrap       avgt       15    8313.957 ±    295.964   ns/op
threadlocal      avgt       15    8553.304 ±    189.656   ns/op
wrapped          avgt       15    8354.060 ±    169.359   ns/op
baseline       sample   252812    8209.038 ±     14.917   ns/op
staticWrap     sample   248648    8356.810 ±     15.974   ns/op
threadlocal    sample   247362    8388.620 ±     17.310   ns/op
wrapped        sample   249784    8338.419 ±     68.509   ns/op
baseline           ss       15   44933.333 ±   8305.190      ns
staticWrap         ss       15   50533.333 ±  13136.009      ns
threadlocal        ss       15  303600.000 ± 964458.694      ns
wrapped            ss       15   50066.667 ±   7300.839      ns
{code}

Problem with using ms precision is that a lot of the time it takes less then a 
ms to do a task so it ends up being very below the actual value.  I was 
hesitant to use getCurrentThreadCpuTime as a measure, but its a ns collection 
and much more meaningful then walltime.  Kinda ideal so I [gave it a 
shot|https://gist.github.com/clohfink/50e22ab895a7a700661b] - its 1 little more 
then 1us cost per task execution:

{code}
Benchmark        Mode  Samples      Score       Error   Units
baseline         avgt       15   8187.818 ±   162.270   ns/op
staticWrap       avgt       15   9585.818 ±   185.490   ns/op
threadlocal      avgt       15   9508.422 ±   166.103   ns/op
wrapped          avgt       15   9380.099 ±   142.239   ns/op
baseline       sample   251564   8259.852 ±    30.677   ns/op
staticWrap     sample   221622   9365.621 ±    19.831   ns/op
threadlocal    sample   218599   9521.815 ±    33.865   ns/op
wrapped        sample   213108   9743.400 ±    24.726   ns/op
baseline           ss       15  49333.333 ±  8663.130      ns
staticWrap         ss       15  45533.333 ±  9543.739      ns
threadlocal        ss       15  56933.333 ± 12742.318      ns
wrapped            ss       15  52800.000 ± 12001.537      ns
{code}

> Expose time spent in each thread pool
> -------------------------------------
>
>                 Key: CASSANDRA-8341
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8341
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Chris Lohfink
>            Priority: Minor
>              Labels: metrics
>         Attachments: 8341.patch, 8341v2.txt
>
>
> Can increment a counter with time spent in each queue.  This can provide 
> context on how much time is spent percentage wise in each stage.  
> Additionally can be used with littles law in future if ever want to try to 
> tune the size of the pools.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-8341) Expose time spent in each thread pool

Reply via email to