[
https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219017#comment-14219017
]
Chris Lohfink commented on CASSANDRA-8341:
------------------------------------------
as reference point, I did a [simplistic
benchmark|https://gist.github.com/clohfink/9ee94e3767b5d8170220] (on a 2014
MBP) with JMH of a few scenarios from patch v1:
{code}
Benchmark Mode Samples Score Error Units
baseline avgt 15 8300.492 ± 92.987 ns/op
staticWrap avgt 15 8438.268 ± 132.177 ns/op
threadlocal avgt 15 8464.201 ± 161.554 ns/op
wrapped avgt 15 8424.011 ± 134.407 ns/op
baseline sample 253749 8197.106 ± 16.157 ns/op
staticWrap sample 251910 8233.368 ± 14.841 ns/op
threadlocal sample 244075 8540.737 ± 106.365 ns/op
wrapped sample 247083 8443.597 ± 99.892 ns/op
baseline ss 15 49466.667 ± 8039.477 ns
staticWrap ss 15 44466.667 ± 7697.095 ns
threadlocal ss 15 52266.667 ± 11679.329 ns
wrapped ss 15 60000.000 ± 24000.352 ns
{code}
As a v2 I removed the meter, and switched to using currentmiilis instead of
nanotime (~1/3 speed). I switched the normal ThreadPool one to just do
everything in overriden execute. It wasn't as simple for the SEPExecutor.
Since I didn't want to break the separation between the JMX SEP and the parent
SEP by collecting metrics in the SEPWorker so I think keeping the threadlocal
approach with a before/after execute callback is a little cleaner (match the
ThreadExecutor api). With [those
changes|https://gist.github.com/clohfink/b51eb027c55008377d93]:
{code}
Benchmark Mode Samples Score Error Units
baseline avgt 15 8052.932 ± 92.517 ns/op
staticWrap avgt 15 8313.957 ± 295.964 ns/op
threadlocal avgt 15 8553.304 ± 189.656 ns/op
wrapped avgt 15 8354.060 ± 169.359 ns/op
baseline sample 252812 8209.038 ± 14.917 ns/op
staticWrap sample 248648 8356.810 ± 15.974 ns/op
threadlocal sample 247362 8388.620 ± 17.310 ns/op
wrapped sample 249784 8338.419 ± 68.509 ns/op
baseline ss 15 44933.333 ± 8305.190 ns
staticWrap ss 15 50533.333 ± 13136.009 ns
threadlocal ss 15 303600.000 ± 964458.694 ns
wrapped ss 15 50066.667 ± 7300.839 ns
{code}
Problem with using ms precision is that a lot of the time it takes less then a
ms to do a task so it ends up being very below the actual value. I was
hesitant to use getCurrentThreadCpuTime as a measure, but its a ns collection
and much more meaningful then walltime. Kinda ideal so I [gave it a
shot|https://gist.github.com/clohfink/50e22ab895a7a700661b] - its 1 little more
then 1us cost per task execution:
{code}
Benchmark Mode Samples Score Error Units
baseline avgt 15 8187.818 ± 162.270 ns/op
staticWrap avgt 15 9585.818 ± 185.490 ns/op
threadlocal avgt 15 9508.422 ± 166.103 ns/op
wrapped avgt 15 9380.099 ± 142.239 ns/op
baseline sample 251564 8259.852 ± 30.677 ns/op
staticWrap sample 221622 9365.621 ± 19.831 ns/op
threadlocal sample 218599 9521.815 ± 33.865 ns/op
wrapped sample 213108 9743.400 ± 24.726 ns/op
baseline ss 15 49333.333 ± 8663.130 ns
staticWrap ss 15 45533.333 ± 9543.739 ns
threadlocal ss 15 56933.333 ± 12742.318 ns
wrapped ss 15 52800.000 ± 12001.537 ns
{code}
> Expose time spent in each thread pool
> -------------------------------------
>
> Key: CASSANDRA-8341
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8341
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Chris Lohfink
> Priority: Minor
> Labels: metrics
> Attachments: 8341.patch, 8341v2.txt
>
>
> Can increment a counter with time spent in each queue. This can provide
> context on how much time is spent percentage wise in each stage.
> Additionally can be used with littles law in future if ever want to try to
> tune the size of the pools.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)