[
https://issues.apache.org/jira/browse/CASSANDRA-21020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18082417#comment-18082417
]
Dmitry Konstantinov commented on CASSANDRA-21020:
-------------------------------------------------
Initial draft for metrics only: [https://github.com/apache/cassandra/pull/4831]
JMH results for ThreadLocal retrieval:
{code:java}
[java] Benchmark                             Mode  Cnt  Score   Error  Units
[java] CassandraThreadLocalBench.casssandra  avgt    4  0.652 ± 0.369  ns/op
[java] CassandraThreadLocalBench.netty       avgt    4  1.278 ± 0.444  ns/op
{code}
JMH results for ThreadLocalCounter increment:
{code:java}
[java] Benchmark                          (type)                Mode  Cnt  Score   Error  Units
[java] ThreadLocalMetricsBench.increment  NettyThreadLocal      avgt    5  2.570 ± 0.333  ns/op
[java] ThreadLocalMetricsBench.increment  CassandraThreadLocal  avgt    5  1.657 ± 0.202  ns/op
{code}
A Claude analysis of the perfasm output, produced by running with -Djmh.args="-prof xctraceasm":
h4. Cassandra inner loop — 2 instructions
{code:java}
0x11ea00932: mov 0x1f8(%r10),%r8d   ; load threadLocalMetrics field off CassandraThread (0.44%)
0x11ea00939: test %r8d,%r8d         ; null check
             je <never taken>
0x11ea00946: shl $0x3,%r8           ; decode compressed oop (24.98%)
                                    ; → back edge
{code}
The JIT has proven that the {{instanceof CassandraThread}} check is always true
(the thread type is monomorphic), so it hoisted the check out of the loop
entirely and it does not appear in the hot path at all. The entire get is one
field load, one null check, and one shift. The 25% weight on the {{shl}} is a
retirement stall from the preceding load, not a sign that the instruction
itself is slow.
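
A minimal sketch of the lookup shape described above, assuming the state lives in a plain field on the thread subclass. All class and method names here are hypothetical, and the stand-in extends {{java.lang.Thread}} rather than Netty's FastThreadLocalThread so that it compiles without Netty; it is not the code from the PR.

```java
import java.util.function.Supplier;

final class ThreadFieldLookupSketch {
    // Slow path for threads that are not SketchCassandraThread instances.
    private static final ThreadLocal<Object> FALLBACK = new ThreadLocal<>();

    static Object get(Supplier<Object> factory) {
        Thread current = Thread.currentThread();
        // Type check that a JIT can hoist when the thread type is monomorphic.
        if (current instanceof SketchCassandraThread) {
            SketchCassandraThread ct = (SketchCassandraThread) current;
            Object state = ct.threadLocalMetrics; // single field load
            if (state == null) {                  // null check, true only on first use
                state = factory.get();
                ct.threadLocalMetrics = state;
            }
            return state;
        }
        Object state = FALLBACK.get();
        if (state == null) {
            state = factory.get();
            FALLBACK.set(state);
        }
        return state;
    }

    // Runs two lookups on a SketchCassandraThread and checks that both
    // return the instance stored in the thread's field.
    static boolean selfCheck() {
        final Object[] seen = new Object[2];
        SketchCassandraThread worker = new SketchCassandraThread(() -> {
            seen[0] = get(Object::new);
            seen[1] = get(Object::new);
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return seen[0] != null && seen[0] == seen[1]
               && seen[0] == worker.threadLocalMetrics;
    }

    public static void main(String[] args) {
        System.out.println("same instance on fast path: " + selfCheck());
    }
}

// Hypothetical stand-in for the proposed CassandraThread (assumption: the real
// class extends FastThreadLocalThread, as stated in the issue description).
class SketchCassandraThread extends Thread {
    // Per-thread state kept directly on the thread object.
    Object threadLocalMetrics;

    SketchCassandraThread(Runnable task) {
        super(task);
    }
}
```

The fallback branch keeps the lookup correct for threads that are not instances of the subclass, at the cost of a regular ThreadLocal lookup on that path.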
h4. Netty inner loop — 9 instructions
{code:java}
0x120e06c40: mov 0x174(%rax),%r11d          ; load threadLocalMap off FastThreadLocalThread (4.5%)
0x120e06c47: mov 0x54(%r12,%r11,8),%r10d    ; load indexedVariables[] off threadLocalMap (4.2%)
0x120e06c4c: mov 0xc(%r12,%r10,8),%ecx      ; load array length (6.2%)
0x120e06c51: mov 0xc(%rsi),%edi             ; load FastThreadLocal.index (constant) (5.3%)
0x120e06c54: cmp %ecx,%edi                  ; bounds check (4.6%)
             jge <never taken>
0x120e06c5c: cmp %ecx,%edi                  ; redundant bounds check (JIT failed to elim) (5.6%)
             jae <never taken>
0x120e06c68: shl $0x3,%r10                  ; decode compressed oop on array base (5.3%)
0x120e06c6c: mov 0x10(%r10,%rdi,4),%r10d    ; load indexedVariables[index] (4.6%)
0x120e06c71: cmp $0xf9e2dbe7,%r10d          ; UNSET sentinel check (5.0%)
             je <never taken>
0x120e06c80: mov 0x8(%r12,%r10,8),%r11d     ; load klass for checkcast (5.2%)
0x120e06c85: cmp $0x1b2a10,%r11d            ; checkcast ThreadLocalMetricsV2 (7.2%)
             jne <never taken>
0x120e06c92: shl $0x3,%r10                  ; decode result (4.9%)
                                            ; → back edge
{code}
Netty does *three pointer dereferences* in a chain
({{thread → threadLocalMap → indexedVariables[] → element}}), plus an UNSET
sentinel check and a {{checkcast}}. Each dereference is a potential cache miss
and a data-dependency stall: the next load cannot start until the previous one
completes. Even with everything in L1, dependent loads have ~4 cycles of
latency each, so 3 chained loads cost a minimum of ~12 cycles, versus ~4 cycles
for the single load in the Cassandra variant (V1).
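
To make the dependency chain concrete, here is a rough sketch of the two lookup shapes side by side. The classes are hypothetical stand-ins that only mimic the structure visible in the listing (a map object hanging off the thread, an indexed array, an UNSET sentinel, a cast); they are not Netty's real classes, and {{estimatedCycles}} simply restates the ~4 cycles per dependent L1 load assumed in the analysis.

```java
import java.util.Arrays;

final class LookupChainSketch {
    // Slot index, playing the role of FastThreadLocal.index in the listing.
    static final int INDEX = 3;

    // thread -> threadLocalMap -> indexedVariables[] -> element:
    // three dependent loads, plus a bounds check, a sentinel check and a cast.
    static long[] viaMap(SketchFastThread t) {
        Object[] vars = t.threadLocalMap.indexedVariables; // loads 1 and 2
        Object value = vars[INDEX];                        // load 3, bounds-checked
        if (value == SketchThreadLocalMap.UNSET) {         // sentinel check
            value = new long[1];
            vars[INDEX] = value;
        }
        return (long[]) value;                             // checkcast
    }

    // Direct-field alternative: a single load off the thread object.
    static long[] viaField(SketchFastThread t) {
        return t.directCounters;
    }

    // Lower bound from the text: ~4 cycles per dependent load, all hits in L1.
    static int estimatedCycles(int dependentLoads) {
        return dependentLoads * 4;
    }

    public static void main(String[] args) {
        SketchFastThread t = new SketchFastThread(() -> { });
        viaMap(t)[0]++;
        viaField(t)[0]++;
        System.out.println("chained loads: ~" + estimatedCycles(3) + " cycles");
        System.out.println("single load:   ~" + estimatedCycles(1) + " cycles");
    }
}

// Hypothetical stand-in for the per-thread map holding indexed variables.
final class SketchThreadLocalMap {
    static final Object UNSET = new Object();
    final Object[] indexedVariables = new Object[8];

    SketchThreadLocalMap() {
        Arrays.fill(indexedVariables, UNSET);
    }
}

// Hypothetical thread type carrying both lookup variants for comparison.
class SketchFastThread extends Thread {
    final SketchThreadLocalMap threadLocalMap = new SketchThreadLocalMap();
    final long[] directCounters = new long[1];

    SketchFastThread(Runnable task) {
        super(task);
    }
}
```

The cycle figures are a latency floor under the stated assumption, not a measurement; the JMH numbers above remain the actual evidence.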
> Optimize thread local for metrics and tracing
> ---------------------------------------------
>
> Key: CASSANDRA-21020
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21020
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Observability/Metrics, Observability/Tracing
> Reporter: Dmitry Konstantinov
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, Netty thread-local logic is used in tracing to keep state and in
> metrics (the thread-local counter logic introduced in CASSANDRA-20250), so we
> do thread-local lookups many times while processing a request. These cases
> can be optimized by storing these objects as fields on the Thread itself,
> by introducing CassandraThread as a child of FastThreadLocalThread.
> A similar idea can be found in the JDK itself (the ThreadLocalRandom logic
> was introduced to speed up ForkJoinPool).