Jon Haddad created CASSANDRA-21459:
--------------------------------------
Summary: Eliminate per-row enum array allocation in cursor
compaction clustering deserialization
Key: CASSANDRA-21459
URL: https://issues.apache.org/jira/browse/CASSANDRA-21459
Project: Apache Cassandra
Issue Type: Sub-task
Components: Local/Compaction
Reporter: Jon Haddad
{{ClusteringPrefix.Kind.values()}} is called once per row read
({{ClusteringDescriptor.loadClustering}}, both call sites) and once per range
tombstone marker written ({{SSTableCursorWriter.writeRangeTombstone}}). Java
clones the enum constants array on every {{values()}} call, so the cursor
compaction path — which is intended to be allocation-free per row — allocates a
fresh ~40-byte {{Kind[]}} for every row read from every source sstable and for
every range tombstone marker written.
The fix caches the array once in a {{static final}} field ({{Kind.ALL_KINDS}})
and indexes into the shared copy at the three cursor hot-path sites.
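A minimal sketch of the pattern, not the actual patch: the enum below is a hypothetical stand-in for {{ClusteringPrefix.Kind}} with illustrative constants, and the {{decodeWithValues}}/{{decodeWithCache}} helpers are invented names for the before and after shapes of a call site. Only the {{ALL_KINDS}} field name is taken from the description above.

```java
final class KindCacheSketch {
    // Hypothetical stand-in for ClusteringPrefix.Kind; constants are illustrative only.
    enum Kind {
        EXCL_END_BOUND, INCL_START_BOUND, CLUSTERING, INCL_END_BOUND, EXCL_START_BOUND;

        // Populated once during class initialization. The array is shared, not copied,
        // so callers must treat it as read-only.
        static final Kind[] ALL_KINDS = values();
    }

    // "Before" shape: values() hands back a fresh copy of the constants array per call.
    static Kind decodeWithValues(int ordinal) {
        return Kind.values()[ordinal];
    }

    // "After" shape: index into the shared array, so no copy is made per call.
    static Kind decodeWithCache(int ordinal) {
        return Kind.ALL_KINDS[ordinal];
    }

    public static void main(String[] args) {
        Kind[] first = Kind.values();
        Kind[] second = Kind.values();
        // Two calls return two distinct array objects.
        System.out.println(first == second); // prints "false"
        // Both lookups resolve to the same enum constant.
        System.out.println(decodeWithValues(2) == decodeWithCache(2)); // prints "true"
    }
}
```

Because the cached array is shared, a write such as {{ALL_KINDS[0] = null}} would be visible to every caller; that is the trade-off against the defensive copy that {{values()}} makes.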
Found via JFR allocation profiling
({{jdk.ObjectAllocationInNewTLAB}}/{{OutsideTLAB}} with stack traces) during
cursor compaction: with the patch, the {{ClusteringPrefix$Kind[]}} allocation
site disappears from the profile entirely. In an allocation-scaling measurement
comparing a 1,200-row compaction against a 12,000-row compaction, allocation
growth between the two runs drops from 1,487,448 to 449,488 bytes, a reduction
of about 70%; the remainder is attributable to test-environment {{Ref}} debug
tracking and chunk-cache machinery rather than cursor code.
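For readers who want to try a similar measurement, the following is a rough sketch of programmatic JFR allocation profiling using the {{jdk.jfr}} API. It is not the harness used for the numbers above: the class name, the {{allocationEventsByClass}} helper, and the toy workload are all assumptions made for illustration. The TLAB events fire only when an allocation needs a new TLAB or falls outside one, so the counts indicate which classes dominate allocation rather than exact totals, and whether a given array copy appears at all depends on the JVM and JIT.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

final class JfrAllocationSketch {
    // Hypothetical helper: runs the workload under a JFR recording and counts
    // allocation events per allocated class name.
    static Map<String, Long> allocationEventsByClass(Runnable workload) throws IOException {
        Path file = Files.createTempFile("alloc-profile", ".jfr");
        try (Recording recording = new Recording()) {
            // The two event types named in the description, with stack traces enabled.
            recording.enable("jdk.ObjectAllocationInNewTLAB").withStackTrace();
            recording.enable("jdk.ObjectAllocationOutsideTLAB").withStackTrace();
            recording.start();
            workload.run();
            recording.stop();
            recording.dump(file);
        }
        Map<String, Long> counts = new TreeMap<>();
        for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
            if (!event.hasField("objectClass")) {
                continue; // skip anything that is not an allocation event
            }
            RecordedClass allocated = event.getClass("objectClass");
            if (allocated != null) {
                counts.merge(allocated.getName(), 1L, Long::sum);
            }
        }
        Files.deleteIfExists(file);
        return counts;
    }

    // Illustrative enum, unrelated to Cassandra's real types.
    enum Kind { A, B, C }

    public static void main(String[] args) throws IOException {
        Map<String, Long> counts = allocationEventsByClass(() -> {
            long sink = 0;
            // Toy workload that calls values() in a loop, as the hot path did.
            for (int i = 0; i < 5_000_000; i++) {
                sink += Kind.values()[i % 3].ordinal();
            }
            if (sink < 0) {
                throw new IllegalStateException("unreachable; keeps the loop live");
            }
        });
        counts.forEach((name, n) -> System.out.println(n + "  " + name));
    }
}
```

Running the same workload before and after a change and comparing the per-class counts is one way to check whether an allocation site has gone away.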
The same {{values()}} pattern exists on the iterator deserialization path
({{ClusteringPrefix.serializer}}, three sites). Those are left unchanged here
to keep this patch minimal and scoped to the cursor path; they can be addressed
separately if desired.