[
https://issues.apache.org/jira/browse/CASSANDRA-21465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt Byrd updated CASSANDRA-21465:
----------------------------------
Attachment: Screenshot 2026-06-15 at 15.16.10.png
> cache bytesOnDisk in SSTableReader to avoid excessive allocations reading
> MaxSSTableSize gauge
> ----------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-21465
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21465
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Observability/Metrics
> Reporter: Matt Byrd
> Assignee: Matt Byrd
> Priority: Normal
> Attachments: Screenshot 2026-06-15 at 15.16.10.png
>
>
> When a sidecar polls metrics on an instance with a large number of files
> (e.g. levelled compaction with O(10-100k) files), this particular metric
> generates a large number of allocations:
> {code:java}
> maxSSTableSize = createTableGauge("MaxSSTableSize", new Gauge<Long>()
> {
>     @Override
>     public Long getValue()
>     {
>         return cfs.getTracker()
>                   .getView()
>                   .liveSSTables()
>                   .stream()
>                   .map(SSTableReader::bytesOnDisk)
>                   .max(Long::compare)
>                   .orElse(0L);
>     }
> });
> {code}
> One option is simply not to collect this metric, but I imagine a lot of
> operators may be running into this unintentionally.
> We should be able to cache bytesOnDisk on SSTableReader and avoid all the
> allocations when invoking bytesOnDisk (and the syscall to actually get the
> size).
> The sstables should in general be immutable, so caching this value is fine,
> modulo early-open, which can be special-cased and is often disabled.
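To make the caching idea concrete, here is a minimal sketch under stated assumptions. The `Reader` class, its `sizeLookup` supplier, and the `earlyOpen` flag are hypothetical stand-ins for illustration only; they are not Cassandra's SSTableReader API. The sketch caches the size on first read for immutable files and always re-queries for early-open readers, whose files may still be growing.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Hypothetical sketch; Reader is NOT Cassandra's SSTableReader.
public class CachedSizeSketch {
    static final class Reader {
        // Stands in for the allocation-heavy filesystem size query (assumption).
        private final LongSupplier sizeLookup;
        // True while the file may still be growing (early-open case).
        private final boolean earlyOpen;
        // -1 means "not cached yet"; volatile so other threads see the cached value.
        private volatile long cachedBytesOnDisk = -1L;

        Reader(LongSupplier sizeLookup, boolean earlyOpen) {
            this.sizeLookup = sizeLookup;
            this.earlyOpen = earlyOpen;
        }

        long bytesOnDisk() {
            if (earlyOpen) {
                // Size can change, so never serve a cached value here.
                return sizeLookup.getAsLong();
            }
            long size = cachedBytesOnDisk;
            if (size < 0) {
                // Benign race: concurrent callers may both query, but they
                // store the same value because the file is immutable.
                size = sizeLookup.getAsLong();
                cachedBytesOnDisk = size;
            }
            return size;
        }
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        Reader reader = new Reader(() -> {
            lookups.incrementAndGet();
            return 4096L;
        }, false);
        reader.bytesOnDisk();
        reader.bytesOnDisk();
        System.out.println("lookups performed: " + lookups.get());
    }
}
```

With a non-early-open reader, repeated calls perform the underlying lookup only once; an early-open reader would query on every call.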
> I think this is probably preferable to trying to keep this metric updated
> inline as sstables are created/updated (a bit tricky with max).
> Another, larger change we might consider is caching descriptor.fileFor so
> that more general consumers also avoid allocation here. This, however, is
> probably a step too far in terms of the complexity/benefit trade-off.
> One other simple win here is to remove the use of streams, for more
> predictable performance. I think the current chaining may be too complex
> for the JVM to optimise away.
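As a rough illustration of the stream-free variant, the following sketch computes the maximum with a plain loop over primitive longs, so no Long boxing or stream pipeline objects are created. The `maxSize` helper and its `sizeOf` parameter are hypothetical names chosen for this example; in the gauge, the size accessor would be something like SSTableReader::bytesOnDisk. Starting from 0 matches the original orElse(0L) as long as sizes are non-negative.

```java
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical sketch of a stream-free max; names are illustrative only.
public class MaxSizeLoopSketch {
    // Returns the largest size among items, or 0 when there are none.
    // Assumes sizes are non-negative, as file sizes are.
    static <T> long maxSize(Iterable<T> items, ToLongFunction<T> sizeOf) {
        long max = 0L;
        for (T item : items) {
            long size = sizeOf.applyAsLong(item); // primitive long, no boxing
            if (size > max) {
                max = size;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Strings stand in for sstable readers; length stands in for size.
        List<String> fakeSSTables = List.of("aa", "aaaaa", "a");
        long max = maxSize(fakeSSTables, String::length);
        System.out.println("max size: " + max);
    }
}
```

The for-each loop still allocates one iterator per call, which is small compared with per-element boxing across tens of thousands of files.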