[ 
https://issues.apache.org/jira/browse/CASSANDRA-21465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Byrd updated CASSANDRA-21465:
----------------------------------
    Attachment: Screenshot 2026-06-15 at 15.16.10.png

> cache bytesOnDisk in SSTableReader to avoid excessive allocations reading 
> MaxSSTableSize gauge
> ----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21465
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21465
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Observability/Metrics
>            Reporter: Matt Byrd
>            Assignee: Matt Byrd
>            Priority: Normal
>         Attachments: Screenshot 2026-06-15 at 15.16.10.png
>
>
> When running a sidecar polling on metrics and a large number of files per 
> instance (e.g. levelled compaction and O(10-100k) files), this particular 
> endpoint generates a large number of allocations:
> {code:java}
>         maxSSTableSize = createTableGauge("MaxSSTableSize", new Gauge<Long>()
>         {
>             @Override
>             public Long getValue()
>             {
>                 return cfs.getTracker()
>                           .getView()
>                           .liveSSTables()
>                           .stream()
>                           .map(SSTableReader::bytesOnDisk)
>                           .max(Long::compare)
>                           .orElse(0L);
>             }
>         }); {code}
> One option is to just not collect this metric, but I imagine a lot of 
> operators may be running into this unintentionally.
> We should be able to cache bytesOnDisk on SSTableReader and avoid all the 
> allocations when invoking bytesOnDisk (and the syscall to actually get the 
> size).
> The sstables should in general be immutable, and hence caching this is fine, 
> modulo early-open, which can be special-cased and is often disabled.
> I think this is probably preferable to trying to keep this metric updated 
> inline as sstables are created/updated (a bit tricky with max).
> Another, larger change we might consider is caching the descriptor.fileFor so 
> more general consumers also avoid allocation here; this, however, is probably 
> a step too far in terms of the complexity/benefit trade-off.
> One other simple win here is to also remove the use of streams, for more 
> predictable performance. I think the current chaining may be too complex for 
> the JVM to optimise away.
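
For illustration only, the two ideas in the description (caching the size on the reader, and replacing the stream chain with a plain loop) might look roughly like the sketch below. This is not the patch attached to the ticket. SizedReader, MaxSSTableSizeSketch, the sizeLookup supplier and the earlyOpen flag are hypothetical stand-ins for SSTableReader and its on-disk size lookup, and the sketch assumes that a non-early-open sstable's size never changes once it has been read.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for SSTableReader; not Cassandra's actual class. */
final class SizedReader
{
    private static final long UNSET = -1L;

    // Stands in for the allocation-heavy lookup (file objects plus a size syscall).
    private final LongSupplier sizeLookup;
    // Assumption: early-opened sstables may still grow, so their size is never cached.
    private final boolean earlyOpen;
    private volatile long cachedBytesOnDisk = UNSET;

    SizedReader(LongSupplier sizeLookup, boolean earlyOpen)
    {
        this.sizeLookup = sizeLookup;
        this.earlyOpen = earlyOpen;
    }

    long bytesOnDisk()
    {
        if (earlyOpen)
            return sizeLookup.getAsLong();

        long size = cachedBytesOnDisk;
        if (size == UNSET)
        {
            // Benign race: concurrent callers may both compute the size, but for
            // an immutable sstable they store the same value.
            size = sizeLookup.getAsLong();
            cachedBytesOnDisk = size;
        }
        return size;
    }
}

final class MaxSSTableSizeSketch
{
    /** Plain loop over primitives: no stream pipeline and no Long boxing per sstable. */
    static long maxBytesOnDisk(Iterable<SizedReader> readers)
    {
        long max = 0L; // same result as orElse(0L) when there are no sstables
        for (SizedReader reader : readers)
            max = Math.max(max, reader.bytesOnDisk());
        return max;
    }
}
```

With something along these lines, repeated gauge reads would only pay for the size lookup once per immutable sstable, while early-open readers would still be measured on every call.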



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
