Matt Byrd created CASSANDRA-21465:
-------------------------------------

             Summary: cache bytesOnDisk in SSTableReader to avoid excessive 
allocations reading MaxSSTableSize gauge
                 Key: CASSANDRA-21465
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21465
             Project: Apache Cassandra
          Issue Type: Improvement
          Components: Observability/Metrics
            Reporter: Matt Byrd
            Assignee: Matt Byrd
         Attachments: Screenshot 2026-06-15 at 15.16.10.png

When a sidecar polls metrics on an instance with a large number of files 
(e.g. levelled compaction with O(10-100k) files), reading this particular 
gauge generates a large number of allocations:
{code:java}
        maxSSTableSize = createTableGauge("MaxSSTableSize", new Gauge<Long>()
        {
            @Override
            public Long getValue()
            {
                return cfs.getTracker()
                          .getView()
                          .liveSSTables()
                          .stream()
                          .map(SSTableReader::bytesOnDisk)
                          .max(Long::compare)
                          .orElse(0L);
            }
        });
{code}
One option is simply not to collect this metric, but I imagine a lot of 
operators may be running into this unintentionally.

We should be able to cache bytesOnDisk on SSTableReader and avoid all the 
allocations when invoking bytesOnDisk (and the syscall to actually get the 
size).

The sstables should in general be immutable, so caching this is fine, modulo 
early-open, which can be special-cased and is often disabled.
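
As a rough illustration of the caching idea, here is a minimal sketch. It is not the actual SSTableReader code: the class CachedSizeReader, the sizeLookup supplier (standing in for whatever bytesOnDisk does today) and the earlyOpen flag are all hypothetical names used only for this example.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for SSTableReader; not the real Cassandra class. */
final class CachedSizeReader
{
    private static final long UNSET = -1L;

    // Assumption: stands in for the existing, allocation-heavy size lookup.
    private final LongSupplier sizeLookup;
    // Assumption: early-opened readers may still grow, so they bypass the cache.
    private final boolean earlyOpen;
    private volatile long cachedBytesOnDisk = UNSET;

    CachedSizeReader(LongSupplier sizeLookup, boolean earlyOpen)
    {
        this.sizeLookup = sizeLookup;
        this.earlyOpen = earlyOpen;
    }

    long bytesOnDisk()
    {
        if (earlyOpen)
            return sizeLookup.getAsLong();

        long cached = cachedBytesOnDisk;
        if (cached == UNSET)
        {
            // Benign race: concurrent callers would compute the same value
            // for an immutable sstable, so no locking is used here.
            cached = sizeLookup.getAsLong();
            cachedBytesOnDisk = cached;
        }
        return cached;
    }
}
```

With something along these lines, repeated gauge reads against a fully-opened sstable would hit the cached field, and only early-opened readers would pay for the lookup each time.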

I think this is probably preferable to trying to keep this metric updated 
inline as sstables are created/updated (a bit tricky with max).

Another, larger change we might consider is caching the result of 
descriptor.fileFor so that more general consumers also avoid allocation here. 
This, however, is probably a step too far in terms of the complexity/benefit 
trade-off.


One other simple win here is to also remove the use of streams, for more 
predictable performance. I think the current chaining may be too complex for 
the JVM to optimise away.
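
A stream-free version of the max computation could look roughly like the sketch below. SizedFile and MaxSizeSketch are hypothetical names for this example only; in the real gauge the loop would iterate over the live SSTableReader instances instead.

```java
final class MaxSizeSketch
{
    /** Hypothetical stand-in for anything exposing bytesOnDisk(). */
    interface SizedFile
    {
        long bytesOnDisk();
    }

    /**
     * Returns the largest bytesOnDisk() among the given files, or 0 if there
     * are none. Works on primitive longs, so no Long boxing and no stream
     * pipeline objects; the for-each loop may still allocate an Iterator.
     */
    static long maxBytesOnDisk(Iterable<? extends SizedFile> files)
    {
        long max = 0L;
        for (SizedFile file : files)
        {
            long size = file.bytesOnDisk();
            if (size > max)
                max = size;
        }
        return max;
    }
}
```

Starting from 0 matches the orElse(0L) behaviour of the current gauge for an empty set, assuming sizes are never negative.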


