[
https://issues.apache.org/jira/browse/CASSANDRA-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15028788#comment-15028788
]
Benjamin Lerer commented on CASSANDRA-10225:
--------------------------------------------
Sorry for the delay.
{quote}Computing the compression ratio by making the sum of the
compressedFileLength and dividing it by the sum of the dataLength does not look
a bad approach to me but it seems that the data length might not always be the
real length (according to a comment in CompressionMetadata).{quote}
With early opening, the data length can indeed be shorter than the real
length, but because the SSTables are retrieved with {{SSTableSet.CANONICAL}},
early-opened SSTables are not returned. Consequently, the data length will
always be the real one.
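To make the difference between the two computations concrete, here is a minimal sketch, not the actual patch. The class {{CompressionRatioSketch}}, the holder {{SSTableSizes}} and both method names are hypothetical and exist only for illustration. It assumes every SSTable has a data length greater than zero.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; none of these names come from the Cassandra code base. */
final class CompressionRatioSketch {

    /** Hypothetical stand-in for one SSTable: just the two lengths, in bytes. */
    static final class SSTableSizes {
        final long compressedLength;
        final long dataLength; // uncompressed length, assumed > 0

        SSTableSizes(long compressedLength, long dataLength) {
            this.compressedLength = compressedLength;
            this.dataLength = dataLength;
        }
    }

    /** Unweighted mean of the per-SSTable ratios, ignoring how large each SSTable is. */
    static double meanOfRatios(List<SSTableSizes> sstables) {
        if (sstables.isEmpty())
            return Double.NaN;
        double sum = 0.0;
        for (SSTableSizes s : sstables)
            sum += (double) s.compressedLength / s.dataLength;
        return sum / sstables.size();
    }

    /** Sum of the compressed lengths divided by the sum of the data lengths. */
    static double sizeWeightedRatio(List<SSTableSizes> sstables) {
        long compressed = 0;
        long data = 0;
        for (SSTableSizes s : sstables) {
            compressed += s.compressedLength;
            data += s.dataLength;
        }
        return data == 0 ? Double.NaN : (double) compressed / data;
    }

    public static void main(String[] args) {
        // One tiny, highly compressible SSTable and one large, poorly compressible one.
        List<SSTableSizes> sstables = Arrays.asList(
            new SSTableSizes(100L, 1_000L),
            new SSTableSizes(900_000L, 1_000_000L));

        System.out.printf("mean of ratios:      %.3f%n", meanOfRatios(sstables));
        System.out.printf("size-weighted ratio: %.3f%n", sizeWeightedRatio(sstables));
    }
}
```

With these two made-up SSTables the unweighted mean is about 0.5, while the size-weighted ratio is about 0.9, because the large SSTable accounts for almost all of the data.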
While reviewing this problem I also discovered that the compression ratio
returned by some SSTableReaders could be wrong (CASSANDRA-10775). As a result,
using {{sstable.getCompressionRatio() != MetadataCollector.NO_COMPRESSION_RATIO}}
instead of {{SSTable.compression}} led to wrong results even with the new
approach.
As the fix changes the behavior of the metrics, it is probably safer to make
that change in {{3.2}} only.
> Make compression ratio much more accurate
> -----------------------------------------
>
> Key: CASSANDRA-10225
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10225
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Jeremy Hanna
> Assignee: Brett Snyder
> Labels: lhf
> Fix For: 2.1.x
>
> Attachments: cassandra-2.1-10225.txt
>
>
> Currently in cfstats, it will take an average over the compression ratios of
> all of the sstables without regard to the data sizes. This can lead to a
> very inaccurate value. It would be good to factor in the uncompressed and
> compressed sizes for the sstables to give an accurate number.