Wellington Chevreuil created HBASE-27386:
--------------------------------------------
             Summary: Use encoded size for calculating compression ratio in block size predicator
                 Key: HBASE-27386
                 URL: https://issues.apache.org/jira/browse/HBASE-27386
             Project: HBase
          Issue Type: Bug
            Reporter: Wellington Chevreuil
            Assignee: Wellington Chevreuil


In HBASE-27264 we introduced the notion of block size predicators to define hfile block boundaries when writing a new hfile, and provided the PreviousBlockCompressionRatePredicator implementation for calculating block sizes based on a compression ratio. It used the raw data size written to the block so far to calculate the compression ratio, but when encoding is enabled this can yield an artificially high compression ratio and, therefore, much larger block sizes. We should use the encoded size to calculate the compression ratio instead (a sketch of the calculation follows the examples below).

Here's an example scenario:

1) Sample block size when not using the PreviousBlockCompressionRatePredicator as implemented by HBASE-27264:
{noformat}
onDiskSizeWithoutHeader=6613, uncompressedSizeWithoutHeader=32928
{noformat}

2) Sample block size when using PreviousBlockCompressionRatePredicator as implemented by HBASE-27264 (uses the raw data size to calculate the compression rate):
{noformat}
onDiskSizeWithoutHeader=126920, uncompressedSizeWithoutHeader=655393
{noformat}

3) Sample block size when using PreviousBlockCompressionRatePredicator with the encoded size for calculating the compression rate:
{noformat}
onDiskSizeWithoutHeader=54299, uncompressedSizeWithoutHeader=328051
{noformat}

In these samples, the raw-size ratio lets the on-disk block grow to roughly 19x the baseline (126920 vs. 6613 bytes), while the encoded-size ratio keeps it closer to 8x (54299 bytes).
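To make the ratio calculation concrete, here is a minimal, self-contained sketch of a ratio-based predicator that uses the encoded size, as the fix above describes. The class and method names are hypothetical, not the actual HBASE-27264 interfaces; the sketch only illustrates why the encoded size is the right input:

{code:java}
// Hypothetical sketch -- not the actual HBase API.
public class CompressionRatePredicatorSketch {

  /** Configured target for the on-disk (compressed) block size, e.g. 64 KB. */
  private final long configuredMaxBlockSize;

  /** Ratio observed on the previously written block; 1.0 until a block has been seen. */
  private double previousBlockCompressionRatio = 1.0;

  public CompressionRatePredicatorSketch(long configuredMaxBlockSize) {
    this.configuredMaxBlockSize = configuredMaxBlockSize;
  }

  /**
   * Called once a block has been finished and compressed.
   *
   * @param encodedSize bytes the block occupied after data block encoding,
   *                    before compression (the fix: use this, not the raw size)
   * @param onDiskSize  bytes the block occupies after compression
   */
  public void updateLatestBlockSizes(long encodedSize, long onDiskSize) {
    // Using encodedSize here is the point of HBASE-27386. If the raw
    // (pre-encoding) size were used instead, an encoder that already
    // shrinks the data (e.g. prefix encoding) would inflate this ratio,
    // and blocks would be allowed to grow far past the intended size.
    previousBlockCompressionRatio = (double) encodedSize / onDiskSize;
  }

  /** Decides whether the block currently being written should be closed. */
  public boolean shouldFinishBlock(long encodedBytesWrittenSoFar) {
    // Let the in-memory (encoded) block grow by the observed compression
    // ratio, so the *on-disk* size lands near the configured target.
    long adjustedLimit = (long) (configuredMaxBlockSize * previousBlockCompressionRatio);
    return encodedBytesWrittenSoFar >= adjustedLimit;
  }

  public static void main(String[] args) {
    CompressionRatePredicatorSketch p = new CompressionRatePredicatorSketch(64 * 1024);
    // Sizes from example 3 above: 328051 encoded bytes compressed to 54299,
    // a ratio of roughly 6x, so a 64 KB target allows ~396 KB of encoded data.
    p.updateLatestBlockSizes(328051, 54299);
    System.out.println(p.shouldFinishBlock(300_000)); // false: still under the adjusted limit
    System.out.println(p.shouldFinishBlock(400_000)); // true: block should be closed
  }
}
{code}

The key point is the first argument to updateLatestBlockSizes(): feeding it the raw (pre-encoding) size would fold the encoder's savings into the compression ratio on top of the compressor's, inflating the ratio and the resulting block sizes.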