apurtell commented on a change in pull request #3244:
URL: https://github.com/apache/hbase/pull/3244#discussion_r629757222
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
##########
@@ -241,10 +246,27 @@ public void write(Cell cell) throws IOException {
compression.getDictionary(CompressionContext.DictionaryIndex.FAMILY));
PrivateCellUtil.compressQualifier(out, cell,
compression.getDictionary(CompressionContext.DictionaryIndex.QUALIFIER));
- // Write timestamp, type and value as uncompressed.
+ // Write timestamp, type and value.
StreamUtils.writeLong(out, cell.getTimestamp());
- out.write(cell.getTypeByte());
- PrivateCellUtil.writeValue(out, cell, cell.getValueLength());
+ byte type = cell.getTypeByte();
+ if (compression.getValueCompressor() != null &&
+ cell.getValueLength() > VALUE_COMPRESS_THRESHOLD) {
+ // Try compressing the cell's value
+ byte[] compressedBytes = compressValue(cell);
+ // Only write the compressed value if we have achieved some space savings.
+ if (compressedBytes.length < cell.getValueLength()) {
+ // Set the high bit of type to indicate the value is compressed
+ out.write((byte)(type|0x80));
Review comment:
> Jetty settled on a size threshold of 23 bytes.
Thank you @ndimiduk. gzip and deflate are essentially the same thing.
Let's opt for the smaller threshold and see how it goes. Worst case, if the
compressor produces output that is larger than the original, we just discard it
and use the original, so that's not a problem. With a smaller threshold more
values are eligible for compression, so there will be more time spent in
compression, but presumably with a payoff in space savings, so that seems
fine.
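
The try-compress-compare-fallback strategy described above can be sketched in isolation. This is a hypothetical standalone illustration using `java.util.zip.Deflater`, not the actual WALCellCodec code (which goes through the codec's configured value compressor); the class, method names, and the threshold constant here are stand-ins:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class ValueCompressSketch {
  // Hypothetical threshold, mirroring the small Jetty-style cutoff discussed above.
  static final int VALUE_COMPRESS_THRESHOLD = 23;

  // Deflate the value fully into a byte array. The caller decides whether
  // the result is worth keeping.
  static byte[] deflate(byte[] value) {
    Deflater deflater = new Deflater();
    deflater.setInput(value);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  // Return the compressed bytes only if compression actually saved space;
  // otherwise fall back to the original value unchanged.
  static byte[] maybeCompress(byte[] value) {
    if (value.length <= VALUE_COMPRESS_THRESHOLD) {
      return value; // too small to be worth trying
    }
    byte[] compressed = deflate(value);
    // Worst case: compressor output is larger than the input; discard it.
    return compressed.length < value.length ? compressed : value;
  }
}
```

In the real codec the "kept compressed" outcome is signaled to the reader by setting the high bit of the cell's type byte, as in the diff above, so the reader knows whether to inflate the value on the way back in.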
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]