bharathv commented on a change in pull request #3244:
URL: https://github.com/apache/hbase/pull/3244#discussion_r629760196
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
##########
@@ -34,21 +36,49 @@
@InterfaceAudience.LimitedPrivate({HBaseInterfaceAudience.COPROC,
HBaseInterfaceAudience.PHOENIX})
public class CompressionContext {
- static final String ENABLE_WAL_TAGS_COMPRESSION =
+ public static final String ENABLE_WAL_TAGS_COMPRESSION =
"hbase.regionserver.wal.tags.enablecompression";
+ public static final String ENABLE_WAL_VALUE_COMPRESSION =
+ "hbase.regionserver.wal.value.enablecompression";
+
public enum DictionaryIndex {
REGION, TABLE, FAMILY, QUALIFIER, ROW
}
+ static class ValueCompressor {
+ final Deflater deflater;
+ final Inflater inflater;
+
+ public ValueCompressor() {
+ deflater = new Deflater();
+ inflater = new Inflater();
Review comment:
All the inflate/deflate methods are synchronized on a single lock object; see the JDK's `Inflater.inflate` below, for example. Is that a potential bottleneck, since we are sharing instances of them? Just curious.
```
public int inflate(byte[] b, int off, int len)
    throws DataFormatException {
  if (b == null) {
    throw new NullPointerException();
  }
  if (off < 0 || len < 0 || off > b.length - len) {
    throw new ArrayIndexOutOfBoundsException();
  }
  synchronized (zsRef) { // <===
    ensureOpen();
    int thisLen = this.len;
    int n = inflateBytes(zsRef.address(), b, off, len);
    bytesWritten += n;
    bytesRead += (thisLen - this.len);
    return n;
  }
}
```
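To make the sharing question concrete, here is a minimal sketch (class and method names are mine, not from the patch) of reusing a single Deflater/Inflater pair across calls, the way the `ValueCompressor` above does. Note the JDK lock shown above is per-instance (`zsRef` belongs to the `Inflater` object), so with one `CompressionContext` per WAL reader/writer the lock should be uncontended unless the same context is used from multiple threads:

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch, not code from the patch: one Deflater/Inflater
// pair reused for every value, relying on reset() between calls.
public class SharedZlibPair {
  private final Deflater deflater = new Deflater();
  private final Inflater inflater = new Inflater();

  public byte[] compress(byte[] value) {
    deflater.reset();                         // reuse the shared instance
    deflater.setInput(value);
    deflater.finish();
    byte[] buf = new byte[value.length + 64]; // ample headroom for small inputs
    int n = deflater.deflate(buf);
    return Arrays.copyOf(buf, n);
  }

  public byte[] decompress(byte[] compressed, int uncompressedLen) {
    try {
      inflater.reset();                       // reuse the shared instance
      inflater.setInput(compressed);
      byte[] out = new byte[uncompressedLen];
      int n = inflater.inflate(out);
      return Arrays.copyOf(out, n);
    } catch (DataFormatException e) {
      throw new RuntimeException(e);
    }
  }
}
```

The per-instance lock means the real cost of sharing is serialization within one context, not contention across contexts.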
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
##########
@@ -302,14 +343,28 @@ protected Cell parseCell() throws IOException {
compression.getDictionary(CompressionContext.DictionaryIndex.QUALIFIER));
pos += elemLen;
- // timestamp, type and value
- int tsTypeValLen = length - pos;
+ // timestamp
+ long ts = StreamUtils.readLong(in);
+ pos = Bytes.putLong(backingArray, pos, ts);
+ // type and value
+ int typeValLen = length - pos;
if (tagsLength > 0) {
- tsTypeValLen = tsTypeValLen - tagsLength - KeyValue.TAGS_LENGTH_SIZE;
+ typeValLen = typeValLen - tagsLength - KeyValue.TAGS_LENGTH_SIZE;
+ }
+ // high bit of type byte is 1 if value is compressed
+ byte type = (byte)in.read();
+ if ((type & 0x80) == 0x80) {
Review comment:
....... My original comment .....
Why this, and why not:
```
if (header.has_value_compression) {
... readCompressed
} else {
...
}
```
Value compression is a property of WALCellCodec, which is created as part
of the WAL reader/writer at init. Why do we have to include this compression
bit in every Cell if we already have it as part of the header (which I see you
added in this patch)? Rather than doing this per cell, couldn't you just do it
once at codec init if has_value_compression == true? Or is it possible that
some KVs in the same WAL are compressed and some aren't?
...........
Above was my original comment before reading the full patch; now I see the
code about VALUE_COMPRESS_THRESHOLD. I believe this is for handling small cell
values, right?
I read somewhere that zlib has an 18-byte overhead (header, trailer, and
such; don't quote me on this, I need to read the original RFC), so even with a
million entries the size increase is not much, and it will be amortized by
larger values anyway. On the flip side, it makes the code ugly with special
type bytes. Isn't it better to just zlib-compress all the values? WDYT?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]