bharathv commented on a change in pull request #3244:
URL: https://github.com/apache/hbase/pull/3244#discussion_r629760196
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
##########
@@ -34,21 +36,49 @@
@InterfaceAudience.LimitedPrivate({HBaseInterfaceAudience.COPROC,
HBaseInterfaceAudience.PHOENIX})
public class CompressionContext {
- static final String ENABLE_WAL_TAGS_COMPRESSION =
+ public static final String ENABLE_WAL_TAGS_COMPRESSION =
"hbase.regionserver.wal.tags.enablecompression";
+ public static final String ENABLE_WAL_VALUE_COMPRESSION =
+ "hbase.regionserver.wal.value.enablecompression";
+
public enum DictionaryIndex {
REGION, TABLE, FAMILY, QUALIFIER, ROW
}
+ static class ValueCompressor {
+ final Deflater deflater;
+ final Inflater inflater;
+
+ public ValueCompressor() {
+ deflater = new Deflater();
+ inflater = new Inflater();
Review comment:
All the inflate/deflate methods are synchronized on a single lock object; see the JDK's `Inflater.inflate` below, for example. Is that a potential bottleneck, since we are sharing instances of them? Just curious.
```
public int inflate(byte[] b, int off, int len)
    throws DataFormatException {
  if (b == null) {
    throw new NullPointerException();
  }
  if (off < 0 || len < 0 || off > b.length - len) {
    throw new ArrayIndexOutOfBoundsException();
  }
  synchronized (zsRef) { // <===
    ensureOpen();
    int thisLen = this.len;
    int n = inflateBytes(zsRef.address(), b, off, len);
    bytesWritten += n;
    bytesRead += (thisLen - this.len);
    return n;
  }
}
```
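To make the sharing question concrete, here is a minimal sketch (class and method names are mine, not from the patch) of reusing a single Deflater/Inflater pair across calls, the way the `ValueCompressor` above does. Note the JDK lock shown above is per-instance (`zsRef` belongs to the `Inflater` object), so with one `CompressionContext` per WAL reader/writer the lock should be uncontended unless the same context is used from multiple threads:

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch, not code from the patch: one Deflater/Inflater
// pair reused for every value, relying on reset() between calls.
public class SharedZlibPair {
  private final Deflater deflater = new Deflater();
  private final Inflater inflater = new Inflater();

  public byte[] compress(byte[] value) {
    deflater.reset();                         // reuse the shared instance
    deflater.setInput(value);
    deflater.finish();
    byte[] buf = new byte[value.length + 64]; // ample headroom for small inputs
    int n = deflater.deflate(buf);
    return Arrays.copyOf(buf, n);
  }

  public byte[] decompress(byte[] compressed, int uncompressedLen) {
    try {
      inflater.reset();                       // reuse the shared instance
      inflater.setInput(compressed);
      byte[] out = new byte[uncompressedLen];
      int n = inflater.inflate(out);
      return Arrays.copyOf(out, n);
    } catch (DataFormatException e) {
      throw new RuntimeException(e);
    }
  }
}
```

The per-instance lock means the real cost of sharing is serialization within one context, not contention across contexts.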
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
##########
@@ -302,14 +343,28 @@ protected Cell parseCell() throws IOException {
compression.getDictionary(CompressionContext.DictionaryIndex.QUALIFIER));
pos += elemLen;
- // timestamp, type and value
- int tsTypeValLen = length - pos;
+ // timestamp
+ long ts = StreamUtils.readLong(in);
+ pos = Bytes.putLong(backingArray, pos, ts);
+ // type and value
+ int typeValLen = length - pos;
if (tagsLength > 0) {
- tsTypeValLen = tsTypeValLen - tagsLength - KeyValue.TAGS_LENGTH_SIZE;
+ typeValLen = typeValLen - tagsLength - KeyValue.TAGS_LENGTH_SIZE;
+ }
+ // high bit of type byte is 1 if value is compressed
+ byte type = (byte)in.read();
+ if ((type & 0x80) == 0x80) {
Review comment:
....... My original comment .....
Why this, and why not:
```
if (header.has_value_compression) {
... readCompressed
} else {
...
}
```
Value compression is a property of WALCellCodec, which is created as part
of the WAL reader/writer at init. Why do we have to include this compression
bit in every Cell if we already have it as part of the header (which I see you
added in this patch)? Rather than doing this per cell, couldn't you just do it
once at codec init if has_value_compression == true? Or is it possible that
some KVs in the same WAL are compressed and some aren't?
...........
Above was my original comment before reading the full patch; now I see the
code about VALUE_COMPRESS_THRESHOLD. I believe this is for handling small cell
values, right?
I read somewhere that zlib has an 18-byte overhead (header, trailer, and
such; don't quote me on this, I need to read the original RFC), so even with a
million entries the size increase is not much, and it will be amortized by
larger values anyway. On the flip side, it makes the code ugly with special
type bytes. Isn't it better to just zlib-compress all the values? WDYT?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]