Marcell Ortutay created PHOENIX-4902:
----------------------------------------

             Summary: Snappy compression benefit is lost when generate hash cache RPC
                 Key: PHOENIX-4902
                 URL: https://issues.apache.org/jira/browse/PHOENIX-4902
             Project: Phoenix
          Issue Type: Bug
            Reporter: Marcell Ortutay


Phoenix uses Snappy compression on hash caches before it sends them to the region servers:

{code}
int maxCompressedSize = Snappy.maxCompressedLength(baOut.size());
byte[] compressed = new byte[maxCompressedSize]; // size for worst case
int compressedSize = Snappy.compress(baOut.getBuffer(), 0, baOut.size(), compressed, 0);
// Last realloc to size of compressed buffer.
ptr.set(compressed,0,compressedSize);
{code}
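Note that the snippet never shrinks {{compressed}}: {{ptr}} records the valid length ({{compressedSize}}), but its backing array stays {{maxCompressedSize}} bytes long. Any code that later serializes the whole backing array (e.g. {{ptr.get()}}) instead of the offset/length range would therefore ship the worst-case buffer. The sketch below illustrates that pitfall under stated assumptions: {{BytesPtr}} is a hypothetical stand-in for HBase's {{ImmutableBytesWritable}} (which {{ptr}} presumably is), and {{java.util.zip.Deflater}} stands in for Snappy so the example needs no extra dependencies. It does not prove which call Phoenix's RPC path actually uses.

{code:java}
import java.util.Arrays;
import java.util.zip.Deflater;

public class CompressedPtrSketch {

    // Hypothetical stand-in for HBase's ImmutableBytesWritable: a view over part of a byte array.
    static final class BytesPtr {
        private byte[] bytes;
        private int offset;
        private int length;

        void set(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        byte[] get() { return bytes; }     // the WHOLE backing array, including the unused tail
        int getOffset() { return offset; }
        int getLength() { return length; } // only the bytes the compressor actually wrote

        // Copies just the valid range, which is what should end up on the wire.
        byte[] copyValidBytes() {
            return Arrays.copyOfRange(bytes, offset, offset + length);
        }
    }

    // Snappy's worst-case bound (snappy::MaxCompressedLength): 32 + n + n/6.
    static int maxCompressedLength(int n) {
        return 32 + n + n / 6;
    }

    // Mirrors the Phoenix snippet, with Deflater standing in for Snappy (assumption).
    static BytesPtr compress(byte[] input) {
        byte[] compressed = new byte[maxCompressedLength(input.length)]; // size for worst case
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            int compressedSize = 0;
            while (!deflater.finished()) {
                if (compressedSize == compressed.length) {
                    throw new IllegalStateException("worst-case buffer too small");
                }
                compressedSize += deflater.deflate(compressed, compressedSize,
                        compressed.length - compressedSize);
            }
            BytesPtr ptr = new BytesPtr();
            // No realloc happens here: the backing array keeps its worst-case length.
            ptr.set(compressed, 0, compressedSize);
            return ptr;
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] input = new byte[10000]; // all zeros, so highly compressible
        BytesPtr ptr = compress(input);
        System.out.println("valid compressed bytes: " + ptr.getLength());
        System.out.println("backing array length:   " + ptr.get().length);
        System.out.println("trimmed copy length:    " + ptr.copyValidBytes().length);
    }
}
{code}

For a 10000-byte input the backing array is always 11698 bytes (32 + 10000 + 10000/6), while the valid compressed range is only a few dozen bytes; copying just the offset/length range before serializing avoids shipping the unused tail.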

However, debug output suggests that the serialized protobuf sent to the region servers does not benefit from Snappy compression. Below is an excerpt from some debug logging I added:

{code}
Building an RPC with a cache ptr of size: 39MB  // The compressed size is 39MB
Done serializing the AddServerCacheRequest RPC, size is 206MB  // However the serialized RPC is 206MB
And the cache ptr size is: 206MB  // And specifically, the byte array that contains the serialized hash cache is 206MB
{code}

I wrote a small standalone test program to try to reproduce this bug, and it shows similar behavior:

{code}
bytes size: 10000 bytes
compressed bytes size: 721 bytes
message size: 10003 bytes
compressed message size: 11701 bytes
{code}
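The test numbers can be checked by hand. Assuming the test message is a single protobuf {{bytes}} field with a one-byte tag (the 3-byte gap between 10000 and 10003 suggests this), the 11701-byte result equals Snappy's worst-case bound 32 + n + n/6 for n = 10000 (11698 bytes) plus the same 3 bytes of framing. That is consistent with the whole worst-case buffer being serialized rather than only the 721 compressed bytes. A small Java check of that arithmetic (class and method names are hypothetical):

{code:java}
public class MessageSizeCheck {

    // Number of bytes a protobuf varint needs to encode a non-negative int.
    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // Size of one length-delimited field with a 1-byte tag (field numbers 1..15).
    static int bytesFieldSize(int payloadLength) {
        return 1 + varintSize(payloadLength) + payloadLength;
    }

    // Snappy's worst-case output bound (snappy::MaxCompressedLength).
    static int snappyMaxCompressedLength(int n) {
        return 32 + n + n / 6;
    }

    public static void main(String[] args) {
        int raw = 10000;
        int compressed = 721;
        System.out.println("uncompressed message:           " + bytesFieldSize(raw));
        System.out.println("only compressed bytes sent:     " + bytesFieldSize(compressed));
        System.out.println("whole worst-case buffer sent:   "
                + bytesFieldSize(snappyMaxCompressedLength(raw)));
    }
}
{code}

It prints 10003, 724 and 11701. The first and last match the observed "message size" and "compressed message size"; 724 is what a correctly trimmed message would be under the same assumption.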

The code for the simplified example is here: 
https://github.com/ortutay/snappy-bytes-buffer/blob/master/src/main/java/testprotobuf/Main.java

I observed this behavior in Phoenix 4.14.1.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)