Marcell Ortutay created PHOENIX-4902:
----------------------------------------
Summary: Snappy compression benefit is lost when generating the hash
cache RPC
Key: PHOENIX-4902
URL: https://issues.apache.org/jira/browse/PHOENIX-4902
Project: Phoenix
Issue Type: Bug
Reporter: Marcell Ortutay
Phoenix uses Snappy compression on hash caches before it sends them to region
servers:
{code}
int maxCompressedSize = Snappy.maxCompressedLength(baOut.size());
byte[] compressed = new byte[maxCompressedSize]; // size for worst case
int compressedSize = Snappy.compress(baOut.getBuffer(), 0, baOut.size(), compressed, 0);
// Last realloc to size of compressed buffer.
ptr.set(compressed, 0, compressedSize);
{code}
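One property of this pattern is worth spelling out. The `compressed` array is allocated at the worst-case size, and only its first `compressedSize` bytes are valid. If `ptr` is an offset/length view over that array (as HBase's `ImmutableBytesWritable` is), then any later code that reads the whole backing array instead of the valid range would see the worst-case size. The sketch below illustrates that distinction only. It is not Phoenix code: it uses `java.util.zip.Deflater` as a stand-in for Snappy so it runs with the JDK alone, and the class and method names are hypothetical. Whether this is what happens in the RPC path is not established here.

```java
import java.util.Arrays;
import java.util.zip.Deflater;

public class OversizedBufferSketch {
    /** Compressed data held in a buffer that is larger than the data itself. */
    static final class Compressed {
        final byte[] buffer; // backing array, sized for the worst case
        final int length;    // number of valid compressed bytes at the start of buffer

        Compressed(byte[] buffer, int length) {
            this.buffer = buffer;
            this.length = length;
        }
    }

    /** Compresses input into a worst-case-sized buffer (Deflater stands in for Snappy). */
    static Compressed compress(byte[] input) {
        int n = input.length;
        // Conservative output bound for deflate, with extra slack (assumption for this sketch).
        int worstCase = n + (n >> 12) + (n >> 14) + (n >> 25) + 13 + 64;
        byte[] buffer = new byte[worstCase];

        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        int length = deflater.deflate(buffer);
        boolean done = deflater.finished();
        deflater.end();
        if (!done) {
            throw new IllegalStateException("output buffer too small");
        }
        return new Compressed(buffer, length);
    }

    /** Copies only the valid range, so the result's length equals the compressed size. */
    static byte[] validBytes(Compressed c) {
        return Arrays.copyOfRange(c.buffer, 0, c.length);
    }

    public static void main(String[] args) {
        Compressed c = compress(new byte[10000]); // highly compressible input
        System.out.println("backing array: " + c.buffer.length + " bytes");
        System.out.println("valid range:   " + c.length + " bytes");
        System.out.println("trimmed copy:  " + validBytes(c).length + " bytes");
    }
}
```

Serializing `c.buffer` directly would send the worst-case size. Serializing `validBytes(c)`, or a view bounded by the recorded offset and length, would send only the compressed bytes.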
However, debug output suggests that the serialized protobuf sent to region
servers does not retain the benefit of Snappy compression. Below is an excerpt
of the debug output I added:
{code}
Building an RPC with a cache ptr of size: 39MB // The compressed size is 39MB
Done serializing the AddServerCacheRequest RPC, size is 206MB // However, the serialized RPC is 206MB
And the cache ptr size is: 206MB // And specifically, the byte array that contains the serialized hash cache is 206MB
{code}
I've made a simple test codebase to attempt to reproduce this bug. It shows
similar behavior:
{code}
bytes size: 10000 bytes
compressed bytes size: 721 bytes
message size: 10003 bytes
compressed message size: 11701 bytes
{code}
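The numbers in this output can be checked arithmetically. The Snappy reference implementation bounds the compressed length of an `n`-byte input by `32 + n + n/6`, which for 10000 bytes is 11698. The sketch below assumes the test message consists of a single protobuf `bytes` field with a field number under 16 (so a one-byte tag plus a varint length); that assumption is inferred from the 3-byte difference between 10000 and 10003 and is not confirmed against the linked code. Under it, the 11701-byte message matches the worst-case buffer size rather than the 721 compressed bytes. The class and method names are hypothetical.

```java
public class WorstCaseBound {
    /** Upper bound from the Snappy reference implementation: 32 + n + n/6. */
    static int maxCompressedLength(int sourceLength) {
        return 32 + sourceLength + sourceLength / 6;
    }

    /**
     * Encoded size of one protobuf length-delimited field:
     * a one-byte tag (assumes field number < 16), a varint length, then the payload.
     */
    static int bytesFieldSize(int payloadLength) {
        int varintBytes = 1;
        for (int v = payloadLength; v >= 0x80; v >>>= 7) {
            varintBytes++;
        }
        return 1 + varintBytes + payloadLength;
    }

    public static void main(String[] args) {
        // Uncompressed payload: matches "message size: 10003 bytes".
        System.out.println(bytesFieldSize(10000));                      // 10003
        // Worst-case buffer: matches "compressed message size: 11701 bytes".
        System.out.println(bytesFieldSize(maxCompressedLength(10000))); // 11701
        // What the message would be if only the 721 valid bytes were sent.
        System.out.println(bytesFieldSize(721));                        // 724
    }
}
```

This match is suggestive rather than conclusive, but it is consistent with the full worst-case buffer being serialized instead of only its valid compressed range.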
The code for the simplified example is here:
https://github.com/ortutay/snappy-bytes-buffer/blob/master/src/main/java/testprotobuf/Main.java
I observed this behavior in Phoenix 4.14.1.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)