[ 
https://issues.apache.org/jira/browse/HBASE-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042595#comment-13042595
 ] 

Jonathan Gray commented on HBASE-3732:
--------------------------------------

I agree that value compression is easily done at the application level.  In 
cases where you have very large values, compressing that data is something you 
should always be thinking about.

Published or contributed code samples could go a long way.  Are there things we 
could add in Put/Get to make this kind of stuff easily pluggable?
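
As a starting point for such a sample, here is a minimal sketch of 
application-level value compression using java.util.zip's Deflater and 
Inflater. The class name ValueCodec and its two methods are hypothetical, not 
part of HBase; the sketch only turns one byte[] into another, so it assumes 
the caller compresses the value before building the Put and decompresses 
whatever bytes the Result hands back.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not an HBase class: compresses cell values on the client.
final class ValueCodec {
    private ValueCodec() {}

    // Deflate the raw value; BEST_SPEED is an assumption, tune for your data.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    // Inflate a value produced by compress(); rejects truncated or corrupt input.
    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(packed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                out.write(buf, 0, n);
                // No progress and nothing left to read means the stream is incomplete.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or corrupt value");
                }
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("value is not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }
}
```

Keeping the codec as a pure byte[]-to-byte[] function is what would make it 
pluggable: the same pair of calls could sit in application code today or 
behind a Put/Get hook later.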

If it can be integrated simply, then this might be okay, but it should probably 
be part of a larger conversation about compression.  And anything that touches 
KV needs to be thought through.

I think there could be some substantial savings in HBase-specific prefix or 
row/family/qualifier compression, both on-disk and in-memory.  One approach 
would complicate KeyValue and its comparator; a simpler one would require 
short-term memory allocations to reconstitute KVs as they make their way 
through the KVHeap/KVScanner.
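
To make the trade-off concrete, here is a rough sketch of the simpler variant: 
sorted keys are stored as (shared prefix length, suffix) pairs and each full 
key is rebuilt on read. PrefixCodec and Entry are hypothetical names, plain 
byte[] keys stand in for KeyValue, and this is not HBase's on-disk format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of prefix compression over sorted keys.
final class PrefixCodec {
    private PrefixCodec() {}

    // One encoded key: how many leading bytes it shares with the previous key,
    // plus the bytes that differ.
    static final class Entry {
        final int shared;
        final byte[] suffix;

        Entry(int shared, byte[] suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    // Encode keys that are already in sorted order.
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>(sortedKeys.size());
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int max = Math.min(prev.length, key.length);
            int shared = 0;
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    // Rebuild the full keys; note the fresh byte[] allocated per key.
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>(entries.size());
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.shared + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.shared);
            System.arraycopy(e.suffix, 0, key, e.shared, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

The allocation inside decode is the short-term garbage mentioned above; 
avoiding it would mean comparing keys in their encoded form, which is where 
the comparator gets more complicated.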

I've also done some work on supporting a two-level compressed/uncompressed 
block cache patch (with lzo).  I'm waiting to finish until HBASE-3857 goes in 
as it adds some things that make life easier in the HFile code.

> New configuration option for client-side compression
> ----------------------------------------------------
>
>                 Key: HBASE-3732
>                 URL: https://issues.apache.org/jira/browse/HBASE-3732
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.92.0
>
>         Attachments: compressed_streams.jar
>
>
> We have a case here where we have to store very fat cells (arrays of 
> integers) which can amount to hundreds of KBs and that we need to read 
> often, concurrently, and possibly keep in cache. Compressing the values on 
> the client using java.util.zip's Deflater before sending them to HBase proved 
> to be almost an order of magnitude faster in our case.
> The reasons are evident: less data sent to HBase, memstore contains 
> compressed data, block cache contains compressed data too, etc.
> I was thinking that it might be something useful to add to a family schema, 
> so that Put/Result do the conversion for you. The actual compression algo 
> should also be configurable.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
