[
https://issues.apache.org/jira/browse/HBASE-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042595#comment-13042595
]
Jonathan Gray commented on HBASE-3732:
--------------------------------------
I agree that value compression is easily done at the application level. In
cases where you have very large values, compressing that data is something you
should always be thinking about.
Published or contributed code samples could go a long way. Are there things we
could add in Put/Get to make this kind of stuff easily pluggable?
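As an illustration of the kind of sample meant here, below is a minimal sketch of application-level value compression using java.util.zip's Deflater and Inflater. The class name ValueCodec and its method names are hypothetical and not part of HBase; the sketch touches only the value bytes and assumes the caller compresses before writing and decompresses after reading.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper, not an HBase class: compresses cell values on the client. */
final class ValueCodec {
    private ValueCodec() {}

    /** Deflates the value; BEST_SPEED is an arbitrary choice for this sketch. */
    static byte[] compress(byte[] value) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(value);
            deflater.finish();
            ByteArrayOutputStream out =
                new ByteArrayOutputStream(Math.max(32, value.length / 2));
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    /** Inflates a value produced by compress(); rejects truncated or corrupt input. */
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out =
                new ByteArrayOutputStream(Math.max(32, compressed.length * 2));
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read means the stream is incomplete.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed value", e);
        } finally {
            inflater.end();
        }
    }
}
```

In an application, the byte[] handed to a Put would be ValueCodec.compress(value), and the byte[] read back from a Result would go through ValueCodec.decompress(...). Row keys, families, and qualifiers stay uncompressed, so scans and comparators are unaffected.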
If it can be integrated simply, then this might be okay, but it should probably
be part of a larger conversation about compression. And anything that touches
KV needs to be thought through.
I think there could be substantial savings from HBase-specific prefix or
row/family/qualifier compression, both on disk and in memory. One approach
would complicate KeyValue and its comparator; a simpler one would need
short-lived memory allocations to reconstitute KVs as they make their way
through the KVHeap/KVScanner.
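To make the simpler variant concrete, here is a rough sketch of prefix compression over sorted keys, where each key stores only the bytes that differ from the previous one and is rebuilt into a fresh array when read. This is not HBase's on-disk format; PrefixKeyCodec and Entry are hypothetical names, and a key is treated as one opaque byte[] rather than a real KeyValue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not HBase code: prefix-compresses a sorted run of keys. */
final class PrefixKeyCodec {
    private PrefixKeyCodec() {}

    /** One encoded key: bytes reused from the previous key, plus the remainder. */
    static final class Entry {
        final int sharedPrefixLength;
        final byte[] suffix;

        Entry(int sharedPrefixLength, byte[] suffix) {
            this.sharedPrefixLength = sharedPrefixLength;
            this.suffix = suffix;
        }
    }

    /** Encodes each key relative to its predecessor; input is assumed sorted. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>(sortedKeys.size());
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds full keys in order; each key costs one short-lived allocation. */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>(entries.size());
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.sharedPrefixLength + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.sharedPrefixLength);
            System.arraycopy(e.suffix, 0, key, e.sharedPrefixLength, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

The trade-off is visible in decode(): the comparator can keep working on plain byte arrays, but every key read allocates a new array. The other approach would avoid that allocation by teaching the comparator to work on the encoded form.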
I've also done some work on a patch supporting a two-level
compressed/uncompressed block cache (with LZO). I'm holding off on finishing
it until HBASE-3857 goes in, as it adds some things that make life easier in
the HFile code.
> New configuration option for client-side compression
> ----------------------------------------------------
>
> Key: HBASE-3732
> URL: https://issues.apache.org/jira/browse/HBASE-3732
> Project: HBase
> Issue Type: New Feature
> Reporter: Jean-Daniel Cryans
> Fix For: 0.92.0
>
> Attachments: compressed_streams.jar
>
>
> We have a case here where we have to store very fat cells (arrays of
> integers), which can amount to hundreds of KBs, that we need to read
> often, concurrently, and possibly keep in cache. Compressing the values on
> the client using java.util.zip's Deflater before sending them to HBase proved
> to be almost an order of magnitude faster in our case.
> The reasons are evident: less data sent to HBase, memstore contains
> compressed data, block cache contains compressed data too, etc.
> I was thinking that it might be something useful to add to a family schema,
> so that Put/Result do the conversion for you. The actual compression algo
> should also be configurable.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira