[ 
https://issues.apache.org/jira/browse/HBASE-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030818#comment-13030818
 ] 

Jean-Daniel Cryans commented on HBASE-3732:
-------------------------------------------

bq. Perhaps we can compress the value depending on whether it's fatter than a 
certain threshold

That would make sense, or it could be set in the HCD (HColumnDescriptor).

bq. do we need to call HFile#getCompressingStream if the value is already 
compressed up front

The fact that the values are compressed should be transparent to the region 
servers, exactly like when the user is compressing the values themselves (like 
I described in the description of this jira).
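
To illustrate what "the user compressing the values themselves" can look like, 
here is a minimal sketch using only java.util.zip. It is not code from this 
jira or from the attached jar. The class name, the 1 KB threshold and the 
one-byte marker are assumptions made up for the example; the marker is just 
one way to let a reader tell raw values from deflated ones when small values 
are left alone.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of HBase: encodes a cell value on the client
// before it is sent, and decodes it after it is read back.
final class ClientSideValueCodec {
    // Assumed threshold: values shorter than this are stored uncompressed.
    static final int THRESHOLD_BYTES = 1024;
    // Assumed one-byte marker prepended to every stored value.
    static final byte RAW = 0;
    static final byte DEFLATED = 1;

    static byte[] encode(byte[] value) {
        if (value.length < THRESHOLD_BYTES) {
            // Small value: prepend the RAW marker and copy the bytes as-is.
            byte[] stored = new byte[value.length + 1];
            stored[0] = RAW;
            System.arraycopy(value, 0, stored, 1, value.length);
            return stored;
        }
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(value);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(value.length / 2 + 16);
            out.write(DEFLATED);
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    static byte[] decode(byte[] stored) {
        if (stored.length == 0) {
            throw new IllegalArgumentException("missing marker byte");
        }
        if (stored[0] == RAW) {
            return Arrays.copyOfRange(stored, 1, stored.length);
        }
        if (stored[0] != DEFLATED) {
            throw new IllegalArgumentException("unknown marker byte: " + stored[0]);
        }
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored, 1, stored.length - 1);
            ByteArrayOutputStream out = new ByteArrayOutputStream(stored.length * 4);
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid deflate stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
    }
}
```

With something like this, the client would pass encode(value) as the cell 
value when writing and call decode(...) on the bytes it reads back. The 
region servers only ever see opaque byte arrays, which is the transparency 
described above.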

bq. This seems like a really useful low hanging fruit.

Not so sure about that. I think there are many easy ways to solve this, but 
most of them involve polluting the API or doing weird acrobatics in the 
client. Compressing/decompressing is easy; it's all about where in the code 
you're going to do it.

> New configuration option for client-side compression
> ----------------------------------------------------
>
>                 Key: HBASE-3732
>                 URL: https://issues.apache.org/jira/browse/HBASE-3732
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.92.0
>
>         Attachments: compressed_streams.jar
>
>
> We have a case here where we have to store very fat cells (arrays of 
> integers) which can amount to hundreds of KBs and which we need to read 
> often, concurrently, and possibly keep in cache. Compressing the values on 
> the client using java.util.zip's Deflater before sending them to HBase 
> proved, in our case, to be almost an order of magnitude faster.
> The reasons are evident: less data sent to HBase, memstore contains 
> compressed data, block cache contains compressed data too, etc.
> I was thinking that it might be something useful to add to a family schema, 
> so that Put/Result do the conversion for you. The actual compression algo 
> should also be configurable.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
