[ 
https://issues.apache.org/jira/browse/HBASE-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204046#comment-13204046
 ] 

Todd Lipcon commented on HBASE-5355:
------------------------------------

I've been mulling this over in the back of my head recently with regards to the 
work just getting started on adding extensible RPC. Here are a few thoughts:

A lot of our current lack of efficiency can be dealt with by simply avoiding 
multiple copies of the same data. The most egregious example: when we serialize 
columns, we serialize each KeyValue indepedendently, even though they all share 
the same row key.
One potential solution I've been thinking about:
- introduce the concept of a "constant pool" which is associated with an RPC 
request or response. This pool would be serialized on the wire before the 
actual request/response and might be encoded like:
{code}
<total byte length of constant pool> <number of constants> <constant 1 len> 
<constant 1 val> <constant 2 len> <constant 2 val>
{code}
Then in the actual serialization of KeyValues, etc, we would not write out the 
data, but rather indexes into the constant pool.
The advantages to this kind of technique would be:
- pushing all of the data near each other in the packet would make any 
compression more beneficial (rather than interleaving compressible user data 
with less compressible encoded information)
- allows multiple parts of a request/response to reference the same byte arrays 
(eg multiple columns referring to the same row key)
- allows zero-copy implementations even if we use protobufs to encode the 
actual call/response

This idea might be orthogonal to the compression discussed above, but may be a 
cheaper (CPU-wise) way of getting a similar effect.
                
> Compressed RPC's for HBase
> --------------------------
>
>                 Key: HBASE-5355
>                 URL: https://issues.apache.org/jira/browse/HBASE-5355
>             Project: HBase
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 0.89.20100924
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>
> Some application need ability to do large batched writes and reads from a 
> remote MR cluster. These eventually get bottlenecked on the network. These 
> results are also pretty compressible sometimes.
> The aim here is to add the ability to do compressed calls to the server on 
> both the send and receive paths.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to