[jira] [Commented] (HDFS-7845) Compress block reports

Colin Patrick McCabe (JIRA) Mon, 02 Mar 2015 13:36:32 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343805#comment-14343805
 ]


Colin Patrick McCabe commented on HDFS-7845:
--------------------------------------------

As [~arpitagarwal] pointed out, we're not dealing with a series of fixed-width 
ints, but with a series of protobuf vints (variable-length ints).  [~clamb] did 
some tests with a block report and got around 50% compression, if I'm 
remembering correctly.  [~clamb], can you comment on whether those tests were 
done with vints or regular integers?

We should probably make sure we're running the compression test on what we're 
actually sending, which is going to be a series of 3-tuples of [ block_id, 
genstamp, length ], all encoded as protobuf vints.  Sorting is an interesting 
idea, but I wonder whether its effectiveness diminishes when you interleave the 
3 numbers.  Of course we could separate them, but then our L1 / L2 cache hit 
rates would plummet when actually processing the blocks.
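
One way to check this empirically is a rough sketch like the one below. It 
encodes synthetic [ block_id, genstamp, length ] tuples as protobuf-style 
base-128 vints, then compares the compressed size of the interleaved layout 
against the separated layout. Everything here is an assumption for 
illustration: the data is randomly generated rather than taken from a real 
block report, the helper names (encode_varint, make_blocks, interleaved, 
separated, ratio) are hypothetical, and zlib at level 1 stands in for lz4 or 
snappy only because it ships with Python. The numbers it prints say nothing 
about real DataNode reports.

```python
import random
import zlib


def encode_varint(value):
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def make_blocks(count, seed=0):
    """Build synthetic (block_id, genstamp, length) tuples sorted by block_id.

    The value ranges are made up; real reports may look very different.
    """
    rng = random.Random(seed)
    block_ids = sorted(rng.sample(range(1 << 30, 1 << 31), count))
    return [
        (bid, rng.randrange(1000, 2000), rng.randrange(1, 128 * 1024 * 1024))
        for bid in block_ids
    ]


def interleaved(blocks):
    """Layout A: id, genstamp, length, id, genstamp, length, ..."""
    return b"".join(encode_varint(field) for block in blocks for field in block)


def separated(blocks):
    """Layout B: all ids, then all genstamps, then all lengths."""
    return b"".join(
        encode_varint(block[i]) for i in range(3) for block in blocks
    )


def ratio(raw):
    """Compressed size divided by raw size (smaller is better)."""
    return len(zlib.compress(raw, 1)) / len(raw)


if __name__ == "__main__":
    blocks = make_blocks(100_000)
    print("interleaved ratio:", round(ratio(interleaved(blocks)), 3))
    print("separated ratio:  ", round(ratio(separated(blocks)), 3))
```

Both layouts carry exactly the same bytes in a different order, so any 
difference in the two ratios comes from the ordering alone. Running the same 
comparison on the bytes of an actual block report would be the meaningful 
test.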

> Compress block reports
> ----------------------
>
>                 Key: HDFS-7845
>                 URL: https://issues.apache.org/jira/browse/HDFS-7845
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7836
>            Reporter: Colin Patrick McCabe
>            Assignee: Charles Lamb
>
> We should optionally compress block reports using a low-cpu codec such as lz4 
> or snappy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
