[ https://issues.apache.org/jira/browse/HDFS-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343879#comment-14343879 ]

Todd Lipcon commented on HDFS-7845:
-----------------------------------

Agreed, we should run various simulations before writing the code in Hadoop. My
guess (from intuition only, not testing) is that BLOSC/LZ compression will
actually produce better results than vint encoding. You would run BLOSC/LZ on
the raw int arrays, not the vint-encoded ones, and I'd guess that BLOSC/LZ
compress/decompress is actually faster than PB-style vints: the former is very
SIMD-able, whereas the latter requires a branch per byte and so inhibits good
processor pipelining.
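
For illustration, a rough simulation along those lines might look like the
following (Python with python-blosc and numpy; the block IDs are synthetic
stand-ins, not production data):

{code:python}
# Compare BLOSC/LZ4 on a raw int64 array against a PB-style varint encoding
# of the same values. Synthetic data only; a real run should use a DN dump.
import blosc
import numpy as np

def varint_encode(values):
    """Protobuf-style unsigned varints: 7 bits per byte, MSB = continuation."""
    out = bytearray()
    for v in values:
        v = int(v)
        while True:
            b = v & 0x7F
            v >>= 7
            if v:
                out.append(b | 0x80)
            else:
                out.append(b)
                break
    return bytes(out)

# Synthetic "block report": 1M sorted block IDs above a large base offset.
ids = np.sort(np.random.randint(0, 10_000_000, size=1_000_000, dtype=np.int64)
              + (1 << 30))

raw = ids.tobytes()
blosc_out = blosc.compress(raw, typesize=8, cname='lz4', shuffle=blosc.SHUFFLE)
vint_out = varint_encode(ids)

print("raw bytes: ", len(raw))
print("blosc/lz4: ", len(blosc_out))
print("pb varints:", len(vint_out))
{code}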

If you were to use BLOSC to compress the [block, gs, length] tuples, you'd set 
"typesize=24" instead of "typesize=8". Can anyone gather a text dump of all 
blockid/genstamp/size data from a large production DN that we could run 
experiments with?
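
A quick sketch of the typesize point, again with python-blosc and synthetic
tuples (purely illustrative; the real experiment needs production data):

{code:python}
# Lay (blockId, genstamp, length) out as contiguous 24-byte records and let
# BLOSC's shuffle filter work across whole tuples (typesize=24) vs. single
# 8-byte fields (typesize=8).
import blosc
import numpy as np

n = 1_000_000
tuples = np.empty((n, 3), dtype=np.int64)
tuples[:, 0] = np.sort(np.random.randint(1 << 30, 1 << 34, size=n, dtype=np.int64))  # block IDs
tuples[:, 1] = np.random.randint(1_000, 1_000_000, size=n)                            # genstamps
tuples[:, 2] = np.random.randint(0, 128 * 1024 * 1024, size=n)                        # block lengths

raw = tuples.tobytes()  # row-major: one 24-byte record per block
for ts in (8, 24):
    out = blosc.compress(raw, typesize=ts, cname='lz4', shuffle=blosc.SHUFFLE)
    print("typesize=%d: %d bytes" % (ts, len(out)))
{code}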

> Compress block reports
> ----------------------
>
>                 Key: HDFS-7845
>                 URL: https://issues.apache.org/jira/browse/HDFS-7845
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7836
>            Reporter: Colin Patrick McCabe
>            Assignee: Charles Lamb
>
> We should optionally compress block reports using a low-cpu codec such as lz4 
> or snappy.



