[ https://issues.apache.org/jira/browse/HDFS-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343879#comment-14343879 ]
Todd Lipcon commented on HDFS-7845:
-----------------------------------
Agreed, we should run various simulations before writing the code in Hadoop. My
guess (from intuition only, not testing) is that BLOSC/LZ compression will
actually give better results than vint encoding. You would run BLOSC/LZ on the
raw int arrays, not the vint-encoded ones, and I'd guess compressing and
decompressing with BLOSC/LZ is actually faster than encoding and decoding
PB-style vints. The former is very SIMD-able, whereas the latter requires a
branch per byte and so inhibits good processor pipelining.
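To make the comparison concrete, here is a rough, hypothetical sketch of the two
encodings using the python-blosc and numpy bindings purely for illustration --
none of these function names or library choices are part of HDFS or of this
proposal:

    # Hypothetical sketch: raw int64 array + Blosc/LZ vs. PB-style varints.
    import blosc
    import numpy as np

    def blosc_pack(block_ids):
        """Compress the raw int64 array directly; shuffle + LZ is SIMD-friendly."""
        raw = np.asarray(block_ids, dtype=np.int64).tobytes()
        return blosc.compress(raw, typesize=8, cname='lz4', shuffle=blosc.SHUFFLE)

    def blosc_unpack(payload):
        return np.frombuffer(blosc.decompress(payload), dtype=np.int64)

    def vint_pack(block_ids):
        """PB-style varints: a data-dependent branch per output byte."""
        out = bytearray()
        for v in block_ids:
            v = int(v) & 0xFFFFFFFFFFFFFFFF   # treat as unsigned 64-bit
            while v >= 0x80:
                out.append((v & 0x7F) | 0x80)
                v >>= 7
            out.append(v)
        return bytes(out)

    if __name__ == '__main__':
        ids = np.random.randint(1, 1 << 40, size=100_000, dtype=np.int64)
        print('blosc bytes:', len(blosc_pack(ids)))
        print('vint  bytes:', len(vint_pack(ids)))

The Blosc path hands the codec one contiguous buffer it can shuffle and scan
with SIMD; the varint path has to branch on every output byte.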
If you were to use BLOSC to compress the [block, gs, length] tuples, you'd set
"typesize=24" instead of "typesize=8". Can anyone gather a text dump of all
blockid/genstamp/size data from a large production DN that we could run
experiments with?
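For the tuple case, a hypothetical sketch of what "typesize=24" would look like,
again using python-blosc only as a stand-in (the pack_report/unpack_report names
are made up):

    # Hypothetical sketch: (block id, genstamp, length) triples as one
    # contiguous buffer of 24-byte records, compressed with typesize=24.
    import blosc
    import numpy as np

    def pack_report(triples):
        """triples: iterable of (block_id, genstamp, num_bytes) int64 values."""
        arr = np.asarray(triples, dtype=np.int64)   # shape (n, 3), row-major
        return blosc.compress(arr.tobytes(), typesize=24,
                              cname='lz4', shuffle=blosc.SHUFFLE)

    def unpack_report(payload):
        raw = blosc.decompress(payload)
        return np.frombuffer(raw, dtype=np.int64).reshape(-1, 3)

With typesize=24, the shuffle stage groups the corresponding byte of every
24-byte record together, so the mostly-zero high-order bytes of block ids,
genstamps, and lengths end up adjacent before the LZ pass.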
> Compress block reports
> ----------------------
>
> Key: HDFS-7845
> URL: https://issues.apache.org/jira/browse/HDFS-7845
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: HDFS-7836
> Reporter: Colin Patrick McCabe
> Assignee: Charles Lamb
>
> We should optionally compress block reports using a low-cpu codec such as lz4
> or snappy.