[
https://issues.apache.org/jira/browse/HDFS-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343805#comment-14343805
]
Colin Patrick McCabe commented on HDFS-7845:
--------------------------------------------
As [~arpitagarwal] pointed out, we're not dealing with a series of ints, but
with a series of protobuf vints (variable-length ints). [~clamb] ran some
tests with a block report and, if I'm remembering correctly, got around 50%
compression. [~clamb], can you comment on whether those tests were done with
vints or with regular integers?
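For reference, a protobuf varint stores an unsigned integer in groups of 7 bits, least-significant group first. The high bit of each byte is set when more bytes follow, so small values take fewer bytes. The minimal Python sketch below illustrates that encoding. The function names are illustrative and are not part of Hadoop or the protobuf library, and the sketch only handles non-negative values.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative (unsigned) values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


if __name__ == "__main__":
    for n in (1, 300, 1 << 30):
        encoded = encode_varint(n)
        print(n, encoded.hex(), len(encoded))  # 300 encodes as ac02 (2 bytes)
```

A byte-oriented compressor sees these variable-width byte runs rather than fixed 4- or 8-byte ints. That is why a ratio measured on plain integers may not carry over to the vint-encoded report.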
We should probably make sure we're running the compression test on what we'll
actually be sending, which is a 3-tuple of [ block_id, genstamp, length ], all
encoded as protobuf vints. Sorting is an interesting idea, but I wonder whether
its effectiveness diminishes when the 3 numbers are interleaved. Of course, we
could store them separately, but then our L1/L2 cache hit rates would plummet
when we actually process the blocks.
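One way to run that comparison is to serialize the same block list in three layouts and compress each one. The first layout is interleaved triples in report order. The second is interleaved triples sorted by block ID, with each ID replaced by its gap from the previous ID. The third uses the same sorted, delta-encoded values split into three columns. The sketch below rests on several assumptions. The data is synthetic, so any sizes it prints say nothing about real block reports. IDs are assumed to be non-negative. zlib at level 1 stands in for LZ4 or Snappy, which are not in the Python standard library. All names are hypothetical.

```python
import random
import zlib


def encode_varint(value: int) -> bytes:
    """Protobuf-style base-128 varint for a non-negative integer."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def sorted_with_id_deltas(blocks):
    """Sort by block ID and replace each ID with its gap from the previous one."""
    rows = []
    prev = 0
    for block_id, genstamp, length in sorted(blocks):
        rows.append((block_id - prev, genstamp, length))
        prev = block_id
    return rows


def interleaved(rows):
    """[id, genstamp, length] triples written back to back."""
    return b"".join(encode_varint(v) for row in rows for v in row)


def columnar(rows):
    """All IDs, then all genstamps, then all lengths."""
    return b"".join(encode_varint(row[field]) for field in range(3) for row in rows)


def synthetic_report(n, seed=0):
    """Made-up report: IDs drawn from a dense range in shuffled order, few
    distinct genstamps, mostly full-size blocks. Not real HDFS data."""
    rng = random.Random(seed)
    base_id = 1 << 30            # arbitrary starting ID, illustration only
    full = 128 * 1024 * 1024     # assumed 128 MiB block size
    ids = rng.sample(range(base_id, base_id + 2 * n), n)
    return [(block_id,
             1000 + rng.randrange(50),
             full if rng.random() < 0.9 else rng.randrange(1, full))
            for block_id in ids]


if __name__ == "__main__":
    report = synthetic_report(100_000)
    rows = sorted_with_id_deltas(report)
    layouts = {
        "interleaved, unsorted": interleaved(report),
        "interleaved, sorted+delta": interleaved(rows),
        "columnar, sorted+delta": columnar(rows),
    }
    for name, payload in layouts.items():
        packed = zlib.compress(payload, 1)  # level 1: fastest compressing level
        print(f"{name:26} raw={len(payload):>9} zlib-1={len(packed):>9}")
```

This only measures size. The cache-locality cost of the columnar layout would have to be measured separately, in the code that actually processes the blocks.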
> Compress block reports
> ----------------------
>
> Key: HDFS-7845
> URL: https://issues.apache.org/jira/browse/HDFS-7845
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: HDFS-7836
> Reporter: Colin Patrick McCabe
> Assignee: Charles Lamb
>
> We should optionally compress block reports using a low-CPU codec such as LZ4
> or Snappy.
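Making compression optional can work with a one-byte header that records which codec, if any, was applied. With that header, a receiver can accept both compressed and uncompressed reports. The minimal sketch below shows the idea. The header values and function names are hypothetical and are not Hadoop's wire format. zlib at level 1 stands in for LZ4 or Snappy, which are not in the Python standard library.

```python
import zlib

CODEC_NONE = 0  # hypothetical header value: payload stored as-is
CODEC_ZLIB = 1  # hypothetical header value: zlib as a stand-in for lz4/snappy


def pack_report(payload: bytes, compress: bool) -> bytes:
    """Prefix the payload with a codec byte. Compress only when asked to,
    and keep the compressed form only if it is actually smaller."""
    if compress:
        packed = zlib.compress(payload, 1)  # level 1 favors speed over ratio
        if len(packed) < len(payload):
            return bytes([CODEC_ZLIB]) + packed
    return bytes([CODEC_NONE]) + payload


def unpack_report(message: bytes) -> bytes:
    """Read the codec byte and undo whatever pack_report applied."""
    codec, body = message[0], message[1:]
    if codec == CODEC_NONE:
        return body
    if codec == CODEC_ZLIB:
        return zlib.decompress(body)
    raise ValueError(f"unknown codec byte {codec}")


if __name__ == "__main__":
    report = bytes(range(256)) * 64  # placeholder payload, not a real report
    for flag in (False, True):
        message = pack_report(report, compress=flag)
        assert unpack_report(message) == report
        print(f"compress={flag}: {len(report)} -> {len(message)} bytes")
```

Falling back to the uncompressed form when compression does not help keeps the worst case at one extra byte.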