[
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341074#comment-14341074
]
Arpit Agarwal commented on HDFS-7836:
-------------------------------------
bq. 1 M blocks per disk, on 10 disks, and 24 bytes per block, is a 240 MB block
report (did I do that math right?) That's definitely bigger than we'd like the
full BR RPC to be, and compression can help here. Or possibly separating the
block report into multiple RPCs. Perhaps one RPC per storage?
We do send one RPC per storage when the total block count exceeds 1M, i.e.
{{DFS_BLOCKREPORT_SPLIT_THRESHOLD_DEFAULT}}. The 24-bytes-per-block math doesn't
hold on the wire since protobuf encodes the longs as vints. _9M blocks ~ 64MB_
was observed empirically in a couple of different deployments, and that figure
was the basis for the 1M default.
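To make the vint point concrete, here is a small self-contained sketch (plain
Java, no protobuf dependency; the three-longs-per-replica layout mirrors
{{BlockListAsLongs}}, but the helper and the sample values are made up for
illustration) comparing the varint-encoded size of one replica entry with the
fixed 24-byte estimate:
{code:java}
// Hypothetical illustration, not HDFS code: estimates the wire size of one
// reported replica (block ID, length, generation stamp) under varint
// encoding, versus a fixed 8 bytes per long.
public class VarintSizeDemo {

  // Size in bytes of an unsigned varint encoding of v (7 payload bits/byte).
  static int varintSize(long v) {
    int size = 1;
    while ((v & ~0x7FL) != 0) {
      size++;
      v >>>= 7;
    }
    return size;
  }

  public static void main(String[] args) {
    long blockId = 1_073_750_000L;   // sequential block IDs start near 2^30
    long numBytes = 134_217_728L;    // a full 128 MB block
    long genStamp = 10_000L;         // generation stamps are usually small

    int encoded = varintSize(blockId) + varintSize(numBytes) + varintSize(genStamp);
    System.out.println("varint-encoded replica entry: " + encoded + " bytes");
    System.out.println("fixed-width (3 x 8 bytes):    24 bytes");
  }
}
{code}
For the made-up values above this prints about 11 bytes per replica rather
than 24, which is why the fixed-width estimate overshoots the actual report
size.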
bq. Hmm. Our sequential block allocations should guarantee that mod N produces
an approximately equal number of blocks in each stripe. It is only with
randomly allocated block IDs that we could even theoretically get an imbalance
(although the probability is vanishingly small even there if the randomness is
uniform). With sequentially allocated block IDs the stripes will always be of
equal size. I guess deletions of blocks could change that, but I see no reason
why any group of blocks mod N should be more deleted than another group.
With sequential allocation, a job that does 'create N files, delete M
files, repeat' could cause that imbalance over time.
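As a contrived illustration (the stripe count, batch size, and deletion
pattern below are all made up for the example), if such a job's create/delete
rhythm happens to line up with the stripe count, the surviving sequential IDs
keep falling into the same residue classes mod N:
{code:java}
// Hypothetical simulation, not HDFS code: sequential block IDs striped by
// blockId % NUM_STRIPES, with a recurring "create 16, delete the 12 oldest" job.
import java.util.TreeSet;

public class StripeImbalanceDemo {
  static final int NUM_STRIPES = 16;

  public static void main(String[] args) {
    TreeSet<Long> liveBlocks = new TreeSet<>();
    long nextBlockId = 1L << 30;     // sequential allocation

    // Because each batch size equals the stripe count, the survivors of
    // every cycle land in the same 4 residue classes.
    for (int cycle = 0; cycle < 100_000; cycle++) {
      long[] created = new long[NUM_STRIPES];
      for (int i = 0; i < NUM_STRIPES; i++) {
        created[i] = nextBlockId++;
        liveBlocks.add(created[i]);
      }
      for (int i = 0; i < 12; i++) {
        liveBlocks.remove(created[i]);
      }
    }

    long[] perStripe = new long[NUM_STRIPES];
    for (long id : liveBlocks) {
      perStripe[(int) (id % NUM_STRIPES)]++;
    }
    for (int s = 0; s < NUM_STRIPES; s++) {
      System.out.println("stripe " + s + ": " + perStripe[s] + " blocks");
    }
  }
}
{code}
In this toy run 12 of the 16 stripes end up with zero blocks even though
allocation is purely sequential; real workloads would be far less extreme,
but the per-stripe distribution is not guaranteed to stay uniform.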
> BlockManager Scalability Improvements
> -------------------------------------
>
> Key: HDFS-7836
> URL: https://issues.apache.org/jira/browse/HDFS-7836
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Charles Lamb
> Assignee: Charles Lamb
> Attachments: BlockManagerScalabilityImprovementsDesign.pdf
>
>
> Improvements to BlockManager scalability.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)