[
https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daryn Sharp updated HDFS-9260:
------------------------------
Attachment: FBR processing.png
I've read the doc now. Sorry I commented before doing so. The results are
interesting until the final details about a 4x reduction in block updates.
Here are some basic specs to consider:
* 10-80k adds/min
* job submissions increasing replication factor to 10
* at least 1 node/day decommissioning or going dead with 100k-400k blocks
* every few weeks entire racks (40 nodes) are decommissioned for refresh or
reallocation
* balancer is constantly churning to populate recommissioned dead nodes
That's a lot of IBRs which is why a 4x degradation is quite concerning. The
block report processing times seem a bit high in the tests. :) I'll attach an
image of the BR processing times for some of our busiest clusters. They span
the gamut from 100M-300M blocks with roughly the same number of files. We got
a huge improvement from my BR encoding change + per-storage reports.
BTW, I had/have a working patch that replaced the triplets with sparse yet
densely packed 2-dimensional primitive arrays. Everything is linked via
indices to greatly reduce the dirty cards to scan. Need to dig up the jira
when my head is above water.
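
For readers unfamiliar with the idea, here is a rough sketch of what an
index-linked list over chunked primitive arrays could look like. It is not
taken from the patch mentioned above, which is not attached here; the class
name IndexLinkedList, the chunk size, and all method names are hypothetical.
The point it illustrates is that links stored as ints in int[] chunks are not
reference stores, so updating them should not go through the GC card-marking
barrier the way updating object references in the triplets does.

```java
import java.util.Arrays;

// Hypothetical sketch only: not code from any HDFS patch.
// A doubly linked list of block indices whose links live in
// chunked (2-dimensional) primitive int arrays instead of object references.
public class IndexLinkedList {
    private static final int CHUNK_BITS = 10;              // assumed: 1024 slots per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;
    public static final int NIL = -1;                       // "no link" marker

    // Outer dimension is sparse (chunks are allocated on demand);
    // each inner chunk is a densely packed primitive array.
    private final int[][] next;
    private final int[][] prev;
    private int head = NIL;
    private int size = 0;

    public IndexLinkedList(int maxChunks) {
        next = new int[maxChunks][];
        prev = new int[maxChunks][];
    }

    // Allocate the chunk that holds idx the first time it is touched.
    private void ensureChunk(int idx) {
        int c = idx >>> CHUNK_BITS;
        if (next[c] == null) {
            next[c] = new int[CHUNK_SIZE];
            prev[c] = new int[CHUNK_SIZE];
            Arrays.fill(next[c], NIL);
            Arrays.fill(prev[c], NIL);
        }
    }

    private static int get(int[][] a, int idx) {
        return a[idx >>> CHUNK_BITS][idx & CHUNK_MASK];
    }

    private static void set(int[][] a, int idx, int value) {
        a[idx >>> CHUNK_BITS][idx & CHUNK_MASK] = value;
    }

    /** Links block index idx at the head of the list. */
    public void addFirst(int idx) {
        ensureChunk(idx);
        set(next, idx, head);
        set(prev, idx, NIL);
        if (head != NIL) {
            set(prev, head, idx);
        }
        head = idx;
        size++;
    }

    /** Unlinks idx; assumes idx is currently in the list. */
    public void remove(int idx) {
        int p = get(prev, idx);
        int n = get(next, idx);
        if (p != NIL) {
            set(next, p, n);
        } else {
            head = n;
        }
        if (n != NIL) {
            set(prev, n, p);
        }
        set(next, idx, NIL);
        set(prev, idx, NIL);
        size--;
    }

    public int size() { return size; }
    public int head() { return head; }
    public int next(int idx) { return get(next, idx); }
}
```

In this sketch a block would be identified by a small integer index, and one
such list would exist per storage. Adding or removing a replica only rewrites
a few ints, at the cost of an extra shift and mask per access.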
> Improve performance and GC friendliness of startup and FBRs
> -----------------------------------------------------------
>
> Key: HDFS-9260
> URL: https://issues.apache.org/jira/browse/HDFS-9260
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, namenode, performance
> Affects Versions: 2.7.1
> Reporter: Staffan Friberg
> Assignee: Staffan Friberg
> Attachments: FBR processing.png, HDFS Block and Replica Management
> 20151013.pdf, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch,
> HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch,
> HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch
>
>
> This patch changes the data structures used for BlockInfos and Replicas to
> keep them sorted. This allows faster and more GC friendly handling of full
> block reports.
> Would like to hear people's feedback on this change and also some help
> investigating/understanding a few outstanding issues if we are interested in
> moving forward with this.