[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986062#comment-14986062 ]
Staffan Friberg commented on HDFS-9260:
---------------------------------------

Hi Daryn,

Thanks for taking a look at the patch.

1. FBR and startup improve, please see the attached PDF.

2. I will need to check what we do here (and whether I still have the old logs), but it doesn't feel like it should be affected.

3. We will be slightly slower on file deletes and removals. The current algorithm goes through the LightWeightGSet to first look up and remove each affected BlockInfo, and after that unlinks it from the linked list. In my case the BlockInfo is instead removed from the TreeSet, which requires a new lookup. However, while this is slower, I think the time that process takes is far outweighed by the time it takes to delete or redistribute blocks on all DNs. Deleting files with a large number of blocks seems to take on the order of hours, since we only send small parts of the total block list to each node on every heartbeat. I am not too familiar with how aggressive the redistribution is in the event of a DN decommission.

4. It will decrease as long as the TreeSet is kept above a ~50% fill ratio, since the reference to each BlockInfo is now a single pointer from the TreeSet instead of the doubly linked list.

> Improve performance and GC friendliness of startup and FBRs
> -----------------------------------------------------------
>
>                 Key: HDFS-9260
>                 URL: https://issues.apache.org/jira/browse/HDFS-9260
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode, performance
>    Affects Versions: 2.7.1
>            Reporter: Staffan Friberg
>            Assignee: Staffan Friberg
>         Attachments: HDFS Block and Replica Management 20151013.pdf,
> HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch,
> HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch,
> HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch
>
>
> This patch changes the data structures used for BlockInfos and Replicas to
> keep them sorted. This allows faster and more GC friendly handling of full
> block reports.
> Would like to hear people's feedback on this change, and also some help
> investigating/understanding a few outstanding issues if we are interested in
> moving forward with this.
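
As a rough illustration of point 3 in the comment above, the following Java sketch contrasts the two removal paths: unlinking a block from a doubly linked list once it has been found, versus removing it from a sorted set, which needs a second lookup by block ID. The names Block, LinkedStorage, SortedStorage and blocksMap are hypothetical, and java.util.TreeSet and java.util.HashMap are only stand-ins for the sorted structure in the patch and for LightWeightGSet. None of this is the actual HDFS code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class BlockRemovalSketch {

    /** Hypothetical stand-in for a block entry; not the HDFS BlockInfo class. */
    static final class Block implements Comparable<Block> {
        final long id;
        Block prev, next; // links used only by the linked-list variant

        Block(long id) { this.id = id; }

        @Override
        public int compareTo(Block other) { return Long.compare(id, other.id); }
    }

    /** Variant A: a per-storage doubly linked list threaded through the blocks. */
    static final class LinkedStorage {
        Block head;
        int size;

        void add(Block b) {
            b.prev = null;
            b.next = head;
            if (head != null) head.prev = b;
            head = b;
            size++;
        }

        /** O(1) unlink. The caller must ensure b is currently in this list. */
        void remove(Block b) {
            if (b.prev != null) b.prev.next = b.next; else head = b.next;
            if (b.next != null) b.next.prev = b.prev;
            b.prev = null;
            b.next = null;
            size--;
        }
    }

    /** Variant B: a per-storage sorted set; removal costs an extra O(log n) lookup. */
    static final class SortedStorage {
        final TreeSet<Block> blocks = new TreeSet<>();

        void add(Block b) { blocks.add(b); }

        boolean remove(Block b) { return blocks.remove(b); }
    }

    public static void main(String[] args) {
        Map<Long, Block> blocksMap = new HashMap<>(); // assumed global block map
        LinkedStorage linked = new LinkedStorage();
        SortedStorage sorted = new SortedStorage();
        for (long id = 1; id <= 5; id++) {
            Block b = new Block(id);
            blocksMap.put(id, b);
            linked.add(b);
            sorted.add(b);
        }

        // First lookup: find and remove the block in the global map.
        Block victim = blocksMap.remove(3L);

        // Variant A: no further lookup, the block carries its own links.
        linked.remove(victim);

        // Variant B: the sorted set has to locate the block again by ID.
        sorted.remove(victim);

        System.out.println("linked size = " + linked.size);
        System.out.println("sorted size = " + sorted.blocks.size());
    }
}
```

The sketch only shows where the extra lookup comes from. It says nothing about the relative cost in a real NameNode, which the comment argues is dominated by the work done on the DNs.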