[ https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887213#comment-13887213 ]
Suresh Srinivas commented on HDFS-5790:
---------------------------------------

I know that many HDFS restarts with running jobs that have opened many files run into this issue. In the past I fixed a bug where the namenode did an editlog sync while holding the lock. Even with that fix, I see that this issue slows down lease recovery, and the namenode becomes unresponsive during such restarts.

That said, I am okay with not putting this into 2.3.

> LeaseManager.findPath is very slow when many leases need recovery
> -----------------------------------------------------------------
>
>                 Key: HDFS-5790
>                 URL: https://issues.apache.org/jira/browse/HDFS-5790
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, performance
>    Affects Versions: 2.3.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 3.0.0, 2.4.0
>
>         Attachments: hdfs-5790.txt, hdfs-5790.txt
>
>
> We recently saw an issue where the NN restarted while tens of thousands of
> files were open. The NN then ended up spending multiple seconds for each
> commitBlockSynchronization() call, spending most of its time inside
> LeaseManager.findPath(). findPath currently works by looping over all files
> held for a given writer, and traversing the filesystem for each one. This
> takes way too long when tens of thousands of files are open by a single
> writer.
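
To make the cost described in the issue concrete, here is a minimal toy sketch of a per-writer scan. It is not the Hadoop code: FindPathSketch, Node, create, resolve, findPathByScan and pathFromParents are all hypothetical names invented for illustration, and the parent-pointer alternative is only an assumption about how such a scan could be avoided, not a description of the attached patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes are hypothetical and are not the Hadoop ones.
class FindPathSketch {
    /** A minimal inode: a name, a parent pointer, and children keyed by name. */
    static final class Node {
        final String name;
        final Node parent;
        final Map<String, Node> children = new HashMap<>();
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
    }

    final Node root = new Node("", null);
    int resolveCalls = 0; // counts full path traversals, to make the cost visible

    /** Creates every missing component of an absolute path; returns the last node. */
    Node create(String path) {
        Node cur = root;
        for (String part : path.substring(1).split("/")) {
            final Node parent = cur;
            cur = cur.children.computeIfAbsent(part, p -> new Node(p, parent));
        }
        return cur;
    }

    /** Walks down from the root one component at a time; returns null if absent. */
    Node resolve(String path) {
        resolveCalls++;
        Node cur = root;
        for (String part : path.substring(1).split("/")) {
            cur = cur.children.get(part);
            if (cur == null) return null;
        }
        return cur;
    }

    /** The scan from the description: resolve each path the writer holds until one matches. */
    String findPathByScan(List<String> leasePaths, Node target) {
        for (String p : leasePaths) {
            if (resolve(p) == target) return p;
        }
        return null;
    }

    /** Hypothetical alternative: rebuild the path from parent pointers, with no scan. */
    static String pathFromParents(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node n = node; n.parent != null; n = n.parent) {
            sb.insert(0, "/" + n.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        FindPathSketch fs = new FindPathSketch();
        List<String> leasePaths = new ArrayList<>();
        Node last = null;
        // One writer holding 10,000 open files, as in the scenario above.
        for (int i = 0; i < 10000; i++) {
            String p = "/user/job/out/part-" + i;
            leasePaths.add(p);
            last = fs.create(p);
        }
        String found = fs.findPathByScan(leasePaths, last);
        System.out.println(found + " after " + fs.resolveCalls + " traversals");
        System.out.println(pathFromParents(last));
    }
}
```

In this toy model the scan performs one root-to-leaf traversal per file held by the writer in the worst case, so the work per lookup grows with the number of open files. The parent-pointer walk depends only on the depth of the path.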