[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561485#action_12561485 ]
Raghu Angadi commented on HADOOP-2576:
--------------------------------------

Thanks Christian, I have access to the logs. The cluster seems to be running an old version of trunk; can you get the svn revision? Also, the Namenode was recently restarted.

It looks like there is another linked list attached to each datanode. {{metasave}} prints only the "recent invalidates". A loop in the Namenode moves the invalidated blocks from recent invalidates to the per-datanode list, so it is possible for a block to appear many more times in this list. This is most probably the reason.

I think it is better to relieve the Namenode from throttling the deletion of blocks. In cases like these there seems to be quite a bit of penalty on Namenode memory, the most precious resource for HDFS. The Namenode could just ask the Datanode to delete anything that it wants to delete, and the Datanode could throttle it; I think that would be more scalable. This would also remove the code related to management of throttling.

> Namenode performance degradation over time
> ------------------------------------------
>
>                 Key: HADOOP-2576
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2576
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>            Assignee: Raghu Angadi
>            Priority: Blocker
>             Fix For: 0.16.0
>
>
> We have a cluster running the same applications again and again with a high
> turnover of files.
> The performance of these applications seems to be correlated to the lifetime
> of the namenode:
> After starting the namenode, the applications need increasingly more time to
> complete, with about 50% more time after 1 week.
> During that time the namenode average cpu usage increases from typically 10%
> to 30%, memory usage nearly doubles (although the average amount of data on
> dfs stays the same), and the average load factor increases by a factor of 2-3
> (although not significantly high, <2).
> When looking at the namenode and datanode logs, I see a lot of requests to
> delete blocks coming from the namenode for blocks not in the blockmap of the
> datanodes, repeatedly for the same blocks.
> When I counted the number of blocks the namenode asked to be deleted, I
> noticed an increase with the lifetime of the namenode (a factor of
> 2-3 after 1 week).
> This makes me wonder whether the namenode fails to purge non-existing blocks
> from its list of invalid blocks.
> But independently, the namenode has a degradation issue.
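
The proposal in the comment, that the Namenode forwards every deletion request and the Datanode does the throttling, could look roughly like the sketch below. This is only an illustration under stated assumptions: the class `ThrottledBlockDeleter`, its methods, and the per-round limit are hypothetical and are not taken from the Hadoop code base. Block ids are modelled as plain `Long` values. The queue is a set, so a block requested repeatedly (as seen in the logs) is stored only once.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical datanode-side deletion queue; not part of Hadoop. */
class ThrottledBlockDeleter {
    // Insertion-ordered set: a block id requested twice is kept once.
    private final Set<Long> pending = new LinkedHashSet<>();
    // Assumed throttle: maximum number of blocks handed out per round.
    private final int maxPerRound;

    ThrottledBlockDeleter(int maxPerRound) {
        if (maxPerRound <= 0) {
            throw new IllegalArgumentException("maxPerRound must be positive");
        }
        this.maxPerRound = maxPerRound;
    }

    /** Records block ids the namenode asked to delete; returns how many were new. */
    synchronized int enqueue(Collection<Long> blockIds) {
        int added = 0;
        for (Long id : blockIds) {
            if (pending.add(id)) {
                added++;
            }
        }
        return added;
    }

    /** Removes and returns at most maxPerRound ids, oldest request first. */
    synchronized List<Long> nextBatch() {
        List<Long> batch = new ArrayList<>();
        Iterator<Long> it = pending.iterator();
        while (it.hasNext() && batch.size() < maxPerRound) {
            batch.add(it.next());
            it.remove();
        }
        return batch;
    }

    /** Number of distinct blocks still waiting to be deleted. */
    synchronized int pendingCount() {
        return pending.size();
    }
}
```

In this sketch a periodic datanode thread would call `nextBatch()` once per interval and delete the returned blocks, so the Namenode would no longer need to keep per-datanode throttling state in memory.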