[ http://issues.apache.org/jira/browse/HADOOP-652?page=comments#action_12448720 ]

Vladimir Krokhmalyov commented on HADOOP-652:
---------------------------------------------
I create and delete a lot of files in DFS, and I see the DataNode's speed degrade dramatically.

> a) currently subdirectories are created only in the last sub directory (e.g.
> subdir63).

Not only in subdir63. After the DataNode restarts, subdirectories will be created under a different subdirXY, because "File[] files = dir.listFiles();" in the FSDir constructor lists subdirectories and files in arbitrary order, so the "last" subdirectory will be a different one. That gives two branches: subdir63 and another one. This is not a bug; the rest of the code handles this kind of tree properly.

> b) remove siblings array

I think that would only deepen the recursion in addBlock(). Recursion is not a good idea here, because it is very slow when the DataNode stores a lot of blocks. I think this algorithm should be changed in the future.

Here is my tested solution for this bug. New method clearPath() in FSDir:

>   void clearPath( File f ) {
>     if ( dir.compareTo( f ) == 0 ) numBlocks--;
>     else {
>       if ( ( siblings != null ) && ( myIdx != ( siblings.length - 1 ) ) )
>         siblings[ myIdx + 1 ].clearPath( f );
>       else if ( children != null )
>         children[ 0 ].clearPath( f );
>     }
>   }

New method clearPath() in FSVolume:

>   void clearPath( File f ) {
>     dataDir.clearPath( f );
>   }

Changes in the invalidate() method in FSDataset:

<       blockMap.remove(invalidBlks[i]);
>       synchronized ( ongoingCreates ) {
>         blockMap.remove( invalidBlks[ i ] );
>         FSVolume v = volumeMap.get( invalidBlks[ i ] );
>         volumeMap.remove( invalidBlks[ i ] );
>         v.clearPath( f.getParentFile() );
>       }

And changes in the getFile() method in FSDataset:

<     return blockMap.get(b);
>     synchronized ( ongoingCreates ) {
>       return blockMap.get( b );
>     }

Now I will try to create a patch file properly.

P.S. I also set dfs.blockreport.intervalMsec = 10000 ( 10 - 30 sec ) in order to keep the NameNode's speed from degrading, because the NameNode holds deleted blocks in its data structures between block reports.
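For readers who want to trace the traversal, here is a minimal, self-contained sketch of the clearPath() idea. This is plain Java, not the actual Hadoop classes: MiniFSDir and its fields are hypothetical stand-ins for FSDir's dir/numBlocks/children/siblings/myIdx, and the walk mirrors the quoted patch: try the next sibling first, otherwise descend to the first child, and decrement numBlocks on the node whose directory matches.

```java
import java.io.File;

// Hypothetical stand-in for FSDir, illustrating the clearPath() traversal
// from the patch above. Not Hadoop code.
class MiniFSDir {
    final File dir;          // directory this node represents
    int numBlocks;           // counter the patch keeps accurate on delete
    MiniFSDir[] children;    // subdirectories one level down
    MiniFSDir[] siblings;    // all nodes at this level (shared array)
    int myIdx;               // this node's index within siblings

    MiniFSDir(File dir, int numBlocks) {
        this.dir = dir;
        this.numBlocks = numBlocks;
    }

    // Walk the tree as in the patch: decrement numBlocks on the node
    // whose directory equals f; otherwise try the next sibling, and
    // only when this is the last sibling descend into the first child.
    void clearPath(File f) {
        if (dir.equals(f)) {
            numBlocks--;
        } else if (siblings != null && myIdx != siblings.length - 1) {
            siblings[myIdx + 1].clearPath(f);
        } else if (children != null) {
            children[0].clearPath(f);
        }
    }
}
```

Note that, as the comment itself warns, this is still recursive; with many blocks an iterative walk would be preferable.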
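The synchronized blocks in invalidate() and getFile() share one point: both maps must be read and written under the same lock, so a concurrent reader never sees a block present in blockMap but already removed from volumeMap. A hedged sketch of that pattern (plain Java with hypothetical names; `lock` stands in for the ongoingCreates object used as the monitor in the patch):

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FSDataset, showing why invalidate() and
// getFile() synchronize on the same monitor. Not Hadoop code.
class MiniDataset {
    private final Object lock = new Object();  // stands in for ongoingCreates
    private final Map<String, File> blockMap = new HashMap<>();
    private final Map<String, String> volumeMap = new HashMap<>();

    void addBlock(String blockId, File f, String volume) {
        synchronized (lock) {
            blockMap.put(blockId, f);
            volumeMap.put(blockId, volume);
        }
    }

    // Remove the block from both maps in one critical section, so the
    // two maps can never be observed in an inconsistent state.
    void invalidate(String blockId) {
        synchronized (lock) {
            blockMap.remove(blockId);
            volumeMap.remove(blockId);
        }
    }

    // Reads take the same lock, mirroring the getFile() change above.
    File getFile(String blockId) {
        synchronized (lock) {
            return blockMap.get(blockId);
        }
    }
}
```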
> Not all Datastructures are updated when a block is deleted
> ----------------------------------------------------------
>
>                 Key: HADOOP-652
>                 URL: http://issues.apache.org/jira/browse/HADOOP-652
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Raghu Angadi
>
> Currently when a block is deleted, DataNode just deletes the physical file
> and updates its map. We need to update more things. For e.g. numBlocks in
> FSDir is not decremented; the effect of this is that we will create more
> subdirectories than necessary. It might not show up badly yet since numBlocks
> gets the correct value when the DataNode restarts. I have to see what else
> needs to be updated.