[
http://issues.apache.org/jira/browse/HADOOP-652?page=comments#action_12448720 ]
Vladimir Krokhmalyov commented on HADOOP-652:
---------------------------------------------
I create and delete a lot of files in DFS, and I see that the DataNode's speed
goes down dramatically!
> a) currently subdirectories are created only in the last subdirectory (e.g.
> subdir63).
Not only subdir63. After the DataNode restarts, subdirectories will be created
in some other subdirXY, because "File[] files = dir.listFiles();" in the FSDir
constructor lists subdirectories and files in arbitrary order, so the last
subdirectory will be a different one. Two branches: subdir63 and another one.
This is not a bug; the rest of the code processes this type of tree properly.
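To illustrate the point (a hypothetical standalone demo, not DataNode code): File.listFiles() makes no ordering guarantee, so which entry is "last" can differ across filesystems and restarts; sorting the array makes it deterministic:

```java
import java.io.File;
import java.nio.file.Files;
import java.util.Arrays;

public class ListOrderDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical layout loosely mimicking a DataNode data directory
        // (only 5 subdirs here, so lexicographic order is also numeric order).
        File root = Files.createTempDirectory("fsdir").toFile();
        for (int i = 0; i < 5; i++) {
            new File(root, "subdir" + i).mkdir();
        }
        // listFiles() makes no ordering guarantee: the "last" element
        // of this array is filesystem-dependent.
        File[] files = root.listFiles();
        // Sorting by pathname restores a deterministic order.
        Arrays.sort(files);
        System.out.println(files[files.length - 1].getName());
    }
}
```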
> b) remove siblings array. I think it only increases recursion in addBlock().
Recursion is not a good idea, because it is very slow when the DataNode stores
a lot of blocks. I think this algorithm should be changed in the future.
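To sketch what a non-recursive traversal might look like (a minimal standalone example with a hypothetical Node class, not the actual FSDir code), an explicit stack avoids one call frame per directory and cannot overflow on deep trees:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class IterativeWalk {
    // Hypothetical stand-in for FSDir's tree of child directories.
    static class Node {
        List<Node> children = new ArrayList<>();
        int numBlocks;
    }

    // Recursive count, analogous to how the current code walks the tree.
    static int countRecursive(Node n) {
        int total = n.numBlocks;
        for (Node c : n.children) total += countRecursive(c);
        return total;
    }

    // Iterative equivalent using an explicit stack.
    static int countIterative(Node root) {
        int total = 0;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            total += n.numBlocks;
            for (Node c : n.children) stack.push(c);
        }
        return total;
    }

    public static void main(String[] args) {
        Node root = new Node();
        root.numBlocks = 2;
        for (int i = 0; i < 3; i++) {
            Node c = new Node();
            c.numBlocks = i;           // children hold 0, 1, 2 blocks
            root.children.add(c);
        }
        // Both traversals visit the same nodes: 2 + 0 + 1 + 2 = 5.
        System.out.println(countRecursive(root) + " " + countIterative(root));
    }
}
```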
Here is my tested solution for this bug:
New method clearPath() in FSDir:
> void clearPath( File f ) {
>   if ( dir.compareTo( f ) == 0 ) numBlocks--;
>   else {
>     if ( ( siblings != null ) && ( myIdx != ( siblings.length - 1 ) ) )
>       siblings[ myIdx + 1 ].clearPath( f );
>     else if ( children != null )
>       children[ 0 ].clearPath( f );
>   }
> }
New method clearPath() in FSVolume:
> void clearPath( File f ) {
> dataDir.clearPath( f );
> }
Changes in invalidate() method in FSDataset:
< blockMap.remove(invalidBlks[i]);
> synchronized ( ongoingCreates ) {
>   blockMap.remove( invalidBlks[ i ] );
>   FSVolume v = volumeMap.get( invalidBlks[ i ] );
>   volumeMap.remove( invalidBlks[ i ] );
>   v.clearPath( f.getParentFile() );
> }
And changes in getFile() method in FSDataset:
< return blockMap.get(b);
> synchronized ( ongoingCreates ) {
>   return blockMap.get( b );
> }
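The point of both changes is that removal and lookup synchronize on the same monitor, so a reader can never observe a half-updated state while invalidate() is partway through removing a block from the maps. A minimal sketch of the pattern (hypothetical stand-in class, not the real FSDataset):

```java
import java.util.HashMap;
import java.util.Map;

public class MapSync {
    // Stands in for the ongoingCreates object used as the shared monitor.
    private final Object lock = new Object();
    // Stands in for blockMap: block name -> file path.
    private final Map<String, String> blockMap = new HashMap<>();

    void add(String block, String path) {
        synchronized (lock) {
            blockMap.put(block, path);
        }
    }

    // Removal happens entirely inside the same lock as lookups.
    void invalidate(String block) {
        synchronized (lock) {
            blockMap.remove(block);
        }
    }

    String getFile(String block) {
        synchronized (lock) {
            return blockMap.get(block);
        }
    }

    public static void main(String[] args) {
        MapSync ds = new MapSync();
        ds.add("blk_1", "/data/subdir0/blk_1");  // hypothetical names
        ds.invalidate("blk_1");
        // After invalidation the lookup sees a consistent, empty state.
        System.out.println(ds.getFile("blk_1"));
    }
}
```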
Now I will try to create a proper patch file.
P.S. I also set dfs.blockreport.intervalMsec = 10000 ( 10 - 30 sec ) in order
to avoid slowing down the NameNode, because the NameNode holds deleted blocks
in its data structures across block reports.
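For reference, that setting would go into the site configuration override file as something like the following (the exact property name and file should be checked against your Hadoop version; this is an assumption based on the value quoted above):

```xml
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>10000</value>
  <description>Interval in milliseconds between DataNode block reports
  to the NameNode.</description>
</property>
```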
> Not all Datastructures are updated when a block is deleted
> ----------------------------------------------------------
>
> Key: HADOOP-652
> URL: http://issues.apache.org/jira/browse/HADOOP-652
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Reporter: Raghu Angadi
>
> Currently when a block is deleted, the DataNode just deletes the physical file
> and updates its map. We need to update more things. For example, numBlocks in
> FSDir is not decremented; the effect of this is that we will create more
> subdirectories than necessary. It might not show up badly yet, since numBlocks
> gets the correct value when the DataNode restarts. I have to see what else
> needs to be updated.