[ http://issues.apache.org/jira/browse/HADOOP-652?page=comments#action_12448720 ]

Vladimir Krokhmalyov commented on HADOOP-652:
---------------------------------------------

I create and delete a lot of files in DFS, and I can see that the DataNode's 
speed goes down dramatically!

> a) currently subdirectories are created only in the last sub directory (e.g. 
> subdir63).

Not only subdir63. After the DataNode restarts, subdirectories will be created 
under a different subdirXY, because "File[] files = dir.listFiles();" in the 
FSDir constructor lists subdirectories and files in arbitrary order, so the 
last subdirectory will be a different one. This gives two branches: subdir63 
and one other. It is not a bug; the rest of the code processes this type of 
tree properly.
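
For illustration only (this is a standalone demo, not the FSDir code): the JDK 
makes no ordering guarantee for File.listFiles(), so the subdirectory that 
happens to be listed last can differ from run to run:

    import java.io.File;

    // Standalone demo: print a directory's entries in whatever order the
    // filesystem returns them. Run against a DataNode data directory; the
    // last subdirXY printed is the one that would receive new
    // subdirectories, and it need not be subdir63.
    public class ListOrderDemo {
        public static void main(String[] args) {
            File dir = new File(args[0]);
            File[] files = dir.listFiles();   // order is unspecified
            if (files == null) return;        // not a directory
            for (File f : files) {
                System.out.println(f.getName());
            }
        }
    }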

> b) remove siblings array I think it only increases recursion in addBlock().

Recursion is not a good idea, because it is very slow when the DataNode stores 
a lot of blocks. I think this algorithm should be changed in the future; see 
the iterative sketch below.
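
As a hedged sketch only (FSDirSketch below is a minimal stand-in carrying the 
fields used by the snippets in this comment: dir, children, numBlocks; it is 
not the real FSDir), one iterative alternative is to descend by path-prefix 
matching instead of recursing node by node:

    import java.io.File;

    class FSDirSketch {
        File dir;
        int numBlocks;
        FSDirSketch[] children;

        // Walk down from this node toward the target directory by matching
        // path prefixes; no recursion, so depth of the tree does not grow
        // the call stack.
        void clearPathIterative(File target) {
            String targetPath = target.getAbsolutePath();
            FSDirSketch cur = this;
            while (cur != null) {
                if (cur.dir.getAbsolutePath().equals(targetPath)) {
                    cur.numBlocks--;   // found the directory that held the block
                    return;
                }
                FSDirSketch next = null;
                if (cur.children != null) {
                    for (FSDirSketch child : cur.children) {
                        String childPath = child.dir.getAbsolutePath();
                        // follow the child whose path is a prefix of the target
                        if (targetPath.equals(childPath)
                                || targetPath.startsWith(childPath + File.separator)) {
                            next = child;
                            break;
                        }
                    }
                }
                cur = next;            // null ends the loop if target is absent
            }
        }
    }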



Here is my tested solution for this bug:

New method clearPath() in FSDir:

>    void clearPath( File f ) {
>      if ( dir.compareTo( f ) == 0 ) {
>        // f is this directory: account for the deleted block
>        numBlocks--;
>      } else if ( ( siblings != null ) && ( myIdx != ( siblings.length - 1 ) ) ) {
>        // not here: continue the search at the next sibling ...
>        siblings[ myIdx + 1 ].clearPath( f );
>      } else if ( children != null ) {
>        // ... or, at the last sibling, descend into the first child
>        children[ 0 ].clearPath( f );
>      }
>    }

New method clearPath() in FSVolume:

>    void clearPath( File f ) {
>      dataDir.clearPath( f );
>    }

Changes in the invalidate() method in FSDataset:

<        blockMap.remove(invalidBlks[i]);

>        synchronized ( ongoingCreates ) {
>          blockMap.remove( invalidBlks[ i ] );
>          FSVolume v = volumeMap.get( invalidBlks[ i ] );
>          volumeMap.remove( invalidBlks[ i ] );
>          // also decrement numBlocks along the on-disk directory tree
>          v.clearPath( f.getParentFile() );
>        }

And changes in the getFile() method in FSDataset:

<     return blockMap.get(b);

>     synchronized ( ongoingCreates ) {
>       return blockMap.get( b );
>     }
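
Synchronizing both invalidate() and getFile() on ongoingCreates makes the map 
updates atomic with respect to readers. A minimal standalone illustration of 
the kind of race this avoids (hypothetical class, not Hadoop code):

    import java.util.HashMap;
    import java.util.Map;

    // Without a common lock, a reader could observe blockMap between the
    // two remove() calls and see an inconsistent state.
    class MapRaceDemo {
        private final Map<String, String> blockMap = new HashMap<String, String>();
        private final Map<String, String> volumeMap = new HashMap<String, String>();
        private final Object ongoingCreates = new Object();  // shared lock

        void invalidate(String blk) {
            synchronized (ongoingCreates) {   // both removals appear atomic
                blockMap.remove(blk);
                volumeMap.remove(blk);
            }
        }

        String getFile(String blk) {
            synchronized (ongoingCreates) {   // readers take the same lock
                return blockMap.get(blk);
            }
        }
    }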

Now I will try to create a proper patch file.

P.S. I also set dfs.blockreport.intervalMsec = 10000 ( 10 - 30 sec ) in order 
to keep the NameNode from slowing down, because the NameNode holds deleted 
blocks in its data structures between block reports.
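
For reference, the corresponding hadoop-site.xml entry would look like this 
(value in milliseconds; shown only as an example of the setting mentioned 
above):

    <property>
      <name>dfs.blockreport.intervalMsec</name>
      <value>10000</value>
    </property>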


> Not all Datastructures are updated when a block is deleted
> ----------------------------------------------------------
>
>                 Key: HADOOP-652
>                 URL: http://issues.apache.org/jira/browse/HADOOP-652
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Raghu Angadi
>
> Currently when a block is deleted, the DataNode just deletes the physical 
> file and updates its map. We need to update more things; e.g. numBlocks in 
> FSDir is not decremented. The effect of this is that we will create more 
> subdirectories than necessary. It might not show up badly yet, since 
> numBlocks gets the correct value when the DataNode restarts. I have to see 
> what else needs to be updated.
