[ 
https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838569#action_12838569
 ] 

Suresh Srinivas commented on HDFS-985:
--------------------------------------

Initial review from going through half the patch:
# DFSClient.java - instead of lastReturnedName we could use a generic name 
startFrom and update the param doc appropriately.
# Is the name PartialFileStatus better than PathPartialListing?
# DFSClient.listStatus() - result should be null in case the directory is 
deleted midway, instead of returning what is accumulated until then. The 
number of lines in the code can be reduced by folding all the code into a 
do-while loop.
# DFSClient.listStatus() - document calling with name=EMPTY_NAME the first time.
# FsDirectory.getListing - avoid startChild+1 in the loop.
# INodeDirectory.nextChild() - instead of checking for name.length == 0, we 
should compare it with EMPTY_NAME.
# Why is the older variant of getListing in FsNameSystem, NameNode (did not 
check if there are others) not removed? It seems to be removed in 
ClientProtocol.java.
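
To illustrate the listStatus() comments above, here is a minimal sketch of the 
loop folded into a single do-while that returns null if the directory 
disappears midway. It is not the code from the patch: PagedListingClient, 
PartialListing, ListingSource and the String-based startAfter parameter are 
all hypothetical stand-ins, and only EMPTY_NAME is taken from the review.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a paged directory listing loop; all type names are hypothetical. */
final class PagedListingClient {
    /** Start marker for the first call, mirroring the EMPTY_NAME in the review. */
    static final String EMPTY_NAME = "";

    /** One page of entry names plus a flag saying whether more entries remain. */
    static final class PartialListing {
        final List<String> names;
        final boolean hasMore;

        PartialListing(List<String> names, boolean hasMore) {
            this.names = names;
            this.hasMore = hasMore;
        }
    }

    /** Stand-in for the getListing RPC; returns null if the directory is gone. */
    interface ListingSource {
        PartialListing getListing(String src, String startAfter);
    }

    /** Returns all entries of src, or null if src is deleted during the listing. */
    static List<String> listStatus(ListingSource namenode, String src) {
        List<String> result = new ArrayList<>();
        String startAfter = EMPTY_NAME; // the first call starts from the beginning
        PartialListing page;
        do {
            page = namenode.getListing(src, startAfter);
            if (page == null) {
                return null; // deleted midway: discard what was accumulated
            }
            result.addAll(page.names);
            if (!page.names.isEmpty()) {
                // Resume after the last name returned by this page.
                startAfter = page.names.get(page.names.size() - 1);
            }
            // The emptiness check guards against looping forever on an empty page.
        } while (page.hasMore && !page.names.isEmpty());
        return result;
    }
}
```

The single null check inside the loop covers both the first call and any later 
call, which is what lets the separate "first page" code path go away.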

I will post the comments for the rest of the code soon.


> HDFS should issue multiple RPCs for listing a large directory
> -------------------------------------------------------------
>
>                 Key: HDFS-985
>                 URL: https://issues.apache.org/jira/browse/HDFS-985
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>
>         Attachments: iterativeLS_yahoo.patch
>
>
> Currently HDFS issues one RPC from the client to the NameNode for listing a 
> directory. However, some directories are so large that they contain 
> thousands or millions of items. Listing such large directories in one RPC 
> has a few shortcomings:
> 1. The list operation holds the global fsnamesystem lock for a long time, 
> thus blocking other requests. If a large number (like thousands) of such 
> list requests hit the NameNode in a short period of time, the NameNode will 
> be significantly slowed down. Users end up noticing longer response times or 
> lost connections to the NameNode.
> 2. The response message is uncontrollably big. We observed a response as big 
> as 50M bytes when listing a directory of 300 thousand items. Even with the 
> optimization introduced at HDFS-946 that may be able to cut the response by 
> 20-50%, the response size will still be on the order of 10 megabytes.
> I propose to implement a directory listing using multiple RPCs. Here is the 
> plan:
> 1. Each getListing RPC has an upper limit on the number of items returned.  
> This limit could be configurable, but I am thinking of setting it to a fixed 
> number like 500.
> 2. Each RPC additionally specifies a start position for this listing request. 
> I am thinking of using the last item of the previous listing RPC as an 
> indicator. Since NameNode stores all items in a directory as a sorted array, 
> NameNode uses the last item to locate the start item of this listing even if 
> the last item is deleted in between these two consecutive calls. This has the 
> advantage of avoiding duplicate entries at the client side.
> 3. The return value additionally specifies whether the whole directory has 
> been listed. If the client sees a false flag, it will continue to issue 
> another RPC.
> This proposal will change the semantics of large directory listing in the 
> sense that listing is no longer an atomic operation if a directory's content 
> is changing while the listing operation is in progress.
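
The three-step plan in the description can be sketched on the NameNode side 
roughly as follows. This is only an illustration under stated assumptions: 
SortedDirectoryPager, Page and both method signatures are hypothetical, 
children are modeled as a sorted String[] using natural String order rather 
than whatever ordering the NameNode actually uses, and the limit is passed in 
instead of being fixed at 500.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of name-based paging over a sorted child array; names are hypothetical. */
final class SortedDirectoryPager {
    /** One page of child names plus a flag saying whether more children remain. */
    static final class Page {
        final List<String> names;
        final boolean hasMore;

        Page(List<String> names, boolean hasMore) {
            this.names = names;
            this.hasMore = hasMore;
        }
    }

    /** Index of the first child strictly greater than startAfter. */
    static int nextChild(String[] sortedChildren, String startAfter) {
        if (startAfter.isEmpty()) {
            return 0; // empty name means "start from the beginning"
        }
        int pos = Arrays.binarySearch(sortedChildren, startAfter);
        // Found: resume after it. Not found (e.g. deleted between calls):
        // binarySearch returns -(insertionPoint) - 1, so resume at that point.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    /** Returns at most limit children following startAfter. */
    static Page getListing(String[] sortedChildren, String startAfter, int limit) {
        int start = nextChild(sortedChildren, startAfter);
        int end = Math.min(start + limit, sortedChildren.length);
        List<String> names =
                new ArrayList<>(Arrays.asList(sortedChildren).subList(start, end));
        return new Page(names, end < sortedChildren.length);
    }
}
```

Because the resume position is derived from a name rather than an index, a 
child deleted between two calls neither shifts the window nor produces 
duplicates. Entries created or removed elsewhere in the directory between 
calls may still be missed or included, which is the non-atomic behavior the 
description mentions.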

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
