[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850299#action_12850299 ]
Hairong Kuang commented on HDFS-985:
------------------------------------
I performed some experiments to test the overhead of iterative listing. The
experiments were run against an otherwise idle NameNode with security
disabled. The client listed each directory 200 times sequentially, and the
table below shows the average time to list all entries of a directory. With a
limit of 1,000 returned entries per call, each directory listing requires
multiple RPCs to the NameNode (one per 1,000 entries). With a limit of 10,000,
each directory listing requires only one RPC.
||Max # of returned entries per getListing RPC||Directory of 2,000 entries||Directory of 4,000 entries||Directory of 10,000 entries||
|1,000|71.86 ms|145.88 ms|343.04 ms|
|10,000|70.22 ms|165.66 ms|332.1 ms|
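The arithmetic behind the table can be made explicit with a short sketch: how many getListing RPCs each listing needs, and the relative change in average time for the 1,000-entry limit versus the single-RPC case. The class and method names are illustrative only, and the RPC count assumes the server itself flags the last batch, so no extra empty call is needed when the directory size is an exact multiple of the limit.

```java
import java.util.Locale;

/**
 * Arithmetic over the measurements in the table: RPCs per listing and the
 * relative change in average listing time for the 1,000-entry limit.
 * Names are illustrative, not part of HDFS.
 */
public class ListingOverhead {

    /** RPCs needed for n entries, assuming the server flags the last batch itself. */
    static int rpcCount(int entries, int limit) {
        return (entries + limit - 1) / limit; // ceiling division
    }

    /** Relative change of a versus b, in percent. */
    static double percentChange(double a, double b) {
        return (a - b) / b * 100.0;
    }

    public static void main(String[] args) {
        int[] sizes = {2_000, 4_000, 10_000};
        double[] limit1k = {71.86, 145.88, 343.04};  // ms, at most 1,000 entries per RPC
        double[] limit10k = {70.22, 165.66, 332.1};  // ms, at most 10,000 entries per RPC
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf(Locale.ROOT, "%d entries: %d RPCs vs %d RPC, %+.1f%%%n",
                    sizes[i], rpcCount(sizes[i], 1_000), rpcCount(sizes[i], 10_000),
                    percentChange(limit1k[i], limit10k[i]));
        }
    }
}
```

It prints +2.3%, -11.9% and +3.3% for the three directory sizes. On these numbers the extra RPCs do not add a consistent cost on an idle NameNode: two rows differ by about 3% or less, and the 4,000-entry row is actually faster with the smaller limit.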
> HDFS should issue multiple RPCs for listing a large directory
> -------------------------------------------------------------
>
> Key: HDFS-985
> URL: https://issues.apache.org/jira/browse/HDFS-985
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Fix For: 0.22.0
>
> Attachments: directoryBrowse_0.20yahoo.patch,
> directoryBrowse_0.20yahoo_1.patch, directoryBrowse_0.20yahoo_2.patch,
> iterativeLS_trunk.patch, iterativeLS_trunk1.patch, iterativeLS_trunk2.patch,
> iterativeLS_trunk3.patch, iterativeLS_trunk3.patch, iterativeLS_trunk4.patch,
> iterativeLS_yahoo.patch, iterativeLS_yahoo1.patch, testFileStatus.patch
>
>
> Currently HDFS issues a single RPC from the client to the NameNode to list a
> directory. However, some directories are large, containing thousands or
> millions of items. Listing such a large directory in one RPC has a few
> shortcomings:
> 1. The list operation holds the global fsnamesystem lock for a long time,
> blocking other requests. If a large number (e.g., thousands) of such list
> requests hit the NameNode in a short period of time, the NameNode slows down
> significantly, and users notice longer response times or lost connections to
> the NameNode.
> 2. The response message is uncontrollably large. We observed a response as
> large as 50 MB when listing a directory of 300 thousand items. Even with the
> optimization introduced in HDFS-946, which may cut the response size by
> 20-50%, the response would still be tens of megabytes.
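A rough estimate from the figures above shows why capping the number of entries per RPC bounds the response size. This sketch assumes 1 MB = 10^6 bytes and that every entry has the same serialized size, which real listings will not exactly satisfy; the 500-entry cap is the figure suggested in the plan that follows, and the class name is illustrative.

```java
import java.util.Locale;

/**
 * Back-of-the-envelope response sizes from the figures in the description.
 * Assumes 1 MB = 10^6 bytes and a uniform per-entry size.
 */
public class ResponseSizeEstimate {

    /** Average serialized size per entry implied by 50 MB for 300,000 items. */
    static final double BYTES_PER_ENTRY = 50e6 / 300_000;

    /** Estimated response size for a batch of the given number of entries. */
    static double responseBytes(long entries) {
        return entries * BYTES_PER_ENTRY;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "per entry: %.1f bytes%n", BYTES_PER_ENTRY);
        // HDFS-946 may cut the response by 20-50%, leaving 50-80% of 50 MB.
        System.out.printf(Locale.ROOT, "full listing after HDFS-946: %.0f-%.0f MB%n",
                50 * 0.5, 50 * 0.8);
        // A fixed per-RPC cap bounds every response regardless of directory size.
        System.out.printf(Locale.ROOT, "one 500-entry response: %.1f KB%n",
                responseBytes(500) / 1e3);
    }
}
```

It prints about 166.7 bytes per entry, 25-40 MB for the full single-RPC listing after HDFS-946, and about 83.3 KB for a 500-entry response.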
> I propose implementing directory listing with multiple RPCs. Here is the
> plan:
> 1. Each getListing RPC has an upper limit on the number of items returned.
> This limit could be configurable, but I am thinking of fixing it at a number
> such as 500.
> 2. Each RPC additionally specifies a start position for the listing request.
> I am thinking of using the last item returned by the previous listing RPC as
> the marker. Since the NameNode stores all items in a directory as a sorted
> array, it can use that last item to locate the start of this listing even if
> the item was deleted between the two consecutive calls. This has the
> advantage of avoiding duplicate entries at the client side.
> 3. The return value additionally indicates whether the whole directory has
> been listed. If the client sees a false flag, it issues another RPC.
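The three steps of the plan can be sketched as a minimal in-memory model, assuming a single directory held as a sorted `String` array. `PagedListing`, `Page`, `getListing(startAfter, limit)` and `listAll` are hypothetical names, not the HDFS API, and the sketch returns a `hasMore` flag, i.e. the inverse of the "done" flag in step 3. It uses a record, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of iterative listing; not the HDFS API. */
public class PagedListing {

    /** One getListing response: a batch of names plus a "more remain" flag. */
    record Page(List<String> names, boolean hasMore) {}

    /** Children of one directory, kept sorted as in the proposal. */
    private final String[] children;

    PagedListing(String... names) {
        children = names.clone();
        Arrays.sort(children);
    }

    /**
     * Step 1 and 2: returns at most limit names strictly after startAfter
     * ("" means from the beginning). startAfter need not exist any more:
     * a failed binary search still yields its insertion point.
     */
    Page getListing(String startAfter, int limit) {
        if (limit <= 0) throw new IllegalArgumentException("limit must be positive");
        int i = Arrays.binarySearch(children, startAfter);
        int start = (i >= 0) ? i + 1 : -(i + 1); // skip the cursor itself if present
        int end = Math.min(start + limit, children.length);
        List<String> batch = List.of(Arrays.copyOfRange(children, start, end));
        return new Page(batch, end < children.length); // step 3: completion flag
    }

    /** Client loop: keep issuing RPCs, resuming after the last name received. */
    static List<String> listAll(PagedListing nameNode, int limit) {
        List<String> all = new ArrayList<>();
        String cursor = "";
        Page page;
        do {
            page = nameNode.getListing(cursor, limit);
            all.addAll(page.names());
            if (!page.names().isEmpty()) {
                cursor = page.names().get(page.names().size() - 1);
            }
        } while (page.hasMore());
        return all;
    }

    public static void main(String[] args) {
        PagedListing dir = new PagedListing("e", "a", "d", "b", "c");
        System.out.println(listAll(dir, 2)); // [a, b, c, d, e] in three RPCs
    }
}
```

On the example directory, `dir.getListing("bb", 2)` returns `[c, d]`: although "bb" is not an entry, the binary-search insertion point still gives the correct resume position, which is the property step 2 relies on when the cursor entry has been deleted.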
> This proposal changes the semantics of large directory listing in the sense
> that listing is no longer an atomic operation if a directory's contents
> change while the listing is in progress.
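The loss of atomicity can be illustrated with a small simulation. It uses a hypothetical `page` helper and, purely for brevity, models the directory as a `TreeSet` rather than the NameNode's sorted array; `tailSet(startAfter, false)` plays the role of locating the first entry after the cursor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical simulation of a directory changing between listing RPCs. */
public class NonAtomicListing {

    /** Up to limit names strictly after startAfter, as one listing RPC would return. */
    static List<String> page(NavigableSet<String> dir, String startAfter, int limit) {
        List<String> out = new ArrayList<>();
        for (String name : dir.tailSet(startAfter, false)) {
            if (out.size() == limit) break;
            out.add(name);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> dir = new TreeSet<>(List.of("a", "c", "e", "g"));
        List<String> seen = new ArrayList<>(page(dir, "", 2)); // first RPC: [a, c]

        // Changes made by other clients between the two RPCs:
        dir.add("b");    // sorts before the cursor "c", so this listing never sees it
        dir.add("f");    // sorts after the cursor, so it will be seen
        dir.remove("c"); // cursor deleted: the next RPC still resumes after it

        seen.addAll(page(dir, "c", 10)); // second RPC resumes after "c"
        System.out.println(seen);        // [a, c, e, f, g]
    }
}
```

The result `[a, c, e, f, g]` matches neither the directory before the changes (`[a, c, e, g]`) nor after them (`[a, b, e, f, g]`): "b" is missed, "c" is reported although it has been deleted, and "f" appears. No entry is duplicated, which is what resuming from the last returned name is meant to guarantee.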
--
This message is automatically generated by JIRA.