[ 
https://issues.apache.org/jira/browse/HADOOP-15547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528322#comment-16528322
 ] 

Steve Loughran commented on HADOOP-15547:
-----------------------------------------

Patch 004: no production changes other than a comment and some IDE-related 
cleanups.

Test side:
# you can configure the number of threads and files per thread for each test. 
On a smaller box it is better to have fewer threads doing more work than many 
fighting for scheduling, though I'm curious now how much blocking is going on 
over HTTP (and what pool sizes there should be)
# if the first test skips because the test dir is already there, it raises an 
assume exception, so it is reported as skipped
# the listing test also runs listFiles() with recursive=true to see if it 
returns the same values, and logs timings. If/when the wasb connector goes to 
an optimised recursive listing, this will be a regression test.
# new test to do the delete. It logs the time and sets the number of threads 
to 16. Takes ~60s for me, BTW.
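
For readers unfamiliar with the threads x files/thread shape described above, here is a minimal sketch of that kind of workload. It is not the code from the patch: the class name ParallelCreateSketch and the method createFiles are hypothetical, and it writes to a local temp directory through java.nio rather than to a wasb filesystem.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

/** Hypothetical illustration only; not part of HADOOP-15547. */
public final class ParallelCreateSketch {

    /**
     * Creates threads * filesPerThread empty files under dir,
     * one worker per thread, and returns the number created.
     */
    public static int createFiles(Path dir, int threads, int filesPerThread)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                Callable<Integer> task = () -> {
                    int created = 0;
                    for (int f = 0; f < filesPerThread; f++) {
                        // Unique name per (thread, file) pair, so workers never collide.
                        Files.createFile(dir.resolve("t" + id + "-f" + f));
                        created++;
                    }
                    return created;
                };
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> r : results) {
                total += r.get(); // rethrows any worker failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("list-perf-sketch");
        long start = System.nanoTime();
        int created = createFiles(dir, 2, 4);
        long millis = (System.nanoTime() - start) / 1_000_000;
        long listed;
        try (Stream<Path> entries = Files.list(dir)) {
            listed = entries.count();
        }
        System.out.println("created " + created + ", listed " + listed
            + " in " + millis + " ms");
    }
}
```

With a fixed pool, lowering the thread count and raising the files per thread keeps the total work the same while reducing contention for the scheduler, which is the trade-off mentioned in the first item.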

To set the thread and file counts, just set the props on the maven command line:
{code}
-Dfs.azure.scale.test.list.performance.threads=2 
-Dfs.azure.scale.test.list.performance.files=4
{code}
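
As a rough illustration of how such properties could be picked up with fallbacks, here is a sketch. How the real test reads them is not shown in this comment, so the helper class ScaleTestProps, the method intProp, and both default values are assumptions; only the two property names come from the text above. It also assumes the -D values reach the test JVM as system properties, which depends on the build configuration.

```java
import java.util.Properties;

/** Hypothetical helper; the patch's own option handling may differ. */
public final class ScaleTestProps {
    public static final String THREADS_KEY =
        "fs.azure.scale.test.list.performance.threads";
    public static final String FILES_KEY =
        "fs.azure.scale.test.list.performance.files";

    // Placeholder defaults, chosen for this sketch only.
    public static final int DEFAULT_THREADS = 10;
    public static final int DEFAULT_FILES = 50;

    /** Returns the positive integer stored under key, or defVal if unset or blank. */
    public static int intProp(Properties props, String key, int defVal) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            return defVal;
        }
        int parsed = Integer.parseInt(value.trim());
        if (parsed <= 0) {
            throw new IllegalArgumentException(key + " must be positive: " + value);
        }
        return parsed;
    }

    public static void main(String[] args) {
        Properties props = System.getProperties();
        int threads = intProp(props, THREADS_KEY, DEFAULT_THREADS);
        int files = intProp(props, FILES_KEY, DEFAULT_FILES);
        System.out.println(threads + " threads x " + files
            + " files/thread = " + (threads * files) + " files");
    }
}
```

Run with the two -D options shown above, this reports 2 threads and 4 files per thread, 8 files in total.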

Tested: Azure Ireland

> WASB: listStatus performance
> ----------------------------
>
>                 Key: HADOOP-15547
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15547
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>    Affects Versions: 2.9.1, 3.0.2
>            Reporter: Thomas Marquardt
>            Assignee: Thomas Marquardt
>            Priority: Major
>         Attachments: HADOOP-15547-004.patch, HADOOP-15547-004.patch, 
> HADOOP-15547.001.patch, HADOOP-15547.002.patch, HADOOP-15547.003.patch
>
>
> The WASB implementation of Filesystem.listStatus is very slow due to O(n!) 
> algorithm to remove duplicates and uses too much memory due to the extra 
> conversion from BlobListItem to FileMetadata to FileStatus.  It takes over 30 
> minutes to list 700,000 files.  


