[ https://issues.apache.org/jira/browse/HADOOP-15547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528280#comment-16528280 ]

Steve Loughran commented on HADOOP-15547:
-----------------------------------------

I'm testing this; my patch will include some fixes to make the default test
thread count and file size smaller.

One thing I want to highlight is that this connector isn't good at handling bad
configs: in particular, UnknownHostException is considered retriable. I have
had to turn off all retries & backoff intervals just to begin debugging why my
test is hanging. If I can't get the tests to fail properly on unrecoverable
exceptions, it's not going to be a good experience in the field.

{code}
[ERROR] test_0200_ListStatusPerformance(org.apache.hadoop.fs.azure.ITestListPerformance)  Time elapsed: 0.332 s  <<< ERROR!
org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: 
        at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2152)
        at org.apache.hadoop.fs.azure.NativeAzureFileSystem.listStatus(NativeAzureFileSystem.java:2756)
        at org.apache.hadoop.fs.azure.ITestListPerformance.test_0200_ListStatusPerformance(ITestListPerformance.java:131)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: com.microsoft.azure.storage.StorageException: 
        at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:87)
        at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:209)
        at com.microsoft.azure.storage.blob.CloudBlobContainer.downloadAttributes(CloudBlobContainer.java:570)
        at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobContainerWrapperImpl.downloadAttributes(StorageInterfaceImpl.java:255)
        at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.checkContainer(AzureNativeFileSystemStore.java:1279)
        at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2068)
        ... 14 more
Caused by: java.net.UnknownHostException: somehosthere
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
        at sun.net.www.http.HttpClient.New(HttpClient.java:308)
        at sun.net.www.http.HttpClient.New(HttpClient.java:326)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1546)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
        at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:115)
        ... 18 more
{code}
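
To illustrate the "fail properly on unrecoverable exceptions" behaviour asked
for in this comment, here is a minimal sketch of a retry loop that inspects the
cause chain and gives up immediately when it finds an UnknownHostException. The
class FailFastRetry and its methods are hypothetical names made up for this
example; they are not part of hadoop-azure or the Azure Storage SDK, and the
real fix may well look different.

```java
import java.io.IOException;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not a hadoop-azure or Azure SDK class. */
public class FailFastRetry {

  /** True if any exception in the cause chain is an UnknownHostException. */
  public static boolean isUnrecoverable(Throwable t) {
    // Bounded walk of the cause chain, so a cyclic chain cannot loop forever.
    for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
      if (t instanceof UnknownHostException) {
        return true;
      }
    }
    return false;
  }

  /** Runs op up to maxAttempts times; unrecoverable failures are rethrown at once. */
  public static <T> T call(Callable<T> op, int maxAttempts, long backoffMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (Exception e) {
        if (isUnrecoverable(e)) {
          throw e; // assumption: a bad hostname is a config error, retrying cannot help
        }
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(backoffMillis);
        }
      }
    }
    throw last;
  }

  public static void main(String[] args) {
    int[] attempts = {0};
    try {
      call(() -> {
        attempts[0]++;
        // Mimics the trace above: the DNS failure is wrapped in another exception.
        throw new IOException("wrapped", new UnknownHostException("somehosthere"));
      }, 5, 0L);
    } catch (Exception e) {
      System.out.println("failed after " + attempts[0] + " attempt(s): " + e);
    }
  }
}
```

With a check like this in place, a mistyped account hostname would surface on
the first attempt instead of being hidden behind the full retry and backoff
schedule.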



> WASB: listStatus performance
> ----------------------------
>
>                 Key: HADOOP-15547
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15547
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>    Affects Versions: 2.9.1, 3.0.2
>            Reporter: Thomas Marquardt
>            Assignee: Thomas Marquardt
>            Priority: Major
>         Attachments: HADOOP-15547.001.patch, HADOOP-15547.002.patch, 
> HADOOP-15547.003.patch
>
>
> The WASB implementation of Filesystem.listStatus is very slow due to O(n!) 
> algorithm to remove duplicates and uses too much memory due to the extra 
> conversion from BlobListItem to FileMetadata to FileStatus.  It takes over 30 
> minutes to list 700,000 files.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
