[ 
https://issues.apache.org/jira/browse/HADOOP-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207442#comment-13207442
 ] 

Todd Lipcon commented on HADOOP-6502:
-------------------------------------

I believe this is still an issue in trunk, since the protobufs are still 
tunneled over a Writable-based mechanism. I see the following trace in an IPC 
benchmark I'm working on:
{code}
"IPC Client (1065524847) connection to /127.0.0.1:12345 from todd" daemon prio=10 tid=0x000000000250e000 nid=0x3dba runnable [0x00007f96164f0000]
   java.lang.Thread.State: RUNNABLE
        at java.util.zip.ZipFile.getEntry(Native Method)
        at java.util.zip.ZipFile.getEntry(ZipFile.java:166)
        - locked <0x00000007840bb5b0> (a java.util.jar.JarFile)
        at java.util.jar.JarFile.getEntry(JarFile.java:223)
        at java.util.jar.JarFile.getJarEntry(JarFile.java:206)
        at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:771)
        at sun.misc.URLClassPath.getResource(URLClassPath.java:185)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:209)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        - locked <0x0000000784000150> (a sun.misc.Launcher$AppClassLoader)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        - locked <0x0000000784000150> (a sun.misc.Launcher$AppClassLoader)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1162)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:89)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:835)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:762)
{code}
                
> DistributedFileSystem#listStatus is very slow when listing a directory with a 
> size of 1300
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6502
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6502
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: Hairong Kuang
>            Priority: Critical
>         Attachments: 6502.patch, 6502_v2.patch
>
>
> When listing a directory of around 1300 children, listStatus takes hundreds
> of milliseconds. The slowdown is caused by the change made in HADOOP-4187.
> The return value of listStatus is an array of FileStatus. When deserializing
> each element of the array, ReflectionUtils#newInstance(Class<T>,
> Configuration) is called, which calls setConf, which in turn calls
> setJobConf. setJobConf checks whether JobConf is on the class path by
> calling Configuration#getClassByName. Although Configuration#getClassByName
> tries to optimize the lookup with a cached map, JobConf is not on the class
> path, so it never enters the cache. Every lookup therefore ends up calling
> Class.forName, which is very expensive. Deserializing an array of 1300
> entries requires 1300 calls to Class.forName!
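
To illustrate the problem described above, here is a minimal sketch of a lookup cache that remembers misses as well as hits, so that a class absent from the class path costs one Class.forName call rather than one per deserialized element. This is not the code in the attached patches and not Hadoop's Configuration implementation; CachedClassLookup and the class name org.example.MissingJobConf are hypothetical names chosen for the example, and the sketch assumes a Java 8+ runtime and a class path that does not change after the first lookup.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Hadoop: caches both successful and
// failed class lookups by name.
final class CachedClassLookup {
    // Optional.empty() records a "not found" result so a miss is cached too.
    private final Map<String, Optional<Class<?>>> cache = new ConcurrentHashMap<>();
    // Counts how often we actually fall through to Class.forName.
    private final AtomicInteger loaderCalls = new AtomicInteger();

    Optional<Class<?>> find(String name) {
        return cache.computeIfAbsent(name, n -> {
            loaderCalls.incrementAndGet();
            try {
                return Optional.<Class<?>>of(Class.forName(n));
            } catch (ClassNotFoundException e) {
                // Assumption: the class path is fixed, so a miss stays a miss.
                return Optional.empty();
            }
        });
    }

    int loaderCalls() {
        return loaderCalls.get();
    }

    public static void main(String[] args) {
        CachedClassLookup lookup = new CachedClassLookup();
        // Simulate deserializing 1300 elements, each probing for a class
        // that is not on the class path (hypothetical name).
        for (int i = 0; i < 1300; i++) {
            lookup.find("org.example.MissingJobConf");
        }
        System.out.println("Class.forName calls: " + lookup.loaderCalls());
    }
}
```

The trade-off of caching misses is that a class added to the class path later would not be noticed, which is why the sketch states a fixed class path as an assumption.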
