[
https://issues.apache.org/jira/browse/HADOOP-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207442#comment-13207442
]
Todd Lipcon commented on HADOOP-6502:
-------------------------------------
I believe this is still an issue in trunk, since the protobufs are still
tunneled over a Writable-based mechanism. I see the following trace in an IPC
benchmark I'm working on:
{code}
"IPC Client (1065524847) connection to /127.0.0.1:12345 from todd" daemon
prio=10 tid=0x000000000250e000 nid=0x3dba runnable [0x00007f96164f0000]
java.lang.Thread.State: RUNNABLE
at java.util.zip.ZipFile.getEntry(Native Method)
at java.util.zip.ZipFile.getEntry(ZipFile.java:166)
- locked <0x00000007840bb5b0> (a java.util.jar.JarFile)
at java.util.jar.JarFile.getEntry(JarFile.java:223)
at java.util.jar.JarFile.getJarEntry(JarFile.java:206)
at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:771)
at sun.misc.URLClassPath.getResource(URLClassPath.java:185)
at java.net.URLClassLoader$1.run(URLClassLoader.java:209)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
- locked <0x0000000784000150> (a sun.misc.Launcher$AppClassLoader)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
- locked <0x0000000784000150> (a sun.misc.Launcher$AppClassLoader)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1162)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:89)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:835)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:762)
{code}
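The frames above show Class.forName being reached through Configuration.getClassByName on the response path, and the issue description notes that a class missing from the class path never enters the lookup cache. One possible mitigation is to cache misses as well as hits. The following is only a minimal sketch of that idea, not the contents of the attached patches or of Hadoop's Configuration class: NegativeClassCache, its methods, and the class name org.example.hypothetical.MissingJobConf are all hypothetical names chosen for illustration.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration: a class-lookup cache that also remembers misses. */
public class NegativeClassCache {
    // ConcurrentHashMap does not allow null values, so a miss is stored as Optional.empty().
    private final ConcurrentHashMap<String, Optional<Class<?>>> cache = new ConcurrentHashMap<>();
    // Counts how often the (expensive) class loader is actually consulted.
    private final AtomicInteger loaderCalls = new AtomicInteger();
    private final ClassLoader loader;

    public NegativeClassCache(ClassLoader loader) {
        this.loader = loader;
    }

    /** Returns the class if it can be loaded; consults the loader at most once per name. */
    public Optional<Class<?>> lookup(String name) {
        return cache.computeIfAbsent(name, this::load);
    }

    private Optional<Class<?>> load(String name) {
        loaderCalls.incrementAndGet();
        try {
            Class<?> c = Class.forName(name, false, loader);
            return Optional.<Class<?>>of(c);
        } catch (ClassNotFoundException e) {
            // Remember the miss so later lookups skip the class path scan.
            return Optional.empty();
        }
    }

    public int loaderCalls() {
        return loaderCalls.get();
    }

    public static void main(String[] args) {
        NegativeClassCache cache =
            new NegativeClassCache(NegativeClassCache.class.getClassLoader());
        // Simulate deserializing 1300 entries, each probing for an absent class.
        for (int i = 0; i < 1300; i++) {
            cache.lookup("org.example.hypothetical.MissingJobConf");
        }
        System.out.println(cache.loaderCalls()); // prints 1
    }
}
```

A cached miss goes stale if the class later becomes loadable (for example, after a jar is added to the class path), so a real fix would need to decide when such entries are invalidated.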
> DistributedFileSystem#listStatus is very slow when listing a directory with a
> size of 1300
> ------------------------------------------------------------------------------------------
>
> Key: HADOOP-6502
> URL: https://issues.apache.org/jira/browse/HADOOP-6502
> Project: Hadoop Common
> Issue Type: Bug
> Components: util
> Affects Versions: 0.20.0
> Reporter: Hairong Kuang
> Priority: Critical
> Attachments: 6502.patch, 6502_v2.patch
>
>
> Listing a directory with around 1300 children takes hundreds of
> milliseconds. The slowdown is caused by the change made in
> HADOOP-4187. The return value of listStatus is an array of FileStatus. When
> deserializing each element of the array,
> ReflectionUtils#newInstance(Class<T>, Configuration) is called, which calls
> setConf, which in turn calls setJobConf. setJobConf checks whether JobConf is
> on the class path by calling Configuration#getClassByName.
> Configuration#getClassByName tries to optimize the lookup with a cached map,
> but because JobConf is not on the class path, it never enters the cache. Every
> check therefore ends up calling Class#forName, which is very expensive.
> Deserializing an array of 1300 entries requires 1300 calls to Class#forName!