[ https://issues.apache.org/jira/browse/HADOOP-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208678#comment-13208678 ]
Hudson commented on HADOOP-6502:
--------------------------------
Integrated in Hadoop-Mapreduce-0.23-Commit #557 (See
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/557/])
HADOOP-6502. Improve the performance of Configuration.getClassByName when
the class is not found by caching negative results. Contributed by Sharad
Agarwal and Todd Lipcon. (Revision 1244619)
Result = ABORTED
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244619
Files :
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ReflectionUtils.java
> DistributedFileSystem#listStatus is very slow when listing a directory with a
> size of 1300
> ------------------------------------------------------------------------------------------
>
> Key: HADOOP-6502
> URL: https://issues.apache.org/jira/browse/HADOOP-6502
> Project: Hadoop Common
> Issue Type: Bug
> Components: util
> Affects Versions: 0.20.0
> Reporter: Hairong Kuang
> Assignee: Sharad Agarwal
> Priority: Critical
> Fix For: 0.24.0, 0.23.2
>
> Attachments: 6502.patch, 6502_v2.patch, hadoop-6502-trunk.txt,
> hadoop-6502-trunk.txt
>
>
> Listing a directory with around 1300 children takes hundreds of
> milliseconds. The slowdown is caused by the change made in HADOOP-4187.
> The return value of listStatus is an array of FileStatus. When
> deserializing each element of the array,
> ReflectionUtils#newInstance(Class<T>, Configuration) is called, which calls
> setConf, which in turn calls setJobConf. setJobConf checks whether JobConf
> is on the class path by calling Configuration#getClassByName.
> Configuration#getClassByName tries to optimize the lookup with a cached
> map, but because JobConf is not on the class path, it never enters the
> cache. Every check therefore ends up calling Class#forName, which is very
> expensive. Deserializing an array of 1300 entries requires calling
> Class#forName 1300 times!
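
The fix summarized in the commit message above (caching negative results)
can be illustrated with a minimal sketch. This is not the code from the
committed patch: the class name NegativeCachingClassLookup, the method
lookupOrNull, the sentinel type, and the call counter are all hypothetical
names chosen for illustration. The idea is only that a failed lookup is
recorded under a sentinel value, so a repeated lookup of the same missing
class name is answered from the map instead of calling Class.forName again.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical illustration (not the Hadoop implementation): a class-name
 * lookup cache that remembers failed lookups as well as successful ones.
 */
class NegativeCachingClassLookup {
    /** Marker type used only as the "not found" sentinel value. */
    private static final class NotFoundSentinel {}

    /** ConcurrentHashMap rejects null values, so misses are stored as this sentinel. */
    private static final Class<?> NOT_FOUND = NotFoundSentinel.class;

    private final Map<String, Class<?>> cache = new ConcurrentHashMap<>();
    private final ClassLoader loader;

    /** Counts real Class.forName calls, so the effect of the cache is observable. */
    private final AtomicInteger loaderCalls = new AtomicInteger();

    NegativeCachingClassLookup(ClassLoader loader) {
        this.loader = loader;
    }

    /** Returns the named class, or null if it cannot be loaded. */
    Class<?> lookupOrNull(String name) {
        Class<?> cached = cache.get(name);
        if (cached == null) {
            // First time this name is seen: pay for the expensive lookup once.
            loaderCalls.incrementAndGet();
            try {
                cached = Class.forName(name, true, loader);
            } catch (ClassNotFoundException e) {
                // Remember the failure so later calls skip Class.forName.
                cached = NOT_FOUND;
            }
            cache.put(name, cached);
        }
        return cached == NOT_FOUND ? null : cached;
    }

    int loaderCalls() {
        return loaderCalls.get();
    }
}
```

With a cache of this shape, looking up the same absent class name 1300
times would trigger a single Class.forName call rather than 1300. Two
threads racing on the same uncached name may each perform the lookup once;
the sketch accepts that in exchange for simplicity.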