[ 
https://issues.apache.org/jira/browse/HADOOP-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803834#action_12803834
 ] 

Todd Lipcon commented on HADOOP-6502:
-------------------------------------

Actually, after thinking some more, even that wouldn't work. There's a lot of 
Common code that uses ReflectionUtils.newInstance, for example the 
serialization code that instantiates writables. A user is free to make their 
writables JobConfigurable, etc. I don't think there's a particularly simple 
solution here.

bq. The cache in Configuration is per-classloader. So as long as we go through 
that we should be safe.

If we assume that a classloader never picks up new classes, that's true. But 
I don't think the JVM has a negative class cache, does it? That is, if you try 
to load a class when it doesn't exist, then move the class into the classpath 
and try to load it again, the second attempt might pick it up.
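
To make the trade-off concrete, here is a minimal sketch of a per-classloader 
negative cache. It is not the Hadoop code or a proposed patch; the class 
NegativeClassCache, its methods, and the class name example.missing.JobConf 
are all hypothetical and exist only for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: remembers, per classloader, which names failed to load.
class NegativeClassCache {
    // Weak keys so a cached entry does not keep a classloader alive.
    private static final Map<ClassLoader, Set<String>> MISSING =
        Collections.synchronizedMap(new WeakHashMap<>());

    // Counts how often we actually fall through to Class.forName.
    static int forNameCalls = 0;

    static Class<?> lookup(String name, ClassLoader loader) {
        Set<String> missing =
            MISSING.computeIfAbsent(loader, l -> ConcurrentHashMap.newKeySet());
        if (missing.contains(name)) {
            return null; // known miss: skip the expensive lookup
        }
        forNameCalls++;
        try {
            // Successful loads are not cached in this sketch.
            return Class.forName(name, true, loader);
        } catch (ClassNotFoundException e) {
            missing.add(name); // remember the failure for this loader
            return null;
        }
    }

    // Resets state, repeats the lookup n times, and reports the forName count.
    static int countFailedLoads(String name, int n) {
        MISSING.clear();
        forNameCalls = 0;
        ClassLoader loader = NegativeClassCache.class.getClassLoader();
        for (int i = 0; i < n; i++) {
            lookup(name, loader);
        }
        return forNameCalls;
    }

    public static void main(String[] args) {
        // 1300 lookups of a missing class trigger a single Class.forName call.
        System.out.println(countFailedLoads("example.missing.JobConf", 1300));
    }
}
```

The catch is the one raised above: once a name is recorded as missing, a class 
that later appears on the classpath stays invisible to this cache until the 
entry is dropped, which differs from what a plain Class.forName would do.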

> DistributedFileSystem#listStatus is very slow when listing a directory with a 
> size of 1300
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6502
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6502
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: Hairong Kuang
>            Priority: Critical
>             Fix For: 0.20.2, 0.21.0, 0.22.0
>
>
> When listing a directory of around 1300 children, it takes hundreds of 
> milliseconds. It turns out the slowdown is caused by the change made in 
> HADOOP-4187. The return value of listStatus is an array of FileStatus. When 
> deserializing each element of the array, 
> ReflectionUtils#newInstance(Class<T>, Configuration) is called, which then 
> calls setConf, which calls setJobConf. setJobConf checks whether JobConf is 
> on the class path by calling Configuration#getClassByName. 
> Configuration#getClassByName tries to optimize the lookup using a cached map, 
> but since JobConf is not on the class path, it is never in the cache. Every 
> check ends up calling Class.forName, which is very expensive. Deserializing 
> an array of 1300 entries requires calling Class#forName 1300 times!
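
The behaviour described in the report can be reproduced in miniature. The 
sketch below is not Configuration#getClassByName itself; it is a simplified 
stand-in that assumes a cache storing only successful lookups. The class 
PositiveOnlyClassCache and the name example.missing.JobConf are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a lookup cache that stores only successful loads.
class PositiveOnlyClassCache {
    private static final Map<String, Class<?>> CACHE = new ConcurrentHashMap<>();

    // Counts how often the lookup falls through to Class.forName.
    static int forNameCalls = 0;

    static Class<?> lookup(String name) {
        Class<?> cached = CACHE.get(name);
        if (cached != null) {
            return cached; // hit: no reflection needed
        }
        forNameCalls++;
        try {
            Class<?> c = Class.forName(name);
            CACHE.put(name, c);
            return c;
        } catch (ClassNotFoundException e) {
            return null; // the failure is not remembered
        }
    }

    // Simulates deserializing `entries` elements, one lookup per element.
    static int simulate(String name, int entries) {
        CACHE.clear();
        forNameCalls = 0;
        for (int i = 0; i < entries; i++) {
            lookup(name);
        }
        return forNameCalls;
    }

    public static void main(String[] args) {
        System.out.println(simulate("java.lang.String", 1300));        // prints 1
        System.out.println(simulate("example.missing.JobConf", 1300)); // prints 1300
    }
}
```

A class that exists costs one Class.forName call for the whole array, while a 
class that is absent costs one call per element, which matches the 1300 calls 
in the report.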

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
