[ 
https://issues.apache.org/jira/browse/HADOOP-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828293#comment-15828293
 ] 

Jason Lowe commented on HADOOP-11223:
-------------------------------------

bq. If the real problem is reloading all those XML files all the time, why not 
change that behavior in Hadoop 3.x? At the very least, we could have some kind 
of mapping between classpath and default configuration values, and only 
actually load the XML files when we saw a new classpath which might cause us to 
load some different files.

That's an interesting idea.  Tackling the *-default.xml files would get us a 
long way since hopefully we can not only avoid parsing them for new 
Configuration objects but also avoid invalidating them in every existing 
Configuration object every time a new default resource is added.  There'd still 
be the parsing of *-site.xml files, which can also be expensive.  To apply a 
similar approach to those, we'd have to snapshot not only the classpath but 
also the sizes and modification timestamps of the relevant resources located 
on that classpath.
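
As a rough illustration of the snapshot idea, here is a minimal sketch that 
keys parsed results on the classpath plus each resource's size and 
modification time, and only re-parses when that fingerprint changes.  The 
class name ResourceSnapshotCache, its methods, and the pluggable parser are 
all hypothetical; none of this is existing Hadoop API, and a real change would 
have to live inside Configuration itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch only; not part of Hadoop's Configuration class. */
final class ResourceSnapshotCache<T> {
    // Parsed results keyed by a fingerprint of classpath + resource metadata.
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    // Stand-in for the expensive XML parsing step.
    private final Function<List<Path>, T> parser;
    // Counts how often the parser actually ran (for illustration).
    private final AtomicInteger parseCount = new AtomicInteger();

    ResourceSnapshotCache(Function<List<Path>, T> parser) {
        this.parser = parser;
    }

    /** Builds a key from the classpath and each resource's path, size and mtime. */
    static String fingerprint(String classpath, List<Path> resources)
            throws IOException {
        StringBuilder sb = new StringBuilder(classpath);
        for (Path p : resources) {
            sb.append('|').append(p.toAbsolutePath())
              .append(':').append(Files.size(p))
              .append(':').append(Files.getLastModifiedTime(p).toMillis());
        }
        return sb.toString();
    }

    /** Returns the cached parse result, re-parsing only for an unseen fingerprint. */
    T get(String classpath, List<Path> resources) throws IOException {
        String key = fingerprint(classpath, resources);
        return cache.computeIfAbsent(key, k -> {
            parseCount.incrementAndGet();
            return parser.apply(resources);
        });
    }

    int parseCount() {
        return parseCount.get();
    }
}
```

Note that this sketch never evicts entries for stale fingerprints, and that 
checking size and modification time still costs a stat call per resource on 
every lookup, so it trades XML parsing for cheaper filesystem metadata reads 
rather than eliminating the work entirely.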


> Offer a read-only conf alternative to new Configuration()
> ---------------------------------------------------------
>
>                 Key: HADOOP-11223
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11223
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>            Reporter: Gopal V
>            Assignee: Varun Saxena
>              Labels: Performance
>         Attachments: HADOOP-11223.001.patch
>
>
> new Configuration() is called from several static blocks across Hadoop.
> This is incredibly inefficient, since each one of those involves primarily 
> XML parsing at a point where the JIT won't be triggered & interpreter mode is 
> essentially forced on the JVM.
> The alternate solution would be to offer a {{Configuration::getDefault()}} 
> alternative which disallows any modifications.
> At the very least, such a method would need to be called from 
> # org.apache.hadoop.io.nativeio.NativeIO::<clinit>()
> # org.apache.hadoop.security.SecurityUtil::<clinit>()
> # org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider::<clinit>
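
For illustration, the read-only default proposed in the description could look 
roughly like the sketch below: the defaults are loaded once, lazily, and 
exposed through an unmodifiable view so that static initializers can share 
them.  The class ReadOnlyDefaults, its load() method, and the single 
placeholder entry are hypothetical stand-ins; the actual 
Configuration::getDefault() from the description does not exist in this form 
and would wrap real XML loading.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the proposed read-only default; not Hadoop code. */
final class ReadOnlyDefaults {
    // Counts how many times the (simulated) expensive load ran.
    static int loads = 0;

    private ReadOnlyDefaults() {}

    // Initialization-on-demand holder: the JVM initializes this class once,
    // on first access to DEFAULTS, with thread safety guaranteed by class loading.
    private static final class Holder {
        static final Map<String, String> DEFAULTS =
            Collections.unmodifiableMap(load());
    }

    // Stand-in for parsing the default XML resources; the entry is a placeholder.
    private static Map<String, String> load() {
        loads++;
        Map<String, String> m = new HashMap<>();
        m.put("io.file.buffer.size", "4096");
        return m;
    }

    /** Returns the shared, unmodifiable defaults; callers cannot mutate them. */
    static Map<String, String> getDefault() {
        return Holder.DEFAULTS;
    }
}
```

Because the returned map is unmodifiable, any caller that tries to set a value 
gets an UnsupportedOperationException instead of silently changing state that 
other static blocks depend on.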


