[
https://issues.apache.org/jira/browse/HADOOP-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Elek, Marton reassigned HADOOP-16064:
-------------------------------------
Assignee: Elek, Marton
> Load configuration values from external sources
> -----------------------------------------------
>
> Key: HADOOP-16064
> URL: https://issues.apache.org/jira/browse/HADOOP-16064
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Priority: Major
>
> This is a proposal to improve Configuration.java so that it can load configuration
> from external sources (a Kubernetes ConfigMap, an external HTTP request, any
> cluster manager such as Ambari, etc.).
> I will attach a patch to illustrate the proposed solution, but please comment on
> the concept first; the patch is just a PoC and is not fully implemented.
> *Goals:*
> * Load the configuration files (core-site.xml/hdfs-site.xml/...) from
> external locations instead of the classpath (the classpath remains the default)
> * Make the configuration loading extensible
> * Keep it backward-compatible, with minimal changes to the existing
> Configuration.java
> *Use-cases:*
> 1.) Load configuration from the namenode ([http://namenode:9878/conf]). With
> this approach only the namenode needs to be configured; other components
> require only the URL of the namenode.
> 2.) Read configuration directly from a Kubernetes ConfigMap (or Mesos)
> 3.) Read configuration from any external cluster manager (such as Apache
> Ambari or an equivalent)
> 4.) Currently, in the Hadoop Docker images, we transform environment variables
> (such as HDFS-SITE.XML_fs.defaultFs) into configuration XML files with the help
> of a Python script. With the proposed implementation it would be possible to
> read the configuration directly from the system environment variables.
> *Problem:*
> The existing Configuration.java can read configuration from multiple sources,
> but most of the time it is used to load predefined resource names
> ("core-site.xml" and "hdfs-site.xml") without a location. In this case the
> files are loaded from the classpath.
> I propose adding an option to define the default location of core-site.xml
> and hdfs-site.xml (or any configuration resource that is referenced by a
> string name), so that they can be loaded from external sources instead of
> the classpath.
> Loading configuration this way requires both an implementation and its own
> configuration (where the external configs are). We can't use the regular
> configuration to configure the config loader (a chicken-and-egg problem).
> I propose a new environment variable, HADOOP_CONF_SOURCE.
> The environment variable could contain a URL, where the scheme of the URL
> defines the config source and all the other parts configure access to the
> resource.
> Examples:
> HADOOP_CONF_SOURCE=hadoop-http://namenode:9878/conf
> HADOOP_CONF_SOURCE=env://prefix
> HADOOP_CONF_SOURCE=k8s://config-map-name
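A minimal sketch of how a HADOOP_CONF_SOURCE value like the examples above could be parsed with java.net.URI, assuming the whole variable is a single absolute URI. The class name ConfSourceUri, and the choice to return null when the variable is unset (meaning: keep loading from the classpath), are hypothetical and not taken from the attached patch.

```java
import java.net.URI;

/**
 * Hypothetical helper that parses HADOOP_CONF_SOURCE. The scheme selects the
 * ConfigurationSource implementation; the full URI is what would be passed to
 * that implementation's initialize(URI) method.
 */
final class ConfSourceUri {
  private ConfSourceUri() {}

  /** Returns the parsed URI, or null if the value is unset or blank. */
  static URI parse(String value) {
    if (value == null || value.trim().isEmpty()) {
      return null; // no external source: keep the classpath default
    }
    URI uri = URI.create(value.trim()); // throws IllegalArgumentException on bad syntax
    if (uri.getScheme() == null) {
      throw new IllegalArgumentException(
          "HADOOP_CONF_SOURCE must be an absolute URI with a scheme: " + value);
    }
    return uri;
  }

  /** Reads the variable from the process environment. */
  static URI fromEnvironment() {
    return parse(System.getenv("HADOOP_CONF_SOURCE"));
  }
}
```

For hadoop-http://namenode:9878/conf the parsed scheme is hadoop-http, so an implementation would presumably strip the hadoop- prefix to recover the actual HTTP URL.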
> The ConfigurationSource interface can be as simple as:
> {code:java}
> package org.apache.hadoop.conf;
>
> import java.io.IOException;
> import java.net.URI;
> import java.util.List;
>
> /**
>  * Interface to load Hadoop configuration from a custom location.
>  */
> public interface ConfigurationSource {
>
>   /**
>    * Called once with the configured source URL (HADOOP_CONF_SOURCE).
>    *
>    * @param uri location of the external configuration; its scheme selects
>    *            the implementation
>    */
>   void initialize(URI uri) throws IOException;
>
>   /**
>    * Called to load a specific configuration resource.
>    *
>    * @param name name of the configuration resource (e.g. hdfs-site.xml)
>    * @return list of the loaded configuration keys and values
>    */
>   List<ParsedItem> readConfiguration(String name);
> }
> {code}
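To illustrate the contract of this interface (initialize() is called once, then readConfiguration() once per resource), here is a minimal in-memory implementation sketch, e.g. for unit tests. The proposal does not show ParsedItem, so a hypothetical simplified key/value class stands in for it; the interface is repeated so the sketch compiles on its own, and the "mem" scheme and the InMemorySource name are made up for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for the proposal's ParsedItem. */
final class ParsedItem {
  final String key;
  final String value;

  ParsedItem(String key, String value) {
    this.key = key;
    this.value = value;
  }
}

/** The interface from the proposal, repeated so this sketch compiles alone. */
interface ConfigurationSource {
  void initialize(URI uri) throws IOException;

  List<ParsedItem> readConfiguration(String name);
}

/** Hypothetical in-memory source for the made-up "mem" scheme. */
final class InMemorySource implements ConfigurationSource {
  private final Map<String, Map<String, String>> resources = new HashMap<>();
  private URI uri;

  /** Registers one key/value pair for a resource such as hdfs-site.xml. */
  void put(String resource, String key, String value) {
    resources.computeIfAbsent(resource, r -> new HashMap<>()).put(key, value);
  }

  @Override
  public void initialize(URI uri) throws IOException {
    if (!"mem".equals(uri.getScheme())) {
      throw new IOException("Unsupported scheme: " + uri);
    }
    this.uri = uri;
  }

  @Override
  public List<ParsedItem> readConfiguration(String name) {
    if (uri == null) {
      throw new IllegalStateException("initialize() was not called");
    }
    // Unknown resources yield an empty list rather than an error.
    Map<String, String> values =
        resources.getOrDefault(name, Collections.emptyMap());
    List<ParsedItem> items = new ArrayList<>();
    for (Map.Entry<String, String> e : values.entrySet()) {
      items.add(new ParsedItem(e.getKey(), e.getValue()));
    }
    return items;
  }
}
```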
> The right implementation can be chosen based on the scheme of the URI, using
> the Java Service Provider Interface mechanism
> (META-INF/services/org.apache.hadoop.conf.ConfigurationSource).
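The proposed interface has no method that reports which scheme an implementation handles, so the sketch below assumes a hypothetical scheme() method (on a separate, hypothetical SchemeConfigurationSource interface) and uses java.util.ServiceLoader to discover implementations. Matching is case-insensitive because URI schemes are case-insensitive.

```java
import java.util.Optional;
import java.util.ServiceLoader;

/** Hypothetical: a source that reports the URI scheme it handles. */
interface SchemeConfigurationSource {
  /** URI scheme handled by this implementation, e.g. "env" or "k8s". */
  String scheme();
}

final class ConfigurationSourceLocator {
  private ConfigurationSourceLocator() {}

  /** Returns the first candidate whose scheme matches, ignoring case. */
  static Optional<SchemeConfigurationSource> select(
      String scheme, Iterable<SchemeConfigurationSource> candidates) {
    for (SchemeConfigurationSource candidate : candidates) {
      if (candidate.scheme().equalsIgnoreCase(scheme)) {
        return Optional.of(candidate);
      }
    }
    return Optional.empty();
  }

  /** Searches the implementations registered via META-INF/services. */
  static Optional<SchemeConfigurationSource> locate(String scheme) {
    return select(scheme, ServiceLoader.load(SchemeConfigurationSource.class));
  }
}
```

ServiceLoader reads provider-configuration files named after the interface's fully qualified name (for the proposal, META-INF/services/org.apache.hadoop.conf.ConfigurationSource) and instantiates each listed class through its public no-argument constructor; keeping select() separate from locate() lets the matching logic be tested without a services file.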
> This requires only minimal modification of Configuration.java (see the
> attached patch as an example).
> The patch contains two example implementations:
> *hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/location/Env.java*
> This can load configuration from environment variables based on a naming
> convention (e.g. HDFS-SITE.XML_hdfs.dfs.key=value).
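A sketch of the naming convention described for Env.java, assuming a variable name is the upper-cased resource name, an underscore, then the configuration key verbatim (e.g. HDFS-SITE.XML_dfs.replication=1). The class and method names are hypothetical; the environment is passed in as a Map so the logic can be tested without touching the real process environment.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical reader for HDFS-SITE.XML_<key>=<value> style variables. */
final class EnvConfigReader {
  private EnvConfigReader() {}

  /** Collects the keys that belong to one resource, e.g. "hdfs-site.xml". */
  static Map<String, String> read(String resourceName, Map<String, String> env) {
    String prefix = resourceName.toUpperCase(Locale.ROOT) + "_";
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : env.entrySet()) {
      String name = e.getKey();
      if (name.startsWith(prefix) && name.length() > prefix.length()) {
        // Keys are case-sensitive, so the part after the prefix is kept as-is.
        result.put(name.substring(prefix.length()), e.getValue());
      }
    }
    return result;
  }

  /** Same as read(), using the real environment of this process. */
  static Map<String, String> readFromProcess(String resourceName) {
    return read(resourceName, System.getenv());
  }
}
```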
> *hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/location/HadoopWeb.java*
> This implementation can load the configuration from the /conf servlet of any
> Hadoop component.
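A rough sketch of the client side of such a source. It assumes the /conf servlet response is in the usual Hadoop configuration XML form (a configuration element with property/name/value children, as in the *-site.xml files) and parses it with the JDK's DOM parser; extra child elements such as final or source are ignored. The class and method names are hypothetical, and DOCTYPE declarations are rejected because the XML comes from a remote server.

```java
import java.io.InputStream;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for configuration XML served over HTTP. */
final class ConfServletReader {
  private ConfServletReader() {}

  /** Parses property/name/value entries into an ordered map. */
  static Map<String, String> parse(InputStream in) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Remote XML is untrusted: refuse DOCTYPE declarations (XXE protection).
    factory.setFeature(
        "http://apache.org/xml/features/disallow-doctype-decl", true);
    Document doc = factory.newDocumentBuilder().parse(in);
    Map<String, String> result = new LinkedHashMap<>();
    NodeList props = doc.getElementsByTagName("property");
    for (int i = 0; i < props.getLength(); i++) {
      Element prop = (Element) props.item(i);
      String name = childText(prop, "name");
      if (name != null) {
        result.put(name, childText(prop, "value"));
      }
    }
    return result;
  }

  /** Text of the first descendant with the given tag, or null if absent. */
  private static String childText(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
  }

  /** Fetches and parses a URL such as http://namenode:9878/conf. */
  static Map<String, String> fetch(String url) throws Exception {
    try (InputStream in = URI.create(url).toURL().openStream()) {
      return parse(in);
    }
  }
}
```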
>