Being able to set default job configuration values on the jobtracker
--------------------------------------------------------------------

                 Key: HADOOP-3287
                 URL: https://issues.apache.org/jira/browse/HADOOP-3287
             Project: Hadoop Core
          Issue Type: Bug
          Components: conf, mapred
         Environment: all
            Reporter: Alejandro Abdelnur
            Priority: Critical


The jobtracker hadoop-site.xml carries custom configuration for the cluster, and
the 'final' flag fixes a value so that any override supplied by a client when
submitting a job is ignored.

There are several properties for which a cluster may want to set some default 
values (different from the ones in the hadoop-default.xml), for example:

 * enabling/disabling compression
 * type of compression, record/block
 * number of task retries
 * block replication factor
 * job priority
 * tasks JVM options

The cluster default values should apply to submitted jobs when the job
submitter does not care about those values. When the job submitter does care,
it should include its preferred values. Using the final flag on the jobtracker
hadoop-site.xml will lock the value, ignoring any value set in the client
jobconf.
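
As a rough illustration of the precedence described above, here is a minimal
sketch using plain maps. It is not Hadoop's Configuration API: the class
ClusterDefaultsSketch and the method resolve are hypothetical, and the property
values are only examples.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the requested precedence; this is not Hadoop's
// Configuration API, and the property values are only illustrative.
class ClusterDefaultsSketch {

    // Precedence: bundled defaults < jobtracker site values < explicit client
    // values, except that keys marked final on the jobtracker keep its value.
    static Map<String, String> resolve(Map<String, String> bundledDefaults,
                                       Map<String, String> jobtrackerSite,
                                       Set<String> finalKeys,
                                       Map<String, String> clientExplicit) {
        Map<String, String> effective = new HashMap<>(bundledDefaults);
        effective.putAll(jobtrackerSite);
        for (Map.Entry<String, String> e : clientExplicit.entrySet()) {
            if (!finalKeys.contains(e.getKey())) {
                effective.put(e.getKey(), e.getValue());
            }
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> bundledDefaults = new HashMap<>();
        bundledDefaults.put("mapred.map.max.attempts", "4");
        bundledDefaults.put("dfs.replication", "3");
        bundledDefaults.put("mapred.child.java.opts", "-Xmx200m");

        Map<String, String> jobtrackerSite = new HashMap<>();
        jobtrackerSite.put("mapred.map.max.attempts", "8"); // cluster default
        jobtrackerSite.put("dfs.replication", "2");         // marked final below
        Set<String> finalKeys = Collections.singleton("dfs.replication");

        Map<String, String> clientExplicit = new HashMap<>();
        clientExplicit.put("dfs.replication", "5");               // ignored: final
        clientExplicit.put("mapred.child.java.opts", "-Xmx512m"); // honoured

        Map<String, String> effective =
            resolve(bundledDefaults, jobtrackerSite, finalKeys, clientExplicit);
        System.out.println(effective.get("mapred.map.max.attempts")); // 8
        System.out.println(effective.get("dfs.replication"));         // 2
        System.out.println(effective.get("mapred.child.java.opts"));  // -Xmx512m
    }
}
```

The client never mentions the retry count, so the cluster value applies; the
replication factor is final, so the client value is dropped; the JVM options
are not set by the cluster, so the client value wins.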

Currently the only way of doing this is to distribute the jobtracker 
hadoop-site.xml to all clients and make sure they use it when creating the job 
configuration.

There are situations where this is not practical:

 * In a shared cluster with several clients submitting jobs, every change
requires redistributing the hadoop-site.xml to all clients.
 * In a cluster where the jobs are dispatched by a webapp, every change
requires rebundling and redeploying the webapp.

The current behavior occurs because the jobconf, when serialized to be sent to
the jobtracker, includes all the values found in the hadoop-default.xml bundled
with the hadoop JAR file. On the jobtracker side, those values override
everything in the jobtracker hadoop-site.xml except the 'final' properties.
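
The effect described above can be reproduced with a small model. This is only
a sketch with plain maps, not the actual serialization code: the class
SerializedDefaultsSketch and its methods are hypothetical names, and the
property value is an example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the behaviour reported in this issue; it does not
// use Hadoop classes.
class SerializedDefaultsSketch {

    // Client side: the serialized jobconf carries the bundled defaults as
    // well as the values the submitter set explicitly.
    static Map<String, String> serializeAll(Map<String, String> bundledDefaults,
                                            Map<String, String> clientExplicit) {
        Map<String, String> wire = new HashMap<>(bundledDefaults);
        wire.putAll(clientExplicit);
        return wire;
    }

    // Jobtracker side: every received value overrides the site value unless
    // the key is marked final.
    static Map<String, String> applyOnJobtracker(Map<String, String> jobtrackerSite,
                                                 Set<String> finalKeys,
                                                 Map<String, String> received) {
        Map<String, String> effective = new HashMap<>(jobtrackerSite);
        for (Map.Entry<String, String> e : received.entrySet()) {
            if (!finalKeys.contains(e.getKey())) {
                effective.put(e.getKey(), e.getValue());
            }
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> bundledDefaults = new HashMap<>();
        bundledDefaults.put("mapred.map.max.attempts", "4");

        Map<String, String> jobtrackerSite = new HashMap<>();
        jobtrackerSite.put("mapred.map.max.attempts", "8"); // not final

        // The submitter sets nothing, yet the bundled default still travels.
        Map<String, String> wire =
            serializeAll(bundledDefaults, new HashMap<String, String>());
        Map<String, String> effective = applyOnJobtracker(
            jobtrackerSite, Collections.<String>emptySet(), wire);

        // Prints 4: the cluster value of 8 was overridden by the default.
        System.out.println(effective.get("mapred.map.max.attempts"));
    }
}
```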

According to the javadocs of Configuration.write(OutputStream), this should
not happen: 'Writes non-default properties in this configuration.'

If the javadocs are taken as the proper behavior, this is a bug in the current
implementation, and it could be easily fixed by not writing default values
on write.
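
One possible shape of such a fix, sketched with plain maps rather than the
real Configuration class (NonDefaultWriteSketch and nonDefaultProperties are
hypothetical names): only entries whose value differs from the bundled default
are written.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "write only non-default properties"; it does not
// use Hadoop classes.
class NonDefaultWriteSketch {

    // Returns the entries of jobConf whose value differs from the bundled
    // default for the same key, or that have no bundled default at all.
    static Map<String, String> nonDefaultProperties(Map<String, String> bundledDefaults,
                                                    Map<String, String> jobConf) {
        Map<String, String> toWrite = new TreeMap<>();
        for (Map.Entry<String, String> e : jobConf.entrySet()) {
            String defaultValue = bundledDefaults.get(e.getKey());
            if (!e.getValue().equals(defaultValue)) {
                toWrite.put(e.getKey(), e.getValue());
            }
        }
        return toWrite;
    }

    public static void main(String[] args) {
        Map<String, String> bundledDefaults = new HashMap<>();
        bundledDefaults.put("mapred.map.max.attempts", "4");
        bundledDefaults.put("dfs.replication", "3");

        // The in-memory jobconf holds the defaults plus one explicit setting.
        Map<String, String> jobConf = new HashMap<>(bundledDefaults);
        jobConf.put("dfs.replication", "5");

        // Prints {dfs.replication=5}
        System.out.println(nonDefaultProperties(bundledDefaults, jobConf));
    }
}
```

Comparing values is a simplification: a submitter who explicitly sets a
property to the same value as the bundled default would have it dropped too.
An actual fix would probably need to track where each property came from.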

This is a generalization of the problem mentioned in HADOOP-3171.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
