[
https://issues.apache.org/jira/browse/HADOOP-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591059#action_12591059
]
Owen O'Malley commented on HADOOP-3287:
---------------------------------------
-1 As I wrote on HADOOP-3171, these semantics lead to very hard-to-debug cases.
In particular, what was happening was:
client:
conf 1
job tracker:
conf 2
task tracker:
conf 3..n
and depending on which part of the framework looked at a particular value, it
would take it from any of conf 1..n. This was *very* difficult to debug and
led to wasted days of developer time.
> Being able to set default job configuration values on the jobtracker
> --------------------------------------------------------------------
>
> Key: HADOOP-3287
> URL: https://issues.apache.org/jira/browse/HADOOP-3287
> Project: Hadoop Core
> Issue Type: Bug
> Components: conf, mapred
> Environment: all
> Reporter: Alejandro Abdelnur
> Priority: Critical
>
> The jobtracker's hadoop-site.xml carries custom configuration for the
> cluster, and the 'final' flag allows a value to be fixed, ignoring any
> override by a client when submitting a job.
> There are several properties for which a cluster may want to set some default
> values (different from the ones in the hadoop-default.xml), for example:
> * enabling/disabling compression
> * type of compression, record/block
> * number of task retries
> * block replication factor
> * job priority
> * tasks JVM options
> The cluster default values should apply to submitted jobs when the job
> submitter does not care about those values. When the job submitter does
> care, it should include its preferred values. Using the final flag in the
> jobtracker hadoop-site.xml will lock the value, ignoring the value set in
> the client jobconf.
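As a concrete illustration of the 'final' mechanism described above, a jobtracker hadoop-site.xml entry might look like the following (the property name is one example from the list above; the value is illustrative):

```xml
<!-- hadoop-site.xml on the jobtracker.
     Marking a property final prevents client job configurations
     from overriding it when a job is submitted. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <final>true</final>
</property>
```

Without `<final>true</final>`, the same entry would act only as a cluster default that any client jobconf could override.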
> Currently the only way of doing this is to distribute the jobtracker
> hadoop-site.xml to all clients and make sure they use it when creating the
> job configuration.
> There are situations where this is not practical:
> * In a shared cluster with several clients submitting jobs. It requires
> redistributing the hadoop-site.xml to all clients.
> * In a cluster where jobs are dispatched by a web application. It requires
> rebundling and redeploying the webapp.
> The current behavior happens because the jobconf, when serialized to be
> sent to the jobtracker, includes all the values found in the
> hadoop-default.xml bundled with the hadoop JAR file. On the jobtracker
> side, all those values override all but the 'final' properties of the
> jobtracker hadoop-site.xml.
> According to the javadocs of Configuration.write(OutputStream), this
> should not happen: 'Writes non-default properties in this configuration.'
> If the javadocs are taken as the proper behavior, this is a bug in the
> current implementation, and it could easily be fixed by not writing
> default values on write.
> This is a generalization of the problem mentioned in HADOOP-3171.
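The proposed fix, writing only non-default properties, can be sketched as follows. This is a hypothetical illustration using plain java.util.Properties, not the actual Configuration implementation; the class and method names are made up for this example:

```java
import java.util.Map;
import java.util.Properties;

// Sketch of the proposed fix: serialize only properties whose values
// differ from the bundled defaults (standing in for hadoop-default.xml),
// so jobtracker-side defaults are not clobbered by the client's copy
// of the default values.
public class NonDefaultWriter {

    // 'defaults' stands in for hadoop-default.xml; 'current' for the jobconf.
    public static Properties nonDefaultProperties(Properties defaults,
                                                  Properties current) {
        Properties out = new Properties();
        for (Map.Entry<Object, Object> e : current.entrySet()) {
            String key = (String) e.getKey();
            String value = (String) e.getValue();
            // Keep only values the client actually set or changed.
            if (!value.equals(defaults.getProperty(key))) {
                out.setProperty(key, value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("mapred.map.tasks", "2");
        defaults.setProperty("mapred.compress.map.output", "false");

        Properties jobConf = new Properties();
        jobConf.putAll(defaults);
        jobConf.setProperty("mapred.compress.map.output", "true"); // client override

        // Only the overridden property would be sent to the jobtracker,
        // leaving the jobtracker's own defaults in effect for the rest.
        System.out.println(nonDefaultProperties(defaults, jobConf));
    }
}
```

With this behavior, the jobtracker's hadoop-site.xml defaults would apply to every property the client left untouched, without needing the 'final' flag.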