[
https://issues.apache.org/jira/browse/HADOOP-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591059#action_12591059
]
Owen O'Malley commented on HADOOP-3287:
---------------------------------------
-1 As I wrote on HADOOP-3171, these semantics lead to very hard-to-debug cases.
In particular, what was happening was:
client:
conf 1
job tracker:
conf 2
task tracker:
conf 3..n
and depending on which part of the framework looked at a particular value, it
would take it from any of conf 1..n. This was *very* difficult to debug and
led to wasted days of developer time.
> Being able to set default job configuration values on the jobtracker
> --------------------------------------------------------------------
>
> Key: HADOOP-3287
> URL: https://issues.apache.org/jira/browse/HADOOP-3287
> Project: Hadoop Core
> Issue Type: Bug
> Components: conf, mapred
> Environment: all
> Reporter: Alejandro Abdelnur
> Priority: Critical
>
> The jobtracker's hadoop-site.xml carries custom configuration for the
> cluster, and the 'final' flag allows a value to be fixed, ignoring any
> override by a client when submitting a job.
> There are several properties for which a cluster may want to set some default
> values (different from the ones in the hadoop-default.xml), for example:
> * enabling/disabling compression
> * type of compression, record/block
> * number of task retries
> * block replication factor
> * job priority
> * tasks JVM options
> The cluster default values should apply to submitted jobs when the job
> submitter does not care about those values. When the job submitter does
> care, it should include its preferred values. Using the final flag in the
> jobtracker hadoop-site.xml will lock the value, ignoring the value set in
> the client jobconf.
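As a concrete illustration of the 'final' mechanism described above, a jobtracker hadoop-site.xml entry might look like the following (the property name is one example from the list above; the value is illustrative):

```xml
<!-- hadoop-site.xml on the jobtracker.
     Marking a property final prevents client job configurations
     from overriding it when a job is submitted. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <final>true</final>
</property>
```

Without `<final>true</final>`, the same entry would act only as a cluster default that any client jobconf could override.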
> Currently the only way of doing this is to distribute the jobtracker
> hadoop-site.xml to all clients and make sure they use it when creating the
> job configuration.
> There are situations where this is not practical:
> * In a shared cluster with several clients submitting jobs. It requires
> redistributing the hadoop-site.xml to all clients.
> * In a cluster where jobs are dispatched by a web application. It requires
> rebundling and redeploying the webapp.
> The current behavior happens because the jobconf, when serialized to be
> sent to the jobtracker, includes all the values found in the
> hadoop-default.xml bundled with the hadoop JAR file. On the jobtracker
> side, all those values override all but the 'final' properties of the
> jobtracker hadoop-site.xml.
> According to the javadocs of Configuration.write(OutputStream), this
> should not happen: 'Writes non-default properties in this configuration.'
> If the javadocs are taken as the proper behavior, this is a bug in the
> current implementation, and it could easily be fixed by not writing
> default values on write.
> This is a generalization of the problem mentioned in HADOOP-3171.
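The proposed fix, writing only non-default properties, can be sketched as follows. This is a hypothetical illustration using plain java.util.Properties, not the actual Configuration implementation; the class and method names are made up for this example:

```java
import java.util.Map;
import java.util.Properties;

// Sketch of the proposed fix: serialize only properties whose values
// differ from the bundled defaults (standing in for hadoop-default.xml),
// so jobtracker-side defaults are not clobbered by the client's copy
// of the default values.
public class NonDefaultWriter {

    // 'defaults' stands in for hadoop-default.xml; 'current' for the jobconf.
    public static Properties nonDefaultProperties(Properties defaults,
                                                  Properties current) {
        Properties out = new Properties();
        for (Map.Entry<Object, Object> e : current.entrySet()) {
            String key = (String) e.getKey();
            String value = (String) e.getValue();
            // Keep only values the client actually set or changed.
            if (!value.equals(defaults.getProperty(key))) {
                out.setProperty(key, value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("mapred.map.tasks", "2");
        defaults.setProperty("mapred.compress.map.output", "false");

        Properties jobConf = new Properties();
        jobConf.putAll(defaults);
        jobConf.setProperty("mapred.compress.map.output", "true"); // client override

        // Only the overridden property would be sent to the jobtracker,
        // leaving the jobtracker's own defaults in effect for the rest.
        System.out.println(nonDefaultProperties(defaults, jobConf));
    }
}
```

With this behavior, the jobtracker's hadoop-site.xml defaults would apply to every property the client left untouched, without needing the 'final' flag.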