On Mar 8, 2007, at 4:29 PM, R. James Firby wrote:
> However, now the JobClient computes the task splits at the central
> point rather than at the JobTracker. That step involves looking up
> the default number of mapred tasks in the cluster configuration
> (i.e. mapred.map.tasks). But, unfortunately, the cluster
> configuration isn't available where we are running the JobClient;
> it is available at the cluster. In the past this didn't matter
> because all the JobClient really needed from the configuration was
> communication information.

The computation of the splits was moved from the JobTracker to the
client, to offload the JobTracker and, more importantly, to remove
the need to load user code into the JobTracker.
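
The client-side step is roughly the following (a minimal sketch
against the old org.apache.hadoop.mapred API, not the exact
JobClient code):

import java.io.IOException;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;

class SplitSketch {
  static InputSplit[] computeSplits(JobConf job) throws IOException {
    // The user's InputFormat is instantiated locally, which is why
    // user code no longer has to be loaded into the JobTracker.
    InputFormat inputFormat = job.getInputFormat();
    // getNumMapTasks() is only a hint; with a client-side config it
    // resolves against the client's mapred.map.tasks, which is the
    // problem described above.
    return inputFormat.getSplits(job, job.getNumMapTasks());
  }
}
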
I agree that since the cluster size and composition are defined by
the cluster, it would make sense to pass the capacity of the cluster
back via the JobSubmissionProtocol, like the name of the default
file system is. (I created HADOOP-1100.) I would pull the default
values for mapred.{map,reduce}.tasks out of hadoop-default.xml and
have JobConf return a number based on the cluster capacity if the
user hasn't given a specific value.
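
Roughly, something along these lines (purely a sketch; the protocol
method and field names here are hypothetical, not the actual API):

import java.io.IOException;

// Hypothetical shape of the HADOOP-1100 change; names are made up.
interface JobSubmissionProtocol {
  // Analogous to how the default file system name is passed back:
  // the JobTracker reports the cluster's capacity to the client.
  ClusterStatus getClusterStatus() throws IOException;
}

class ClusterStatus {
  int maxMapTasks;     // total map slots across the cluster
  int maxReduceTasks;  // total reduce slots across the cluster
}

class JobConfSketch {
  private final java.util.Properties props = new java.util.Properties();

  // With the defaults removed from hadoop-default.xml, "not set"
  // becomes detectable, so we can fall back to cluster capacity.
  int getNumMapTasks(ClusterStatus cluster) {
    String userValue = props.getProperty("mapred.map.tasks");
    return userValue != null ? Integer.parseInt(userValue)
                             : cluster.maxMapTasks;
  }
}
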
In addition, doing the splits in the JobClient lets a locally set
mapred.map.tasks value override the value set in hadoop-site.xml on
the cluster, which seems like a bug.

Once the input splits are generated, the number of splits defines
the number of maps. In my opinion, it is far less confusing to the
users to have conf.getNumMapTasks() return the real number of maps
rather than the original hint.
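
Concretely, once the splits exist the client can just write the real
count back into the conf (again a sketch, not the actual code):

import java.io.IOException;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;

class NumMapsSketch {
  static void recordRealMapCount(JobConf job) throws IOException {
    InputSplit[] splits =
        job.getInputFormat().getSplits(job, job.getNumMapTasks());
    // Overwrite the hint so getNumMapTasks() reports actual maps.
    job.setNumMapTasks(splits.length);
  }
}
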
-- Owen