On Thu, Jan 03, 2008 at 10:12:04AM +0530, Arun C Murthy wrote:
>On Wed, Jan 02, 2008 at 12:08:53PM -0800, Jason Venner wrote:
>>In our case, we have specific jobs that due to resource constraints can
>>only be run serially (i.e. 1 instance per machine).
>
>I see, at this point there isn't anything in Hadoop which can help you out
>here...
>
Given that, please file a jira for this enhancement anyway... Thanks!

I'd imagine we should consider features such as:
a) Max simultaneous tasks per node per job (the current ask).
b) Max concurrent tasks per job cluster-wide (i.e. don't run more than 25%,
   or an absolute number, of a given job's maps simultaneously on the
   cluster) - this should help jobs which need to respect SLAs of external
   services regardless of cluster size - e.g. don't open more than 150
   simultaneous db-connections.

Arun

>Having said that, could you consider running another Map-Reduce cluster with
>mapred.tasktracker.map.tasks.maximum set to 1 for these special jobs?
>Run this cluster on the same machines simultaneously with your _regular_
>cluster; just pick different ports etc.
>
>hth,
>Arun
>
>>Most of our jobs are more normal and can be run in parallel on the machines.
>>
>>Arun C Murthy wrote:
>>>Billy,
>>>
>>>On Wed, Jan 02, 2008 at 01:38:06PM -0600, Billy wrote:
>>>>If I add this to a command line as a -jobconf, should it be enforced?
>>>
>>>This is a property of the TaskTracker and hence cannot be set on a per-job
>>>basis...
>>>
>>>>Say I have a job that I want to run only 1 map at a time per server.
>>>
>>>Could you describe your reasons?
>>>
>>>Arun
>>>
>>>>I have tried this and looked in the job.xml file, and it's set correctly but
>>>>not enforced.
>>>>
>>>>Billy
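
[Editor's note: Arun's workaround amounts to giving the serial-only jobs
their own set of TaskTrackers. A minimal sketch of what the second
cluster's hadoop-site.xml overrides might contain is below. Property names
are from Hadoop releases of this era (0.15.x); the JobTracker hostname,
port, and local directory are placeholder values, and other per-daemon
ports (HTTP, IPC) would also need to differ from the regular cluster's -
check the defaults shipped with your release.]

```xml
<!-- hadoop-site.xml for the second, serial-only cluster (sketch).
     Placeholder host/port/path values; adjust to your environment. -->
<configuration>
  <property>
    <!-- point at a JobTracker on a different port than the regular cluster -->
    <name>mapred.job.tracker</name>
    <value>jthost:54311</value>
  </property>
  <property>
    <!-- the setting Arun mentions: at most one map task per TaskTracker -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <!-- keep local task state separate from the regular cluster's -->
    <name>mapred.local.dir</name>
    <value>/tmp/hadoop-serial/mapred/local</value>
  </property>
</configuration>
```

Jobs that must run serially would then be submitted against this
configuration, while everything else goes to the regular cluster.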