On Thu, Jan 03, 2008 at 10:12:04AM +0530, Arun C Murthy wrote:
>On Wed, Jan 02, 2008 at 12:08:53PM -0800, Jason Venner wrote:
>>In our case, we have specific jobs that, due to resource constraints, can 
>>only be run serially (i.e. one instance per machine).
>
>I see, at this point there isn't anything in Hadoop which can help you out 
>here...
>

Given that, please file a jira for this enhancement anyway... Thanks! 

I'd imagine we should consider features such as:
a) Max simultaneous tasks per node per job (current ask).
b) Max concurrent tasks per job cluster-wide (i.e. never run more than some 
absolute number, say 25, of a given job's maps simultaneously across the 
cluster) - this would help jobs which need to respect the SLAs of external 
services regardless of cluster size, e.g. never open more than 150 
simultaneous db-connections.
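
Purely as a sketch, (a) and (b) might eventually surface as per-job properties 
along these lines - neither name exists in Hadoop today, both are illustrative 
only:

  <!-- hypothetical: cap this job's simultaneous maps on any one node -->
  <property>
    <name>mapred.job.max.maps.per.node</name>
    <value>1</value>
  </property>

  <!-- hypothetical: cap this job's simultaneous maps across the whole cluster -->
  <property>
    <name>mapred.job.max.maps.per.cluster</name>
    <value>25</value>
  </property>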

Arun

>Having said that, could you consider running another Map-Reduce cluster with 
>mapred.tasktracker.map.tasks.maximum set to 1 for these special jobs?
>Run this cluster on the same machines, simultaneously with your _regular_ 
>cluster; just pick different ports etc.
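
For reference, a rough sketch of the overrides that second cluster's 
hadoop-site.xml might carry (the host, port and path below are arbitrary; 
double-check property names against the hadoop-default.xml shipped with your 
release):

  <!-- a second JobTracker, on a port the regular cluster isn't using -->
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-host:9011</value>
  </property>

  <!-- one map slot per TaskTracker in this special cluster -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>

  <!-- keep local scratch space separate from the regular cluster's -->
  <property>
    <name>mapred.local.dir</name>
    <value>/path/to/serial-mapred/local</value>
  </property>

The TaskTracker/JobTracker web UI and report ports likewise need distinct 
values; the exact property names vary by release, so check hadoop-default.xml.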
>
>hth,
>Arun
>
>>Most of our jobs are more typical and can be run in parallel on the machines.
>>
>>Arun C Murthy wrote:
>>>Billy,
>>>
>>>On Wed, Jan 02, 2008 at 01:38:06PM -0600, Billy wrote:
>>>>If I add this to a command line as a -jobconf should it be enforced?
>>>
>>>This is a property of the TaskTracker and hence cannot be set on a per-job 
>>>basis...
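
In other words, the value that matters is the one each TaskTracker reads from 
its own hadoop-site.xml at startup; a -jobconf override does land in job.xml, 
but the TaskTracker's slot count never consults it. Roughly:

  <!-- cluster-side, in every TaskTracker's hadoop-site.xml; takes effect on
       TaskTracker (re)start, not per job -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>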
>>>
>>>>Say I have a job that I want to run only 1 map at a time per server
>>>
>>>Could you describe your reasons?
>>>
>>>Arun
>>>
>>>>I have tried this and looked in the job.xml file and it's set correctly, 
>>>>but not enforced.
>>>
>>>>Billy
