An alternative is to run two TaskTracker clusters whose nodes live on the
same machines. One cluster is for I/O-intensive jobs and has a low number
of map/reduce slots per tracker; the other is for CPU-intensive jobs and
has a high number of map/reduce slots per tracker.
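
The per-job side of that split might look something like the sketch below
(the host names are made up; the real difference lives in each cluster's
mapred-site.xml, via mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum on the TaskTrackers):

import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper: each cluster runs its own JobTracker, so a job
// picks its lane by pointing at the matching JobTracker address.
public class ClusterSelector {
  public static JobConf forIoCluster(JobConf conf) {
    // I/O cluster: its TaskTrackers were started with a low slot count.
    conf.set("mapred.job.tracker", "jobtracker-io.example.com:9001");
    return conf;
  }

  public static JobConf forCpuCluster(JobConf conf) {
    // CPU cluster: its TaskTrackers were started with a high slot count.
    conf.set("mapred.job.tracker", "jobtracker-cpu.example.com:9001");
    return conf;
  }
}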

A simpler alternative is to use a multi-threaded mapper for the CPU-intensive
jobs and tune the thread count on a per-job basis.
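
A rough sketch of that approach with the old mapred API, assuming
MultithreadedMapRunner and the mapred.map.multithreadedrunner.threads
property from stock Hadoop (the mapper plugged in must be thread-safe;
IdentityMapper here is just a stand-in for the job's real map class):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.MultithreadedMapRunner;

public class MultithreadedJobSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MultithreadedJobSketch.class);
    conf.setJobName("cpu-heavy-parse");

    // Stand-in mapper; a real job plugs in its own thread-safe map class.
    conf.setMapperClass(IdentityMapper.class);

    // Run each map task over an internal thread pool instead of more slots.
    conf.setMapRunnerClass(MultithreadedMapRunner.class);
    conf.setInt("mapred.map.multithreadedrunner.threads", 8); // tuned per job

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

Each map task then fans its map() calls out over the thread pool, so the
parallelism can be tuned per job without touching the TaskTracker slot
settings.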

In the longer term, being able to alter the TaskTracker control parameters
at run time on a per-job basis would be wonderful.

On Tue, Feb 3, 2009 at 12:01 PM, Nathan Marz <nat...@rapleaf.com> wrote:

> This is a great idea. For me, this is related to:
> https://issues.apache.org/jira/browse/HADOOP-5160. Being able to set the
> number of tasks per machine on a job by job basis would allow me to solve my
> problem in a different way. Looking at the Hadoop source, it's also probably
> simpler than changing how Hadoop schedules tasks.
>
>
>
>
>
> On Feb 3, 2009, at 11:44 AM, Jonathan Gray wrote:
>
>> Chris,
>>
>> For my specific use cases, it would be best to be able to set N
>> mappers/reducers per job per node (so I can explicitly say, run at most 2
>> at
>> a time of this CPU bound task on any given node).  However, the other way
>> would work as well (on 10 node system, would set job to max 20 tasks at a
>> time globally), but opens up the possibility that a node could be assigned
>> more than 2 of that task.
>>
>> I would work with whatever is easiest to implement as either would be a
>> vast
>> improvement for me (can run high numbers of network latency bound tasks
>> without fear of cpu bound tasks killing the cluster).
>>
>> JG
>>
>>
>>
>>> -----Original Message-----
>>> From: Chris K Wensel [mailto:ch...@wensel.net]
>>> Sent: Tuesday, February 03, 2009 11:34 AM
>>> To: core-user@hadoop.apache.org
>>> Subject: Re: Control over max map/reduce tasks per job
>>>
>>> Hey Jonathan
>>>
>>> Are you looking to limit the total number of concurrent mapper/
>>> reducers a single job can consume cluster wide, or limit the number
>>> per node?
>>>
>>> That is, you have X mappers/reducers, but only can allow N mappers/
>>> reducers to run at a time globally, for a given job.
>>>
>>> Or, you are cool with all X running concurrently globally, but want to
>>> guarantee that no node can run more than N tasks from that job?
>>>
>>> Or both?
>>>
>>> just reconciling the conversation we had last week with this thread.
>>>
>>> ckw
>>>
>>> On Feb 3, 2009, at 11:16 AM, Jonathan Gray wrote:
>>>
>>>> All,
>>>>
>>>>
>>>>
>>>> I have a few relatively small clusters (5-20 nodes) and am having
>>>> trouble
>>>> keeping them loaded with my MR jobs.
>>>>
>>>>
>>>>
>>>> The primary issue is that I have different jobs that have drastically
>>>> different patterns.  I have jobs that read/write to/from HBase or
>>>> Hadoop
>>>> with minimal logic (network throughput bound or io bound), others that
>>>> perform crawling (network latency bound), and one huge parsing
>>>> streaming job
>>>> (very CPU bound, each task eats a core).
>>>>
>>>>
>>>>
>>>> I'd like to launch very large numbers of tasks for network latency
>>>> bound
>>>> jobs, however the large CPU bound job means I have to keep the max
>>>> maps
>>>> allowed per node low enough as to not starve the Datanode and
>>>> Regionserver.
>>>>
>>>>
>>>>
>>>> I'm an HBase dev but not familiar enough with Hadoop MR code to even
>>>> know
>>>> what would be involved with implementing this.  However, in talking
>>>> with
>>>> other users, it seems like this would be a well-received option.
>>>>
>>>>
>>>>
>>>> I wanted to ping the list before filing an issue because it seems like
>>>> someone may have thought about this in the past.
>>>>
>>>>
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>> Jonathan Gray
>>>>
>>>>
>>> --
>>> Chris K Wensel
>>> ch...@wensel.net
>>> http://www.cascading.org/
>>> http://www.scaleunlimited.com/
>>>
>>
>>
>
