Hi,

First, you have 8 physical cores. Hyper-threading makes the machine think it has 16, but you really don't have 16 cores, so you need to be a little more conservative.

You don't mention HBase, so I'm going to assume you don't have it installed. In terms of task slots, allocate a core each to the DataNode (DN) and TaskTracker (TT), leaving 6 cores, or 12 hyper-threaded ones. That also leaves a little headroom for the other Linux processes. Now you can split the remaining cores however you want, and you can even overlap a bit, since you are not going to be running all of your reducers at the same time. So let's say 10 mappers and 4 reducers to start. Since you have all that memory, you can also bump up your DN and TT heap allocations.

With respect to your tuning: change the settings one at a time, so you can tell which change actually helped.
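For example, the split above would look something like this in mapred-site.xml (just a sketch: the slot counts are the ones I suggested, and the heap size is purely illustrative, so size it to your own jobs):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>10</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>
    <!-- per-task JVM heap; keep (map slots + reduce slots) x heap well under 144 GB -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2048m</value>
    </property>

Remember that each slot is a separate JVM, so the totals add up fast.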
Sent from a remote device. Please excuse any typos...

Mike Segel

On Nov 4, 2011, at 1:46 AM, "N.N. Gesli" <[email protected]> wrote:

> Thank you very much for your replies.
>
> Michel, disk is 3 TB (6x550 GB; 50 GB from each disk is reserved for
> local use, basically for mapred.local.dir). You are right on the CPU; it
> is 8 cores but shows as 16. Does that mean it can handle 16 JVMs at a
> time? The CPU is a little overloaded, but that is not a huge problem at
> this point.
>
> I made io.sort.factor 200 and io.sort.mb 2000 and still got the same
> error/timeout. I played with all the related conf settings one by one.
> In the end, changing mapred.job.shuffle.merge.percent from 1.0 back to
> 0.66 solved the problem.
>
> However, the job is still taking a long time. There are 84 reducers, but
> only one of them takes a very long time; I attached the log file of that
> reduce task. The majority of the data gets spilled to disk. Even though
> I set mapred.child.java.opts to 6144, the reduce task log shows
>     ShuffleRamManager: MemoryLimit=1503238528, MaxSingleShuffleLimit=375809632
> as if the memory were 2 GB (70% of 2 GB = 1503238528 bytes). Later in
> the same log file there is also this line:
>     INFO ExecReducer: maximum memory = 6414139392
> I am not using memory monitoring. The TaskTrackers have this line in
> the log:
>     TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager
>     is disabled.
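> Doing the arithmetic (this is my reading of the 0.20 shuffle code, so
> treat the formula as an assumption):
>     MemoryLimit = maxMemory * mapred.job.shuffle.input.buffer.percent
>                 = 2147483648 * 0.70 ~= 1503238528
>     MaxSingleShuffleLimit = MemoryLimit * 0.25 = 375809632
> Both numbers match the log exactly when maxMemory is a 2 GB heap, so the
> shuffle seems to be sizing itself from a 2 GB JVM, not a 6 GB one.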
> Why is ShuffleRamManager finding that number, as if the max memory were
> 2 GB?
> Why am I still getting that much spill, even with these aggressive
> memory settings?
> Why is only one reducer taking that long?
> What else can I change to make this job process in memory and finish
> faster?
>
> Thank you.
> -N.N.Gesli
>
> On Fri, Oct 28, 2011 at 2:14 AM, Michel Segel <[email protected]>
> wrote:
> Uhm...
> He has plenty of memory... Depending on what sort of M/R tasks, he could
> push it. He didn't say how much disk...
>
> I wouldn't start that high... Try 10 mappers and 2 reducers. Granted, it
> is a bit asymmetric, and you can bump up the reducers later...
>
> Watch your jobs in Ganglia and see what is happening...
>
> Harsh, assuming he is using Intel, each core is hyper-threaded, so the
> box sees them as 2x CPUs: 8 cores look like 16.
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Oct 28, 2011, at 3:08 AM, Harsh J <[email protected]> wrote:
>
> > Hey N.N. Gesli,
> >
> > (Inline)
> >
> > On Fri, Oct 28, 2011 at 12:38 PM, N.N. Gesli <[email protected]> wrote:
> >> Hello,
> >>
> >> We have a 12-node Hadoop cluster running Hadoop 0.20.2-cdh3u0. Each
> >> node has 8 cores and 144 GB of RAM (don't ask). I want to take
> >> advantage of this huge amount of RAM and run the map-reduce jobs
> >> mostly in memory, with no spill if possible. We use Hive for most of
> >> the processes. I have set:
> >> mapred.tasktracker.map.tasks.maximum = 16
> >> mapred.tasktracker.reduce.tasks.maximum = 8
> >
> > This is *crazy* for an 8-core machine. Try to keep M+R slots well
> > below 8 instead - you're probably CPU-thrashed in this setup once a
> > large number of tasks get booted.
> >
> >> mapred.child.java.opts = 6144
> >
> > You can also raise io.sort.mb to 2000 and tweak io.sort.factor.
> >
> > The child-opts raise to ~6 GB looks a bit unnecessary, since most of
> > your tasks work on a record basis and would not care much about total
> > RAM. Perhaps use all that RAM for a service like HBase, which can
> > leverage caching nicely!
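> > (A side note, and only a guess on my part: mapred.child.java.opts
> > takes raw JVM arguments rather than a megabyte count, so if the value
> > is literally "6144" the heap request may not be taking effect. The
> > usual form is something like:
> >
> >   <property>
> >     <name>mapred.child.java.opts</name>
> >     <value>-Xmx6144m</value>
> >   </property>
> >
> > Worth double-checking how it is actually set.)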
> >
> >> One of my Hive queries produces 6-stage map-reduce jobs. In the third
> >> stage, when it queries a 200 GB table, the last 14 reducers hang. I
> >> changed mapred.task.timeout to 0 to see if they really hang. It has
> >> been 5 hours, so something is terribly wrong in my setup. Parts of
> >> the log are below.
> >
> > It is probably just your slot settings. You may be massively
> > over-subscribing your CPU resources with 16 map task slots + 8 reduce
> > task slots. In the worst case, that means 24 JVMs in total competing
> > over 8 available physical processors. Doesn't make sense to me at
> > least - make it more like 7 M / 2 R or so :)
> >
> >> My questions:
> >> * What should my configuration be to make the reducers run in memory?
> >> * Why does it keep waiting for map outputs?
> >
> > It has to fetch map outputs to get some data to start with. And it
> > pulls the map outputs a few at a time, so as not to overload the
> > network during the shuffle phases of several reducers across the
> > cluster.
> >
> >> * What does "dup hosts" mean?
> >
> > Duplicate hosts: hosts it already knows about and has already
> > scheduled fetch work upon.
> >
> > <snip>
> >
> > --
> > Harsh J
> >
> > <ngesli_reduce_log.txt>