Uhm... he has plenty of memory. Depending on what sort of M/R tasks, he could push it. He didn't say how much disk...
I wouldn't start that high... Try 10 mappers and 2 reducers. Granted, it is a bit asymmetric, and you can bump up the reducers... Watch your jobs in Ganglia and see what is happening.

Harsh, assuming he is using Intel, each core is hyper-threaded, so the box sees this as 2x CPUs. 8 cores looks like 16.

Sent from a remote device. Please excuse any typos...

Mike Segel

On Oct 28, 2011, at 3:08 AM, Harsh J <[email protected]> wrote:

> Hey N.N. Gesli,
>
> (Inline)
>
> On Fri, Oct 28, 2011 at 12:38 PM, N.N. Gesli <[email protected]> wrote:
>> Hello,
>>
>> We have a 12-node Hadoop cluster running Hadoop 0.20.2-cdh3u0. Each
>> node has 8 cores and 144GB RAM (don't ask). So, I want to take advantage of
>> this huge RAM and run the map-reduce jobs mostly in memory with no spill, if
>> possible. We use Hive for most of the processes. I have set:
>> mapred.tasktracker.map.tasks.maximum = 16
>> mapred.tasktracker.reduce.tasks.maximum = 8
>
> This is *crazy* for an 8-core machine. Try to keep M+R slots well
> below 8 instead - you're probably CPU-thrashed in this setup once a
> large number of tasks get booted.
>
>> mapred.child.java.opts = 6144
>
> You can also raise io.sort.mb to 2000, and tweak io.sort.factor.
>
> Raising the child opts to ~6 GB looks a bit unnecessary, since most of
> your tasks work on a record basis and would not care much about total
> RAM. Perhaps use all that RAM for a service like HBase, which can
> leverage caching nicely!
>
>> One of my Hive queries is producing a 6-stage map-reduce job. On the third
>> stage, when it queries from a 200GB table, the last 14 reducers hang. I
>> changed mapred.task.timeout to 0 to see if they really hang. It has been 5
>> hours, so something is terribly wrong in my setup. Parts of the log are below.
>
> It is probably just your slot settings. You may be massively
> over-subscribing your CPU resources with 16 map task slots + 8 reduce
> task slots. In the worst case, that means 24 total JVMs competing over
> 8 available physical processors. Doesn't make sense to me at least -
> make it more like 7 M / 2 R or so :)
>
>> My questions:
>> * What should my configuration be to make the reducers run in memory?
>> * Why does it keep waiting for map outputs?
>
> It has to fetch map outputs to get some data to start with. And it
> pulls the map outputs a few at a time - so as not to overload the network
> during the shuffle phases of several reducers across the cluster.
>
>> * What does "dup hosts" mean?
>
> Duplicate hosts. Hosts it already knows about and has already
> scheduled fetch work upon.
>
> <snip>
>
> --
> Harsh J
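[Editor's note: a quick back-of-envelope sketch, not part of the original thread, of the numbers Harsh is pointing at. It assumes every slot is occupied and every child JVM grows to its full 6144 MB heap - the worst case, not typical steady state:]

```python
# Worst-case check of the settings from the thread:
# 16 map slots + 8 reduce slots, mapred.child.java.opts = 6144 (MB),
# on a box with 8 physical cores and 144 GB of RAM.
map_slots = 16
reduce_slots = 8
child_heap_mb = 6144
cores = 8
ram_gb = 144

total_jvms = map_slots + reduce_slots           # concurrent task JVMs
jvms_per_core = total_jvms / cores              # CPU oversubscription factor
worst_case_heap_gb = total_jvms * child_heap_mb / 1024

print(f"{total_jvms} JVMs, {jvms_per_core:.1f} per core, "
      f"{worst_case_heap_gb:.0f} GB of heap worst case")
```

Even ignoring hyper-threading arguments, that is 3 task JVMs per physical core, and the worst-case heap alone equals the machine's entire 144 GB of RAM - before counting the TaskTracker, DataNode, OS page cache, or non-heap JVM overhead. That is consistent with Harsh's advice to keep M+R slots well below the core count.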
