Hey N.N. Gesli,

(Replying inline.)
On Fri, Oct 28, 2011 at 12:38 PM, N.N. Gesli <[email protected]> wrote:
> Hello,
>
> We have 12 node Hadoop Cluster that is running Hadoop 0.20.2-cdh3u0. Each
> node has 8 core and 144GB RAM (don't ask). So, I want to take advantage of
> this huge RAM and run the map-reduce jobs mostly in memory with no spill, if
> possible. We use Hive for most of the processes. I have set:
> mapred.tasktracker.map.tasks.maximum = 16
> mapred.tasktracker.reduce.tasks.maximum = 8

This is *crazy* for an 8-core machine. Try to keep total M+R slots well below 8 instead - you are probably CPU-thrashing in this setup once a large number of tasks gets booted.

> mapred.child.java.opts = 6144

You can also raise io.sort.mb to 2000 and tweak io.sort.factor. Raising the child opts to ~6 GB looks unnecessary, since most of your tasks work on a record-at-a-time basis and would not care much about total RAM. Perhaps use all that RAM for a service like HBase, which can leverage caching nicely!

> One of my Hive queries is producing 6 stage map-reduce jobs. On the third
> stage when it queries from a 200GB table, the last 14 reducers hang. I
> changed mapred.task.timeout to 0 to see if they really hang. It has been 5
> hours, so something terribly wrong in my setup. Parts of the log is below.

It is probably just your slot settings. You may be massively over-subscribing your CPU resources with 16 map task slots + 8 reduce task slots. In the worst case, that would mean 24 JVMs competing over 8 available physical processors. It doesn't make sense to me, at least - make it more like 7 M / 2 R or so :)

> My questions:
> * What should be my configurations to make reducers to run in the memory?
> * Why it keeps waiting for map outputs?

It has to fetch map outputs to get some data to start with, and it pulls the map outputs a few at a time so as not to overload the network during the shuffle phases of several reducers across the cluster.

> * What does it mean "dup hosts"?

Duplicate hosts.
Hosts it already knows about and has already scheduled fetch work upon.

<snip>

--
Harsh J
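For reference, the slot and sort-buffer suggestions above could look roughly like this in mapred-site.xml - a sketch only, with illustrative values to be tuned per workload, not settings verified on this cluster:

```xml
<!-- mapred-site.xml: illustrative starting points for an 8-core node -->
<configuration>
  <!-- Keep total map + reduce slots below the physical core count -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>7</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- Child JVM heap; ~6 GB is rarely needed for record-at-a-time tasks -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>
  <!-- Larger map-side sort buffer to cut spills; must fit in the child heap -->
  <property>
    <name>io.sort.mb</name>
    <value>2000</value>
  </property>
  <!-- More streams merged at once during sort/merge passes -->
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
  </property>
</configuration>
```

Note that io.sort.mb must fit inside the heap given by mapred.child.java.opts, which is why the two are tuned together.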
