Hi,

First, you have 8 physical cores. Hyper-threading makes the machine think it has 16, but you really don't have 16 cores, so you need to be a little more conservative.

You don't mention HBase, so I'm going to assume you don't have it installed. In terms of task slots, allocate a core each to the DataNode (DN) and TaskTracker (TT), leaving 6 cores, or 12 hyper-threaded ones. That also leaves a little headroom for the other Linux processes. Now you can split the remaining cores however you want, and you can even overlap a bit, since you are not going to be running all of your reducers at the same time. So let's say 10 mappers and 4 reducers to start. Since you have all that memory, you can also bump up your DN and TT heap allocations.

With respect to your tuning: change the settings one at a time, so you can tell which change actually helped.
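For example, the split above would look something like this in mapred-site.xml (just a sketch: the slot counts are the ones I suggested, and the heap size is purely illustrative, so size it to your own jobs):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>10</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>
    <!-- per-task JVM heap; keep (map slots + reduce slots) x heap well under 144 GB -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2048m</value>
    </property>

Remember that each slot is a separate JVM, so the totals add up fast.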
Sent from a remote device. Please excuse any typos...

Mike Segel

On Nov 4, 2011, at 1:46 AM, "N.N. Gesli" <[email protected]> wrote:

> Thank you very much for your replies.
>
> Michel, disk is 3 TB (6x550 GB; 50 GB from each disk is reserved for
> local use, basically for mapred.local.dir). You are right on the CPU; it
> is 8 cores but shows as 16. Does that mean it can handle 16 JVMs at a
> time? The CPU is a little overloaded, but that is not a huge problem at
> this point.
>
> I made io.sort.factor 200 and io.sort.mb 2000 and still got the same
> error/timeout. I played with all the related conf settings one by one.
> In the end, changing mapred.job.shuffle.merge.percent from 1.0 back to
> 0.66 solved the problem.
>
> However, the job is still taking a long time. There are 84 reducers, but
> only one of them takes a very long time; I attached the log file of that
> reduce task. The majority of the data gets spilled to disk. Even though
> I set mapred.child.java.opts to 6144, the reduce task log shows
>     ShuffleRamManager: MemoryLimit=1503238528, MaxSingleShuffleLimit=375809632
> as if the memory were 2 GB (70% of 2 GB = 1503238528 bytes). Later in
> the same log file there is also this line:
>     INFO ExecReducer: maximum memory = 6414139392
> I am not using memory monitoring. The TaskTrackers have this line in
> the log:
>     TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager
>     is disabled.
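> Doing the arithmetic (this is my reading of the 0.20 shuffle code, so
> treat the formula as an assumption):
>     MemoryLimit = maxMemory * mapred.job.shuffle.input.buffer.percent
>                 = 2147483648 * 0.70 ~= 1503238528
>     MaxSingleShuffleLimit = MemoryLimit * 0.25 = 375809632
> Both numbers match the log exactly when maxMemory is a 2 GB heap, so the
> shuffle seems to be sizing itself from a 2 GB JVM, not a 6 GB one.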
> Why is ShuffleRamManager finding that number, as if the max memory were
> 2 GB?
> Why am I still getting that much spill, even with these aggressive
> memory settings?
> Why is only one reducer taking that long?
> What else can I change to make this job process in memory and finish
> faster?
>
> Thank you.
> -N.N.Gesli
>
> On Fri, Oct 28, 2011 at 2:14 AM, Michel Segel <[email protected]>
> wrote:
> Uhm...
> He has plenty of memory... Depending on what sort of M/R tasks, he could
> push it. He didn't say how much disk...
>
> I wouldn't start that high... Try 10 mappers and 2 reducers. Granted, it
> is a bit asymmetric, and you can bump up the reducers later...
>
> Watch your jobs in Ganglia and see what is happening...
>
> Harsh, assuming he is using Intel, each core is hyper-threaded, so the
> box sees them as 2x CPUs: 8 cores look like 16.
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Oct 28, 2011, at 3:08 AM, Harsh J <[email protected]> wrote:
>
> > Hey N.N. Gesli,
> >
> > (Inline)
> >
> > On Fri, Oct 28, 2011 at 12:38 PM, N.N. Gesli <[email protected]> wrote:
> >> Hello,
> >>
> >> We have a 12-node Hadoop cluster running Hadoop 0.20.2-cdh3u0. Each
> >> node has 8 cores and 144 GB of RAM (don't ask). I want to take
> >> advantage of this huge amount of RAM and run the map-reduce jobs
> >> mostly in memory, with no spill if possible. We use Hive for most of
> >> the processes. I have set:
> >> mapred.tasktracker.map.tasks.maximum = 16
> >> mapred.tasktracker.reduce.tasks.maximum = 8
> >
> > This is *crazy* for an 8-core machine. Try to keep M+R slots well
> > below 8 instead - you're probably CPU-thrashed in this setup once a
> > large number of tasks get booted.
> >
> >> mapred.child.java.opts = 6144
> >
> > You can also raise io.sort.mb to 2000 and tweak io.sort.factor.
> >
> > The child-opts raise to ~6 GB looks a bit unnecessary, since most of
> > your tasks work on a record basis and would not care much about total
> > RAM. Perhaps use all that RAM for a service like HBase, which can
> > leverage caching nicely!
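> > (A side note, and only a guess on my part: mapred.child.java.opts
> > takes raw JVM arguments rather than a megabyte count, so if the value
> > is literally "6144" the heap request may not be taking effect. The
> > usual form is something like:
> >
> >   <property>
> >     <name>mapred.child.java.opts</name>
> >     <value>-Xmx6144m</value>
> >   </property>
> >
> > Worth double-checking how it is actually set.)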
> >
> >> One of my Hive queries produces 6-stage map-reduce jobs. In the third
> >> stage, when it queries a 200 GB table, the last 14 reducers hang. I
> >> changed mapred.task.timeout to 0 to see if they really hang. It has
> >> been 5 hours, so something is terribly wrong in my setup. Parts of
> >> the log are below.
> >
> > It is probably just your slot settings. You may be massively
> > over-subscribing your CPU resources with 16 map task slots + 8 reduce
> > task slots. In the worst case, that means 24 JVMs in total competing
> > over 8 available physical processors. Doesn't make sense to me at
> > least - make it more like 7 M / 2 R or so :)
> >
> >> My questions:
> >> * What should my configuration be to make the reducers run in memory?
> >> * Why does it keep waiting for map outputs?
> >
> > It has to fetch map outputs to get some data to start with. And it
> > pulls the map outputs a few at a time, so as not to overload the
> > network during the shuffle phases of several reducers across the
> > cluster.
> >
> >> * What does "dup hosts" mean?
> >
> > Duplicate hosts: hosts it already knows about and has already
> > scheduled fetch work upon.
> >
> > <snip>
> >
> > --
> > Harsh J
> >
> > <ngesli_reduce_log.txt>