Did you try playing with mapred.child.ulimit along with mapred.child.java.opts?
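A rough sketch of how the two might be set together in mapred-site.xml (the values below are only illustrative, not a recommendation): mapred.child.ulimit is specified in kilobytes and has to be comfortably larger than the -Xmx heap plus JVM/native overhead, otherwise the limit kills the task JVM itself.

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xms3g -Xmx3g</value>
    </property>
    <property>
      <name>mapred.child.ulimit</name>
      <!-- virtual memory limit in KB; ~4.5 GB here, i.e. heap plus headroom for the JVM -->
      <value>4718592</value>
    </property>

If the job's driver goes through ToolRunner/GenericOptionsParser (as the Mahout drivers generally do), the same pair can also be passed per job on the command line, e.g.:

    hadoop jar <job.jar> <main-class> -Dmapred.child.java.opts="-Xms3g -Xmx3g" -Dmapred.child.ulimit=4718592 ...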
On Jun 18, 2011, at 9:55 AM, Ken Williams <[email protected]> wrote:

> Hi All,
>
> I'm having a problem running a job on Hadoop. Using Mahout, I've been able to
> run several Bayesian classifiers and train and test them successfully on
> increasingly large datasets. Now I'm working on a dataset of 100,000
> documents (size 100MB). I've trained the classifier on 80,000 docs and am
> using the remaining 20,000 as the test set. I've been able to train the
> classifier, but when I try to 'testclassifier' all the map tasks fail with a
> 'Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded'
> exception before the job itself is 'Killed'. I have a small cluster of 3
> machines but plenty of memory and CPU power (3 x 16GB, 2.5GHz quad-core
> machines).
>
> I've tried setting 'mapred.child.java.opts' flags up to 3GB in size (-Xms3G
> -Xmx3G) but still get the same error. I've also tried setting HADOOP_HEAPSIZE
> to values like 2000, 2500 and 3000, but this made no difference. When the
> program is running I can use 'top' to see that although the CPUs are busy,
> memory usage rarely goes above 12GB and absolutely no swapping is taking
> place. (See program console output: http://pastebin.com/0m2Uduxa, job config
> file: http://pastebin.com/4GEFSnUM.)
>
> I found a similar problem with a 'GC overhead limit exceeded' where the
> program was spending so much time garbage-collecting (more than 90% of its
> time!) that it was unable to progress and so threw the 'GC overhead limit
> exceeded' exception. If I set -XX:-UseGCOverheadLimit in the
> 'mapred.child.java.opts' property to avoid this exception, I see the same
> behaviour as before, only a slightly different exception is thrown:
>
>   Caused by: java.lang.OutOfMemoryError: Java heap space
>       at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)
>
> So I'm guessing that maybe my program is spending too much time
> garbage-collecting for it to progress? But how do I fix this? There's no
> further info in the log files other than seeing the exceptions being thrown.
>
> I tried reducing the 'dfs.block.size' parameter to reduce the amount of data
> going into each 'map' process (and therefore reduce its memory requirements),
> but this made no difference. I tried various settings for JVM reuse
> (mapred.job.reuse.jvm.num.tasks) using values for no re-use (0), limited
> re-use (10), and unlimited re-use (-1), but no difference. I think the problem
> is in the job configuration parameters, but how do I find it?
>
> I'm using Hadoop 0.20.2 and the latest Mahout snapshot version. All machines
> are running 64-bit Ubuntu and Java 6. Any help would be very much appreciated.
>
> Ken Williams
