Dear Jinchun,

A timeout of 1200 seconds is already too generous; increasing it will not solve the problem. I cannot see your logs, but yes, the problem again seems to be the configured heap size and the amount of DRAM your machine has.

Regards,
Djordje
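To sanity-check this diagnosis, one can compare the machine's physical memory against the heap configured for the child JVMs. A minimal sketch for a Linux machine, assuming mapred.child.java.opts is already set in mapred-site.xml:

    # Total/used/free physical memory, in megabytes
    free -m

    # Heap setting passed to each map/reduce child JVM
    grep -A 1 "mapred.child.java.opts" $HADOOP_HOME/conf/mapred-site.xml

If the -Xmx value times the number of concurrently running map and reduce tasks exceeds the available DRAM, tasks can stall in swap and stop reporting progress, which is consistent with the "failed to report status" messages in the log below.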
________________________________________
From: Jinchun Kim [[email protected]]
Sent: Friday, March 22, 2013 3:04 PM
To: Djordje Jevdjic
Cc: [email protected]
Subject: Re: Question about data analytic

Thanks Djordje :)

I was able to prepare the input data file, and now I'm trying to create category-based splits of the Wikipedia dataset (41 GB) and the training dataset (5 GB) using Mahout. I had no problem with the training dataset, but Hadoop showed the following messages when I tried to do the same job with the Wikipedia dataset:

.........
13/03/21 22:31:00 INFO mapred.JobClient: map 27% reduce 1%
13/03/21 22:40:31 INFO mapred.JobClient: map 27% reduce 2%
13/03/21 22:58:49 INFO mapred.JobClient: map 27% reduce 3%
13/03/21 23:22:57 INFO mapred.JobClient: map 27% reduce 4%
13/03/21 23:46:32 INFO mapred.JobClient: map 27% reduce 5%
13/03/22 00:27:14 INFO mapred.JobClient: map 27% reduce 6%
13/03/22 01:06:55 INFO mapred.JobClient: map 27% reduce 7%
13/03/22 01:14:06 INFO mapred.JobClient: map 27% reduce 3%
13/03/22 01:15:35 INFO mapred.JobClient: Task Id : attempt_201303211339_0002_r_000000_1, Status : FAILED
Task attempt_201303211339_0002_r_000000_1 failed to report status for 1200 seconds. Killing!
13/03/22 01:20:09 INFO mapred.JobClient: map 27% reduce 4%
13/03/22 01:33:35 INFO mapred.JobClient: Task Id : attempt_201303211339_0002_m_000037_1, Status : FAILED
Task attempt_201303211339_0002_m_000037_1 failed to report status for 1228 seconds. Killing!
13/03/22 01:35:12 INFO mapred.JobClient: map 27% reduce 5%
13/03/22 01:40:38 INFO mapred.JobClient: map 27% reduce 6%
13/03/22 01:52:28 INFO mapred.JobClient: map 27% reduce 7%
13/03/22 02:16:27 INFO mapred.JobClient: map 27% reduce 8%
13/03/22 02:19:02 INFO mapred.JobClient: Task Id : attempt_201303211339_0002_m_000018_1, Status : FAILED
Task attempt_201303211339_0002_m_000018_1 failed to report status for 1204 seconds. Killing!
13/03/22 02:49:03 INFO mapred.JobClient: map 27% reduce 9%
13/03/22 02:52:04 INFO mapred.JobClient: map 28% reduce 9%
........

The reduce progress keeps falling back to an earlier point, and the process eventually ends at map 46%, reduce 2% without completing. Is this also related to the heap and DRAM size? I was wondering whether increasing the timeout would help.

On Fri, Mar 22, 2013 at 8:46 AM, Djordje Jevdjic <[email protected]> wrote:

Dear Jinchun,

The warning message that you get is irrelevant. The problem seems to be the amount of memory given to the map-reduce tasks. You need to increase the heap size (e.g., -Xmx2048M) and make sure that you have enough DRAM for the heap size you indicate. To change the heap size, edit the following file:

$HADOOP_HOME/conf/mapred-site.xml

and specify the heap size by adding/changing the following parameter:

mapred.child.java.opts [see the sketch after the thread]

If your machine doesn't have enough DRAM, the whole process of preparing the data and the model is indeed expected to take a couple of hours.

Regards,
Djordje

________________________________________
From: Jinchun Kim [[email protected]]
Sent: Friday, March 22, 2013 1:14 PM
To: [email protected]
Subject: Question about data analytic

Hi, All.

I'm trying to run Data Analytics on my x86 Ubuntu machine. I found that when I divided the 30 GB Wikipedia input data into small chunks of 64 MB, CPU usage was really low (measured with the /usr/bin/time command). Most of the execution time was spent idle or waiting; user CPU time was only 13% of the total running time.
Is it because I'm running Data Analytics on a single node? Or does it have something to do with the following warning message?

WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only

I don't understand why user CPU time is so low when it takes 2.5 hours to finish splitting the Wikipedia inputs.

Thanks!

--
Jinchun Kim

--
Jinchun Kim
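For reference, the mapred-site.xml edit Djordje describes above might look like the following. This is a minimal sketch: the 2048 MB heap is his example value, and heap size times the number of concurrent tasks must fit in the machine's physical DRAM.

    <!-- $HADOOP_HOME/conf/mapred-site.xml -->
    <configuration>
      <property>
        <name>mapred.child.java.opts</name>
        <!-- Maximum heap for each map/reduce child JVM (example value;
             heap size x concurrent tasks must fit in physical DRAM) -->
        <value>-Xmx2048M</value>
      </property>
    </configuration>

In Hadoop 1.x this is a per-job setting read at submission time, so it should take effect on the next job run without restarting the daemons.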
