Dear Jinchun,

The warning message you are seeing is irrelevant. The problem seems to be the amount of memory given to the map-reduce tasks. You need to increase the heap size (e.g., -Xmx2048M) and make sure you have enough DRAM for the heap size you specify.

To change the heap size, edit the file $HADOOP_HOME/conf/mapred-site.xml and set the heap size by adding or changing the mapred.child.java.opts parameter.
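A minimal sketch of that property block follows (this assumes a Hadoop 1.x-style setup where mapred-site.xml lives under $HADOOP_HOME/conf; the 2048M figure is just the example value above, so size it to the DRAM you actually have):

<property>
  <!-- JVM options passed to each child map/reduce task -->
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>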
If your machine doesn't have enough DRAM, the whole process of preparing the data and the model is indeed expected to take a couple of hours.

Regards,
Djordje

________________________________________
From: Jinchun Kim [[email protected]]
Sent: Friday, March 22, 2013 1:14 PM
To: [email protected]
Subject: Question about Data Analytics

Hi, All.

I'm trying to run Data Analytics on my x86 Ubuntu machine. I found that when I divided the 30GB Wikipedia input data into small chunks of 64MB, CPU usage was really low. I checked this with the /usr/bin/time command: most of the execution time was spent idle and waiting, and user CPU time was only 13% of the total running time. Is this because I'm running Data Analytics on a single node? Or does it have something to do with the following warning message?

WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only

I don't understand why the user CPU time is so low while it takes 2.5 hours to finish splitting the Wikipedia inputs.

Thanks!
--
Jinchun Kim
