Dear Jinchun,

The warning message that you get is irrelevant. The problem seems to be
the amount of memory given to the map-reduce tasks. You need to
increase the heap size (e.g., -Xmx2048M) and make sure that you have
enough DRAM for the heap size you specify. To change the heap size, edit
the following file
$HADOOP_HOME/conf/mapred-site.xml 
and specify the heap size by adding/changing the following parameter
mapred.child.java.opts
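
For example, a minimal mapred-site.xml entry would look something like this
(the -Xmx2048M value is just an illustration; pick a value that fits the DRAM
you actually have):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048M</value>
  </property>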

If your machine doesn't have enough DRAM, the whole process of preparing 
the data and the model is indeed expected to take a couple of hours. 

Regards,
Djordje
________________________________________
From: Jinchun Kim [[email protected]]
Sent: Friday, March 22, 2013 1:14 PM
To: [email protected]
Subject: Question about data analytic

Hi, All.

I'm trying to run Data analytic on my x86 Ubuntu machine.
I found that when I divided the 30GB Wikipedia input data into small chunks of 64MB,
CPU usage was really low.
I checked this with the /usr/bin/time command.
Most of the execution time was spent idle or waiting;
user CPU time was only 13% of the total running time.

Is it because I'm running Data analytic on a single node?
Or does it have something to do with the following warning message?

WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath,
will use command-line arguments only

I don't understand why the user CPU time is so low when it takes 2.5 hours to
finish splitting the Wikipedia inputs.
Thanks!

--
Jinchun Kim
