[ https://issues.apache.org/jira/browse/MAHOUT-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938995#comment-13938995 ]

mahmood commented on MAHOUT-1456:
---------------------------------

Here is the output with Hadoop 1.2.1. Please note that in both configurations 
the system has 4 GB of RAM.


[hadoop@solaris hadoop-1.2.1]$ ./bin/start-all.sh 
starting namenode, logging to 
/export/home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-solaris.out
localhost: starting datanode, logging to 
/export/home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-solaris.out
localhost: starting secondarynamenode, logging to 
/export/home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-solaris.out
starting jobtracker, logging to 
/export/home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-solaris.out
localhost: starting tasktracker, logging to 
/export/home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-solaris.out


[hadoop@solaris hadoop-1.2.1]$ which hadoop
/export/home/hadoop/hadoop-1.2.1/bin/hadoop


[hadoop@solaris hadoop-1.2.1]$ cd ../mahout-distribution-0.9/
[hadoop@solaris mahout-distribution-0.9]$ ./bin/mahout wikipediaXMLSplitter -d 
../enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-1.2.1/bin/hadoop and 
HADOOP_CONF_DIR=
MAHOUT-JOB: 
/export/home/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
14/03/18 11:59:50 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found 
on classpath, will use command-line arguments only
14/03/18 12:41:34 INFO driver.MahoutDriver: Program took 2503691 ms (Minutes: 
41.728183333333334)
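A 41-minute run that dies at a fixed chunk suggests the heap is filling steadily rather than spiking. One way to confirm this while the splitter runs is to watch GC statistics with stock JDK tools; this is a sketch, and the grep pattern for the client process name is an assumption:

```shell
# Hedged sketch: watch the splitter's heap while it runs, to see whether
# old-gen usage climbs steadily toward the failure point.
# "mahout" as a jps match is an assumption about the client's main class.
PID=$(jps -l | grep -i mahout | awk '{print $1}')
jstat -gcutil "$PID" 5000   # GC stats every 5 s: O = old-gen %, FGC = full GC count
```

If the O column ratchets upward across full GCs, live objects are accumulating, which is consistent with the leak described in the issue.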


[hadoop@solaris mahout-distribution-0.9]$ cd ../hadoop-1.2.1/conf/
[hadoop@solaris conf]$ cat core-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose scheme and 
authority determine the FileSystem implementation. The uri's scheme determines 
the config property (fs.SCHEME.impl) naming the FileSystem implementation 
class. The uri's authority is used to determine the host, port, etc. for a 
filesystem.</description>
</property>
</configuration>
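With fs.default.name pointing at hdfs://localhost:54310, the NameNode can be sanity-checked from the shell before blaming Mahout. A sketch using Hadoop 1.x commands (not taken from the original log):

```shell
# Hedged sketch: verify the NameNode configured in core-site.xml is reachable
./bin/hadoop fs -ls /          # lists the HDFS root via fs.default.name
./bin/hadoop dfsadmin -report  # Hadoop 1.x: live datanodes, capacity, usage
```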


[hadoop@solaris conf]$ cat hdfs-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication. The actual number of replications can 
be specified when the file is created. The default is used if replication is 
not specified in create time. </description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/export/home/hadoop/hadoop-1.2.1/tmp</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/export/home/hadoop/hadoop-1.2.1/dfs.name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/export/home/hadoop/hadoop-1.2.1/dfs.data</value>
</property>
</configuration>


[hadoop@solaris conf]$ cat mapred-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If 
"local", then jobs are run in-process as a single map and reduce task. 
</description>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>1</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>1</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xm1024M</value>
</property>
</configuration>
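One thing stands out in the dump above: `-Xm1024M` is not an option the HotSpot JVM accepts (`-Xmx` sets the maximum heap, `-Xms` the initial heap), so a task JVM launched with it will refuse to start. Assuming 1 GB was the intent, the property would read:

```xml
<!-- Corrected sketch: -Xmx (maximum heap) rather than the mistyped -Xm -->
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024M</value>
</property>
```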
[hadoop@solaris conf]$ 


> The wikipediaXMLSplitter example fails with "heap size" error
> -------------------------------------------------------------
>
>                 Key: MAHOUT-1456
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1456
>             Project: Mahout
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 0.9
>         Environment: Solaris 11.1
> Hadoop 2.3.0
> Maven 3.2.1
> JDK 1.7.0_07-b10
>            Reporter: mahmood
>              Labels: Heap, mahout, wikipediaXMLSplitter
>
> 1- The XML file is 
> http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
> 2- When I run "mahout wikipediaXMLSplitter -d 
> enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64", it gets stuck 
> at chunk #571 and after 30 minutes fails with a Java heap-size error. The 
> earlier chunks are created rapidly (about 10 chunks per second).
> 3- Increasing the heap size via the "-Xmx4096m" option doesn't help.
> 4- Regardless of the configuration, there appears to be a memory leak that 
> eventually consumes all heap space.
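Regarding point 3: mapred.child.java.opts only affects MapReduce task JVMs, while wikipediaXMLSplitter does its chunking in the client JVM, whose heap bin/mahout in 0.9 sizes from the MAHOUT_HEAPSIZE environment variable (in MB). A hedged sketch; 3072 is an assumed value for a 4 GB machine:

```shell
# Hedged: raise the mahout client JVM's heap; bin/mahout turns
# MAHOUT_HEAPSIZE (MB) into a -Xmx flag for the client process.
MAHOUT_HEAPSIZE=3072 ./bin/mahout wikipediaXMLSplitter \
  -d enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
```

This would not cure a genuine leak, but it distinguishes "heap too small for the working set" from "unbounded growth".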



--
This message was sent by Atlassian JIRA
(v6.2#6252)
