[
https://issues.apache.org/jira/browse/MAHOUT-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938995#comment-13938995
]
mahmood commented on MAHOUT-1456:
---------------------------------
Here is the output with Hadoop 1.2.1. Please note that in both configurations
the system has 4 GB of RAM:
[hadoop@solaris hadoop-1.2.1]$ ./bin/start-all.sh
starting namenode, logging to
/export/home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-solaris.out
localhost: starting datanode, logging to
/export/home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-solaris.out
localhost: starting secondarynamenode, logging to
/export/home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-solaris.out
starting jobtracker, logging to
/export/home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-solaris.out
localhost: starting tasktracker, logging to
/export/home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-solaris.out
[hadoop@solaris hadoop-1.2.1]$ which hadoop
/export/home/hadoop/hadoop-1.2.1/bin/hadoop
[hadoop@solaris hadoop-1.2.1]$ cd ../mahout-distribution-0.9/
[hadoop@solaris mahout-distribution-0.9]$ ./bin/mahout wikipediaXMLSplitter -d
../enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-1.2.1/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/export/home/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
14/03/18 11:59:50 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found
on classpath, will use command-line arguments only
14/03/18 12:41:34 INFO driver.MahoutDriver: Program took 2503691 ms (Minutes:
41.728183333333334)
[hadoop@solaris mahout-distribution-0.9]$ cd ../hadoop-1.2.1/conf/
[hadoop@solaris conf]$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose scheme and
authority determine the FileSystem implementation. The uri's scheme determines
the config property (fs.SCHEME.impl) naming the FileSystem implementation
class. The uri's authority is used to determine the host, port, etc. for a
filesystem.</description>
</property>
</configuration>
[hadoop@solaris conf]$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication. The actual number of replications can
be specified when the file is created. The default is used if replication is
not specified in create time. </description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/export/home/hadoop/hadoop-1.2.1/tmp</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/export/home/hadoop/hadoop-1.2.1/dfs.name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/export/home/hadoop/hadoop-1.2.1/dfs.data</value>
</property>
</configuration>
[hadoop@solaris conf]$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If
"local", then jobs are run in-process as a single map and reduce task.
</description>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>1</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>1</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024M</value>
</property>
</configuration>
[hadoop@solaris conf]$
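One thing worth noting: wikipediaXMLSplitter does its chunking in the client JVM launched by bin/mahout, so mapred.child.java.opts (which only applies to map/reduce child tasks) would not govern its heap. A minimal sketch of raising the client heap instead, assuming the stock 0.9 bin/mahout script translates MAHOUT_HEAPSIZE (in MB) into -Xmx as it appears to:

```shell
# Assumption: mahout-distribution-0.9/bin/mahout reads MAHOUT_HEAPSIZE (MB)
# and passes the equivalent -Xmx flag to the client JVM it launches.
export MAHOUT_HEAPSIZE=3072               # leave headroom on a 4 GB machine
JAVA_HEAP_MAX="-Xmx${MAHOUT_HEAPSIZE}m"   # what the launcher would construct
echo "$JAVA_HEAP_MAX"                     # -Xmx3072m
```

With that exported, the same `./bin/mahout wikipediaXMLSplitter -d ../enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64` invocation would run with the larger heap.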
> The wikipediaXMLSplitter example fails with "heap size" error
> -------------------------------------------------------------
>
> Key: MAHOUT-1456
> URL: https://issues.apache.org/jira/browse/MAHOUT-1456
> Project: Mahout
> Issue Type: Bug
> Components: Examples
> Affects Versions: 0.9
> Environment: Solaris 11.1 \
> Hadoop 2.3.0 \
> Maven 3.2.1 \
> JDK 1.7.0_07-b10 \
> Reporter: mahmood
> Labels: Heap, mahout, wikipediaXMLSplitter
>
> 1- The XML file is
> http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
> 2- When I run "mahout wikipediaXMLSplitter -d
> enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64", it gets stuck at
> chunk #571 and, after 30 minutes, fails with a Java heap size error. The
> previous chunks are created rapidly (10 chunks per second).
> 3- Increasing the heap size via the "-Xmx4096m" option doesn't help.
> 4- No matter what the configuration is, there seems to be a memory leak
> that eats all available memory.
--
This message was sent by Atlassian JIRA
(v6.2#6252)