I've tested it with part of the whole file. I tried different values for the -Xmx parameter in the Hadoop mapred config file. It worked with -Xmx set to roughly 30000. I don't know why that worked. I tested it on a VM with 3 GB of RAM.
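For reference, the change was roughly along these lines (a minimal sketch, not the exact file; the property names assume Hadoop 2.x, on older Hadoop it would be the single mapred.child.java.opts property):

    <!-- mapred-site.xml: raise the heap of the map/reduce child JVMs -->
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx30000m</value>
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx30000m</value>
    </property>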
On Thu, Mar 13, 2014 at 8:19 PM, mahmood (JIRA) <[email protected]> wrote:
>
>     [ https://issues.apache.org/jira/browse/MAHOUT-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> mahmood updated MAHOUT-1456:
> ----------------------------
>
>     Description:
> 1- The XML file is http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
> 2- When I run "mahout wikipediaXMLSplitter -d enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64", it stuck at chunk #571 and after 30 minutes it fails to continue with the java heap size error. Previous chunks are created rapidly (10 chunks per second).
> 3- Increasing the heap size via "-Xmx4096m" option doesn't work.
> 4- No matter what is the configuration, it seems that there is a memory leak that eat all space.
>
>   was:
> 1- The XML file is http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
> 2- When I run "mahout wikipediaXMLSplitter -d enwiki-latest-pages-articles -o wikipedia/chunks -c 64", it stuck at chunk #571 and after 30 minutes it fails to continue with the java heap size error. Previous chunks are created rapidly (10 chunks per second).
> 3- Increasing the heap size via "-Xmx4096m" option doesn't work.
> 4- No matter what is the configuration, it seems that there is a memory leak that eat all space.
>
>
> > The wikipediaXMLSplitter example fails with "heap size" error
> > -------------------------------------------------------------
> >
> >                 Key: MAHOUT-1456
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1456
> >             Project: Mahout
> >          Issue Type: Bug
> >          Components: Examples
> >    Affects Versions: 0.9
> >         Environment: Solaris 11.1 \
> > Hadoop 2.3.0 \
> > Maven 3.2.1 \
> > JDK 1.7.0_07-b10 \
> >            Reporter: mahmood
> >              Labels: Heap,, mahout,, wikipediaXMLSplitter
> >
> > 1- The XML file is http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
> > 2- When I run "mahout wikipediaXMLSplitter -d enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64", it stuck at chunk #571 and after 30 minutes it fails to continue with the java heap size error. Previous chunks are created rapidly (10 chunks per second).
> > 3- Increasing the heap size via "-Xmx4096m" option doesn't work.
> > 4- No matter what is the configuration, it seems that there is a memory leak that eat all space.
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
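One more thing worth checking: wikipediaXMLSplitter appears to run in the client JVM started by bin/mahout, so an -Xmx that only lives in the mapred config may never reach it. A sketch of how the heap size could be passed to the driver itself (the MAHOUT_HEAPSIZE / MAHOUT_OPTS variable names are from memory, please verify them against your bin/mahout script):

    # assumption: bin/mahout turns MAHOUT_HEAPSIZE (in MB) into -Xmx for the client JVM
    export MAHOUT_HEAPSIZE=4096
    # assumption: extra JVM flags can also be supplied through MAHOUT_OPTS
    export MAHOUT_OPTS="-Xmx4096m"
    mahout wikipediaXMLSplitter -d enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64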
