[
https://issues.apache.org/jira/browse/MAHOUT-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
mahmood updated MAHOUT-1456:
----------------------------
Comment: was deleted
(was: In that pastebin link, I see that only the last command produces the
heap size error, and that command is totally different from mine (or the
wikipedia example in the Mahout docs).
Can you please test the wikipedia example with Hadoop 1.2.1 and 2.1.0-beta to
see the difference? )
> The wikipediaXMLSplitter example fails with "heap size" error
> -------------------------------------------------------------
>
> Key: MAHOUT-1456
> URL: https://issues.apache.org/jira/browse/MAHOUT-1456
> Project: Mahout
> Issue Type: Bug
> Components: Examples
> Affects Versions: 0.9
> Environment: Solaris 11.1
> Hadoop 2.3.0
> Maven 3.2.1
> JDK 1.7.0_07-b10
> Reporter: mahmood
> Labels: Heap, mahout, wikipediaXMLSplitter
>
> 1- The XML file is
> http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
> (see the download sketch after this list).
> 2- When I run "mahout wikipediaXMLSplitter -d
> enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64", it gets stuck at
> chunk #571 and, after about 30 minutes, fails with a Java heap space error.
> The previous chunks are created rapidly (roughly 10 chunks per second).
> 3- Increasing the heap size via the "-Xmx4096m" option doesn't help (see the
> heap sketch below).
> 4- No matter what the configuration is, there seems to be a memory leak that
> eats all available heap space (see the diagnostic sketch below).
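>
> Download sketch (for step 1): a minimal way to fetch and decompress the dump
> before splitting. The URL is the one above; wget and bunzip2 are assumed to
> be available on the machine:
>
>     wget http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
>     # decompress in place; yields enwiki-latest-pages-articles.xml
>     bunzip2 enwiki-latest-pages-articles.xml.bz2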
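>
> Heap sketch (for step 3): a plain "-Xmx4096m" on my command line may never
> reach the child JVM. A hedged alternative, assuming the stock bin/mahout
> launcher reads the MAHOUT_HEAPSIZE environment variable (value in MB), is:
>
>     # assumed launcher knob; should make the child JVM start with -Xmx4096m
>     export MAHOUT_HEAPSIZE=4096
>     mahout wikipediaXMLSplitter -d enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64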
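>
> Diagnostic sketch (for step 4): to check the leak suspicion, one can take JDK
> heap histograms of the splitter JVM a few minutes apart and compare the top
> object counts; jps and jmap both ship with JDK 1.7:
>
>     jps -l                          # find the pid of the splitter JVM
>     jmap -histo <pid> | head -n 25  # repeat later; steadily growing counts suggest a leak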
--
This message was sent by Atlassian JIRA
(v6.2#6252)