[
https://issues.apache.org/jira/browse/MAHOUT-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pat Ferrel resolved MAHOUT-1707.
--------------------------------
Resolution: Fixed
Removed the unnecessary collect() call.
> Spark-itemsimilarity uses too much memory
> -----------------------------------------
>
> Key: MAHOUT-1707
> URL: https://issues.apache.org/jira/browse/MAHOUT-1707
> Project: Mahout
> Issue Type: Bug
> Components: Collaborative Filtering, cooccurrence
> Affects Versions: 0.10.0
> Environment: Spark
> Reporter: Pat Ferrel
> Assignee: Pat Ferrel
> Fix For: 0.10.1
>
>
> java.lang.OutOfMemoryError: Java heap space
> The code has an unnecessary .collect() that forces all interaction data into
> the memory of the client/driver. Increasing the executor memory will not help,
> because the data is held on the driver, not on the executors.
> To work around it, remove this line and rebuild Mahout:
> https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157
> The errant line reads:
> interactions.collect()
> This pulls all of the user action data into driver memory at once, which can
> exhaust the heap on large inputs. Removing it should allow for better Spark
> memory management.
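
For illustration, here is a minimal sketch of why collect() is costly and what can stand in for it. It is not the Mahout code: the object name CollectSketch, the helper distinctUserCount, and the sample (user, item) pairs are all hypothetical, and it assumes a local Spark installation on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical sketch, not taken from Mahout.
object CollectSketch {

  // Counts distinct users without moving the interaction data to the driver.
  // Only the final Long is returned to the driver.
  def distinctUserCount(interactions: RDD[(String, String)]): Long =
    interactions.map(_._1).distinct().count()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("collect-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up stand-in for the (user, item) interaction data.
      val interactions = sc.parallelize(Seq(
        ("u1", "iphone"), ("u1", "ipad"), ("u2", "nexus")))

      // Anti-pattern: collect() materializes every element in the driver JVM,
      // so heap use on the driver grows with the size of the whole dataset.
      // val everything = interactions.collect()

      // Distributed alternatives: aggregate on the executors, or fetch only
      // a bounded sample when a peek at the data is all that is needed.
      val users = distinctUserCount(interactions)
      val sample = interactions.take(2)
      println(s"distinct users: $users, sample size: ${sample.length}")
    } finally {
      sc.stop()
    }
  }
}
```

The design point is that actions such as count() and take(n) return a bounded amount of data to the driver, while collect() returns all of it, which is why raising executor memory does not help in the case described above.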