[
https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617939#comment-14617939
]
Amit Gupta commented on SPARK-7337:
-----------------------------------
I see your comment as a workaround: by telling me to set minSupport to 1.0, you
are asking me to find the item sequences that appear in all transactions and
then work backwards, lowering the threshold up to the breaking point.
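For readers unfamiliar with the parameter: minSupport is the fraction of transactions an itemset must appear in to be kept, so minSupport = 1.0 retains only itemsets present in every transaction. A toy brute-force sketch in plain Python (not Spark code; the data and function are illustrative only) makes the semantics concrete:

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support):
    """Brute-force frequent-itemset mining, for illustration on tiny data.

    min_support is a fraction of the transaction count, matching the
    meaning of FPGrowth's minSupport parameter."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        # Count every non-empty itemset contained in this transaction.
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    # Keep only itemsets whose support fraction meets the threshold.
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"]]
# With minSupport = 1.0, only itemsets in ALL transactions survive:
print(frequent_itemsets(transactions, 1.0))   # → {('a',): 1.0}
# Lowering the threshold admits more itemsets, which is the
# "work backwards" step described above:
print(frequent_itemsets(transactions, 2 / 3))
```

This also shows why the workaround is unattractive: each lowering of the threshold re-runs the whole mining pass.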
I am not looking for a workaround: I already have custom code written against
the core Spark API that works fine. It is only the FP-Growth API, which I
wanted to try out, that breaks.
When the tree grows beyond available RAM, it should spill over to disk rather
than throw an OutOfMemoryError.
To reproduce, try mining recommendations/item sequences on a single machine
with the data from: https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data.
> FPGrowth algo throwing OutOfMemoryError
> ---------------------------------------
>
> Key: SPARK-7337
> URL: https://issues.apache.org/jira/browse/SPARK-7337
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.3.1
> Environment: Ubuntu
> Reporter: Amit Gupta
> Attachments: FPGrowthBug.png
>
>
> When running the FPGrowth algorithm on huge data (GBs) with numPartitions=500,
> it throws an OutOfMemoryError after some time.
> The algorithm runs correctly up to "collect at FPGrowth.scala:131", where it
> creates 500 tasks. It fails at the next stage, "flatMap at FPGrowth.scala:150",
> where instead of 500 tasks it creates an internally calculated 17 tasks.
> Please refer to the attached screenshot.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]