[ https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527874#comment-14527874 ]

Amit Gupta commented on SPARK-7337:
-----------------------------------

Yes, I know that anything collected on the driver that doesn't fit in memory will 
fail. Here I am talking about the line below:

import org.apache.spark.mllib.fpm.FPGrowth;
import org.apache.spark.mllib.fpm.FPGrowthModel;

// Build and run FP-Growth over the transactions RDD.
FPGrowthModel<String> model = new FPGrowth()
                .setMinSupport(minSupport)
                .setNumPartitions(numPartition)
                .run(transactions);
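
For reference, a quick way to confirm the partitioning going into the call (a minimal check, assuming transactions is a JavaRDD<List<String>>): a stage that reads this RDD directly runs one task per partition.

// Partition count of the input RDD; a stage reading it directly
// will be split into this many tasks.
System.out.println("input partitions: " + transactions.partitions().size());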

where numPartition is > 50 (e.g. 500). Please refer to the attached screenshot: the 
active stage (which belongs to the API call above) is throwing OutOfMemoryError. The 
stage just before the active one (status: completed) has 500 tasks, while the active 
stage has only 17 tasks; it should have 500, since I set numPartition to 500. The 
next stage, pending execution, again has 500 tasks. If the code around the active 
stage with 17 tasks is fixed so that its task count equals numPartition, the 
OutOfMemoryError should go away.
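
As a possible workaround in the meantime (an untested sketch, assuming the cost of the 
extra shuffle is acceptable), explicitly repartitioning the input before the call 
should force that stage to run with numPartition tasks:

// Force the input RDD to numPartition partitions before running FP-Growth,
// so the flatMap stage reading it is split into that many tasks.
JavaRDD<List<String>> repartitioned = transactions.repartition(numPartition);
FPGrowthModel<String> model = new FPGrowth()
                .setMinSupport(minSupport)
                .setNumPartitions(numPartition)
                .run(repartitioned);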

> FPGrowth algo throwing OutOfMemoryError
> ---------------------------------------
>
>                 Key: SPARK-7337
>                 URL: https://issues.apache.org/jira/browse/SPARK-7337
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.3.1
>         Environment: Ubuntu
>            Reporter: Amit Gupta
>         Attachments: FPGrowthBug.png
>
>
> When running the FPGrowth algorithm on large data (GBs in size) with 
> numPartitions=500, it throws OutOfMemoryError after some time.
> The algorithm runs correctly up to "collect at FPGrowth.scala:131", where it 
> creates 500 tasks. It fails at the next stage, "flatMap at FPGrowth.scala:150", 
> where instead of 500 tasks it creates only 17 (an internally calculated number).
> Please refer to the attachment (screenshot).


