[ 
https://issues.apache.org/jira/browse/SPARK-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124794#comment-14124794
 ] 

Andrew Ash commented on SPARK-3211:
-----------------------------------

This was merged into branch-1.1 and develop

> .take() is OOM-prone when there are empty partitions
> ----------------------------------------------------
>
>                 Key: SPARK-3211
>                 URL: https://issues.apache.org/jira/browse/SPARK-3211
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.2
>            Reporter: Andrew Ash
>            Assignee: Andrew Ash
>             Fix For: 1.1.1, 1.2.0
>
>
> Filed on dev@ on 22 August by [~pnepywoda]:
> {quote}
> On line 777
> https://github.com/apache/spark/commit/42571d30d0d518e69eecf468075e4c5a823a2ae8#diff-1d55e54678eff2076263f2fe36150c17R771
> the logic for take() reads ALL partitions if the first one (or first k) are
> empty. This has actually lead to OOMs when we had many partitions
> (thousands) and unfortunately the first one was empty.
> Wouldn't a better implementation strategy be
> numPartsToTry = partsScanned * 2
> instead of
> numPartsToTry = totalParts - 1
> (this doubling is similar to most memory allocation strategies)
> Thanks!
> - Paul
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to