[
https://issues.apache.org/jira/browse/SPARK-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124794#comment-14124794
]
Andrew Ash commented on SPARK-3211:
-----------------------------------
This was merged into branch-1.1 and develop
> .take() is OOM-prone when there are empty partitions
> ----------------------------------------------------
>
> Key: SPARK-3211
> URL: https://issues.apache.org/jira/browse/SPARK-3211
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.0.2
> Reporter: Andrew Ash
> Assignee: Andrew Ash
> Fix For: 1.1.1, 1.2.0
>
>
> Filed on dev@ on 22 August by [~pnepywoda]:
> {quote}
> On line 777
> https://github.com/apache/spark/commit/42571d30d0d518e69eecf468075e4c5a823a2ae8#diff-1d55e54678eff2076263f2fe36150c17R771
> the logic for take() reads ALL partitions if the first one (or first k) are
> empty. This has actually lead to OOMs when we had many partitions
> (thousands) and unfortunately the first one was empty.
> Wouldn't a better implementation strategy be
> numPartsToTry = partsScanned * 2
> instead of
> numPartsToTry = totalParts - 1
> (this doubling is similar to most memory allocation strategies)
> Thanks!
> - Paul
> {quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]