[
https://issues.apache.org/jira/browse/SPARK-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109591#comment-14109591
]
Apache Spark commented on SPARK-3211:
-------------------------------------
User 'ash211' has created a pull request for this issue:
https://github.com/apache/spark/pull/2117
> .take() is OOM-prone when there are empty partitions
> ----------------------------------------------------
>
> Key: SPARK-3211
> URL: https://issues.apache.org/jira/browse/SPARK-3211
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.0.2
> Reporter: Andrew Ash
>
> Filed on dev@ on 22 August by [~pnepywoda]:
> {quote}
> On line 777
> https://github.com/apache/spark/commit/42571d30d0d518e69eecf468075e4c5a823a2ae8#diff-1d55e54678eff2076263f2fe36150c17R771
> the logic for take() reads ALL partitions if the first one (or first k) are
> empty. This has actually led to OOMs when we had many partitions
> (thousands) and unfortunately the first one was empty.
> Wouldn't a better implementation strategy be
> numPartsToTry = partsScanned * 2
> instead of
> numPartsToTry = totalParts - 1
> (this doubling is similar to most memory allocation strategies)
> Thanks!
> - Paul
> {quote}
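A rough, self-contained Scala sketch of the doubling strategy quoted above (partition contents are stubbed as a plain Seq[Seq[Int]] rather than a real RDD, and the object/method names here are illustrative only, not taken from the linked PR):
{code:scala}
// Minimal sketch of a geometrically growing partition scan for take().
// Assumes partitions are available locally as Seq[Seq[Int]]; a real RDD
// would run a job over the selected partition range instead.
object TakeSketch {
  def take(partitions: Seq[Seq[Int]], num: Int): Seq[Int] = {
    val totalParts = partitions.length
    val buf = scala.collection.mutable.ArrayBuffer.empty[Int]
    var partsScanned = 0
    while (buf.size < num && partsScanned < totalParts) {
      // First pass looks at one partition; afterwards grow the batch
      // geometrically (partsScanned * 2) rather than jumping straight
      // to all remaining partitions, so a run of empty partitions
      // cannot force a scan of thousands of partitions at once.
      val numPartsToTry =
        if (partsScanned == 0) 1
        else math.min(partsScanned * 2, totalParts - partsScanned)
      val batch = partitions.slice(partsScanned, partsScanned + numPartsToTry)
      buf ++= batch.flatten.take(num - buf.size)
      partsScanned += numPartsToTry
    }
    buf.toList
  }

  def main(args: Array[String]): Unit = {
    // Thousands of empty partitions followed by one that has data:
    val parts = Seq.fill(5000)(Seq.empty[Int]) :+ Seq(1, 2, 3)
    println(take(parts, 2)) // List(1, 2)
  }
}
{code}
With this growth rule, a take(2) over thousands of empty partitions scans batches of 1, 2, 6, 18, 54, ... partitions instead of falling back to all remaining partitions after the first empty one.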