[
https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399845#comment-17399845
]
Min Shen commented on SPARK-33331:
----------------------------------
With the change in SPARK-36423, I think this issue is further alleviated.
The idea is still worth trying, though.
> Limit the number of pending blocks in memory and store blocks that collide
> --------------------------------------------------------------------------
>
> Key: SPARK-33331
> URL: https://issues.apache.org/jira/browse/SPARK-33331
> Project: Spark
> Issue Type: Sub-task
> Components: Shuffle
> Affects Versions: 3.1.0
> Reporter: Chandni Singh
> Priority: Major
>
> This Jira addresses the following two points:
> 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately
> are stored in memory. The stream callback maintains a list of
> {{deferredBufs}}; when a block cannot be merged, it is added to this list.
> Currently there is no limit on the number of pending blocks, so we should
> cap how many pending blocks are held in memory. There has been a
> discussion around this here:
> [https://github.com/apache/spark/pull/30062#discussion_r514026014]
> 2. When a stream never gets an opportunity to merge,
> {{RemoteBlockPushResolver}} currently ignores the data from that stream. An
> alternative is to store that stream's data in {{AppShufflePartitionInfo}}
> once this worst case is reached. This may increase the memory usage of the
> shuffle service, but with the limit introduced in point 1 it is worth
> trying out.
> More information can be found in this discussion:
> [https://github.com/apache/spark/pull/30062#discussion_r517524546]
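The bounded-pending-blocks idea in point 1 could be sketched roughly as below. This is a minimal illustration, not Spark's actual {{deferredBufs}} implementation: the class name, the cap, and the drain method are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a bounded list of deferred (not-yet-merged) block
// buffers, loosely modeled on the deferredBufs idea discussed in the issue.
// Names and behavior here are illustrative, not Spark's real API.
class BoundedDeferredBufs {
    private final int maxPending;                       // cap on pending blocks in memory
    private final Deque<ByteBuffer> deferred = new ArrayDeque<>();

    BoundedDeferredBufs(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Returns true if the buffer was deferred; false if the cap was hit
     *  and the caller must drop (or spill) the block instead. */
    boolean tryDefer(ByteBuffer buf) {
        if (deferred.size() >= maxPending) {
            return false;
        }
        deferred.addLast(buf);
        return true;
    }

    /** Drains pending buffers in arrival order once the merger is free,
     *  appending their contents to mergedOut; returns the count drained. */
    int drainTo(StringBuilder mergedOut) {
        int n = 0;
        while (!deferred.isEmpty()) {
            ByteBuffer b = deferred.removeFirst();
            byte[] bytes = new byte[b.remaining()];
            b.get(bytes);
            mergedOut.append(new String(bytes));
            n++;
        }
        return n;
    }
}
```

With such a cap in place, the worst-case memory held per partition is bounded, which is what makes the storing-on-collision idea in point 2 more palatable.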
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]