[
https://issues.apache.org/jira/browse/SPARK-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Wendell reopened SPARK-2773:
------------------------------------
> Shuffle: use growth rate to predict if need to spill
> ----------------------------------------------------
>
> Key: SPARK-2773
> URL: https://issues.apache.org/jira/browse/SPARK-2773
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle
> Affects Versions: 0.9.0, 1.0.0
> Reporter: uncleGen
> Priority: Minor
>
> Right now, Spark uses the total "shuffle" memory usage of each thread to
> predict whether it needs to spill. I think this is not very reasonable. For
> example, suppose two threads are pulling "shuffle" data and the total memory
> available for buffering is 21G. Spilling is first triggered when one thread
> has used 7G to buffer "shuffle" data; assuming the other thread has used the
> same amount, there is still 7G of memory remaining unused. So I think the
> current prediction is too conservative and cannot maximize the usage of
> "shuffle" memory. In my solution, I use the growth rate of "shuffle" memory
> instead. The growth in each step is bounded, say 10K * 1024 (my assumption,
> i.e. 10M), so spilling is first triggered when the remaining "shuffle"
> memory is less than threads * growth * 2, i.e. 2 * 10M * 2 = 40M. I think
> this maximizes the usage of "shuffle" memory. My solution also makes a
> conservative assumption, namely that all threads in one executor are pulling
> shuffle data at the same time; however, this does not have much effect,
> since the growth is bounded after all. Any suggestions?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]