[ 
https://issues.apache.org/jira/browse/SPARK-31202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31202:
------------------------------------

    Assignee:     (was: Apache Spark)

> Improve SizeEstimator for AppendOnlyMap
> ---------------------------------------
>
>                 Key: SPARK-31202
>                 URL: https://issues.apache.org/jira/browse/SPARK-31202
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.2, 3.0.0
>            Reporter: liupengcheng
>            Priority: Major
>
> Currently, spark's memory management depends on the size estimation for 
> execution and storage. 
> In our real cluster, users always meet the issue OOM due to the inaccurate 
> size estimation for ` AppendOnlyMap`, that's because spark stores KV in an 
> Array[AnyRef] in `AppendOnlyMap` for memory locality,  and this value can be 
> CompactBuffer[_] or Array[CompactBuffer[_]] for transformation like 
> cogroup/join/groupBy, but current `SizeEstimator` will still treat this 
> special array as an normal array, so in many cases, we noticed a great bias 
> between the estimated size and the acutal memory consuption. 
> So we improved this in xiaomi:
> 1. Improve the estimation for AppendOnlyMap when the value type is 
> CompactBuffer
> 2. Respect jvm gc stats to decide whether to spilling when doing sort/agg
> In this jira, I propose to solve the first part which is improving the 
> estimation for `AppendOnlyMap` when the value type is CompactBuffer



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to