[
https://issues.apache.org/jira/browse/SPARK-31202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-31202:
------------------------------------
Assignee: (was: Apache Spark)
> Improve SizeEstimator for AppendOnlyMap
> ---------------------------------------
>
> Key: SPARK-31202
> URL: https://issues.apache.org/jira/browse/SPARK-31202
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.2, 3.0.0
> Reporter: liupengcheng
> Priority: Major
>
> Currently, Spark's memory management depends on size estimation for
> execution and storage.
> In our cluster, users frequently hit OOM errors caused by inaccurate
> size estimation for `AppendOnlyMap`. Spark stores key-value pairs in an
> Array[AnyRef] in `AppendOnlyMap` for memory locality, and the values can be
> CompactBuffer[_] or Array[CompactBuffer[_]] for transformations like
> cogroup/join/groupBy. The current `SizeEstimator` still treats this
> special array as a normal array, so in many cases we observed a large gap
> between the estimated size and the actual memory consumption.
> So we improved this at Xiaomi:
> 1. Improve the estimation for AppendOnlyMap when the value type is
> CompactBuffer
> 2. Respect JVM GC stats when deciding whether to spill during sort/agg
> In this JIRA, I propose to address the first part: improving the
> estimation for `AppendOnlyMap` when the value type is CompactBuffer.
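
The gap described above can be illustrated with a toy model. This is a minimal sketch in Python, not Spark's `SizeEstimator`: it assumes the estimator measures only a sample of the entries in the flat key/value array and scales the result up, and every byte count and name below (`REF_BYTES`, `KEY_BYTES`, `BUFFER_HEADER`, `exact_size`, `sampled_size`) is hypothetical. When a few keys hold very large value buffers, as can happen with a skewed groupBy, the extrapolated size depends heavily on whether the sample happens to include those keys.

```python
# Simplified, hypothetical cost model; the byte counts are illustrative,
# not real JVM object sizes.
REF_BYTES = 8        # one reference slot in the flat array
KEY_BYTES = 16       # every key object
BUFFER_HEADER = 16   # an empty value buffer


def buffer_bytes(n_items):
    """Size of a value buffer holding n_items references."""
    return BUFFER_HEADER + REF_BYTES * n_items


def exact_size(buffer_lengths):
    """Walk every key/value pair of the flat [k0, v0, k1, v1, ...] layout."""
    slots = 2 * len(buffer_lengths) * REF_BYTES
    keys = len(buffer_lengths) * KEY_BYTES
    values = sum(buffer_bytes(n) for n in buffer_lengths)
    return slots + keys + values


def sampled_size(buffer_lengths, sample_indices):
    """Measure only the sampled pairs and scale up to the full map."""
    slots = 2 * len(buffer_lengths) * REF_BYTES
    sampled = sum(KEY_BYTES + buffer_bytes(buffer_lengths[i])
                  for i in sample_indices)
    scale = len(buffer_lengths) / len(sample_indices)
    return slots + int(sampled * scale)


# 1000 keys: every 100th key is "hot" and holds 10,000 values, the rest hold one.
lengths = [10_000 if i % 100 == 0 else 1 for i in range(1000)]

exact = exact_size(lengths)
# 100 sampled pairs that contain no hot key: the estimate is far too low.
miss_hot = sampled_size(lengths, range(1, 1000, 10))
# 100 sampled pairs that contain all ten hot keys: the estimate is far too high.
hit_hot = sampled_size(lengths, range(0, 1000, 10))

print(exact, miss_hot, hit_hot)
```

In this model the two sampled estimates land on opposite sides of the exact size by roughly an order of magnitude each. An estimator that knows the values are buffers could instead account for each buffer's length directly, which is the kind of type-aware handling the issue proposes for CompactBuffer values.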