[ https://issues.apache.org/jira/browse/SPARK-20340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969680#comment-15969680 ]
Shixiong Zhu commented on SPARK-20340:
--------------------------------------
I think it's just a trade-off between accuracy and performance.
> Size estimate very wrong in ExternalAppendOnlyMap from CoGroupedRDD, cause OOM
> ------------------------------------------------------------------------------
>
> Key: SPARK-20340
> URL: https://issues.apache.org/jira/browse/SPARK-20340
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Thomas Graves
>
> I had a user doing a basic join operation. The values are images' binary
> data (in base64 format) and vary widely in size.
> The job failed with out of memory. It originally failed on YARN from using too
> much overhead memory; after turning spark.shuffle.io.preferDirectBufs to false,
> it failed with out of heap memory. I debugged it down to the point in the
> shuffle where CoGroupedRDD puts things into the ExternalAppendOnlyMap, which
> computes an estimated size to determine when to spill. In this case
> SizeEstimator handles arrays such that if an array has more than 400 elements,
> it samples only 100 of them. The estimate comes back GBs away from the actual
> size: it claims 1GB when it is actually using close to 5GB (see the sketch
> after this report for how such sampling can go wrong).
> A temporary workaround is to increase the memory to be very large (10GB
> executors), but that isn't really acceptable here. The user did the same thing
> in Pig and it easily handled the data with 1.5GB of memory.
> It seems risky to rely on an estimate for something this critical. If the
> estimate is wrong, you are going to run out of memory and fail the job.
> I'm still looking closer at the user's data to get more insights.
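For illustration, here is a minimal, self-contained Scala sketch of the failure mode described in the report. It is not Spark's actual SizeEstimator or ExternalAppendOnlyMap code; the object name, record sizes, and sampling scheme are invented for the example. It only shows how extrapolating an array's total size from a small sample can be GBs off when a few elements are far larger than the rest:

import scala.util.Random

// Toy sketch only -- not Spark's SizeEstimator. It shows how estimating the
// total size of a large array from a 100-element sample can be GBs off when
// element sizes are heavily skewed (e.g. base64-encoded image blobs).
object SamplingEstimateDemo {
  def main(args: Array[String]): Unit = {
    val rng = new Random()

    // Per-record payload sizes in bytes: most records ~1 KB, ~2% are ~5 MB,
    // mimicking image data that varies widely in size.
    val recordSizes: Array[Long] = Array.fill(10000) {
      if (rng.nextDouble() < 0.02) 5L * 1024 * 1024 else 1024L
    }

    val actualBytes = recordSizes.sum

    // Sample 100 random records and extrapolate, roughly what a sampling-based
    // estimator does once an array has more than a few hundred elements.
    val sampled = Array.fill(100)(recordSizes(rng.nextInt(recordSizes.length)))
    val estimatedBytes = sampled.sum * (recordSizes.length.toLong / sampled.length)

    println(f"actual    = ${actualBytes / 1e9}%.2f GB")
    println(f"estimated = ${estimatedBytes / 1e9}%.2f GB")
    // How many of the rare huge records land in the sample dominates the
    // estimate, so repeated runs swing from far too low to far too high.
  }
}

Because the rare huge records dominate the total, whether they happen to land in the 100-element sample swings the estimate wildly from run to run, which is consistent with the kind of gap reported above (roughly 1GB estimated versus close to 5GB actually used).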