[ https://issues.apache.org/jira/browse/SPARK-20340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969680#comment-15969680 ]
Shixiong Zhu commented on SPARK-20340:
--------------------------------------
I think it's just a trade-off between accuracy and performance.
> Size estimate very wrong in ExternalAppendOnlyMap from CoGroupedRDD, cause OOM
> ------------------------------------------------------------------------------
>
> Key: SPARK-20340
> URL: https://issues.apache.org/jira/browse/SPARK-20340
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Thomas Graves
>
> I had a user doing a basic join operation. The values are images' binary
> data (in base64 format) and vary widely in size.
> The job failed with out of memory. It originally failed on YARN from using too
> much overhead memory; after turning spark.shuffle.io.preferDirectBufs to false,
> it failed with out of heap memory. I debugged it down to the point in the
> shuffle where CoGroupedRDD puts things into the ExternalAppendOnlyMap, which
> computes an estimated size to determine when to spill. In this case
> SizeEstimator handles arrays such that if an array has more than 400 elements,
> it samples only 100 of them. The estimate comes back GBs away from the actual
> size: it claims 1GB when it is actually using close to 5GB (see the sketch
> after this report for how such sampling can go wrong).
> A temporary workaround is to increase the memory to be very large (10GB
> executors), but that isn't really acceptable here. The user did the same thing
> in Pig and it easily handled the data with 1.5GB of memory.
> It seems risky to rely on an estimate for something this critical. If the
> estimate is wrong, you are going to run out of memory and fail the job.
> I'm still looking closer at the user's data to get more insights.
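For illustration, here is a minimal, self-contained Scala sketch of the failure mode described in the report. It is not Spark's actual SizeEstimator or ExternalAppendOnlyMap code; the object name, record sizes, and sampling scheme are invented for the example. It only shows how extrapolating an array's total size from a small sample can be GBs off when a few elements are far larger than the rest:

import scala.util.Random

// Toy sketch only -- not Spark's SizeEstimator. It shows how estimating the
// total size of a large array from a 100-element sample can be GBs off when
// element sizes are heavily skewed (e.g. base64-encoded image blobs).
object SamplingEstimateDemo {
  def main(args: Array[String]): Unit = {
    val rng = new Random()

    // Per-record payload sizes in bytes: most records ~1 KB, ~2% are ~5 MB,
    // mimicking image data that varies widely in size.
    val recordSizes: Array[Long] = Array.fill(10000) {
      if (rng.nextDouble() < 0.02) 5L * 1024 * 1024 else 1024L
    }

    val actualBytes = recordSizes.sum

    // Sample 100 random records and extrapolate, roughly what a sampling-based
    // estimator does once an array has more than a few hundred elements.
    val sampled = Array.fill(100)(recordSizes(rng.nextInt(recordSizes.length)))
    val estimatedBytes = sampled.sum * (recordSizes.length.toLong / sampled.length)

    println(f"actual    = ${actualBytes / 1e9}%.2f GB")
    println(f"estimated = ${estimatedBytes / 1e9}%.2f GB")
    // How many of the rare huge records land in the sample dominates the
    // estimate, so repeated runs swing from far too low to far too high.
  }
}

Because the rare huge records dominate the total, whether they happen to land in the 100-element sample swings the estimate wildly from run to run, which is consistent with the kind of gap reported above (roughly 1GB estimated versus close to 5GB actually used).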