Thomas Graves created SPARK-20340:
-------------------------------------
Summary: Size estimate very wrong in ExternalAppendOnlyMap from
CoGroupedRDD, causes OOM
Key: SPARK-20340
URL: https://issues.apache.org/jira/browse/SPARK-20340
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.1.0
Reporter: Thomas Graves
I had a user doing a basic join operation. The values are images' binary
data (base64-encoded) and vary widely in size.
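For reference, a minimal sketch of the job shape (the paths, field layout,
and names below are hypothetical, not taken from the user's actual job):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object ImageJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("image-join"))

    // (imageId, base64Payload) pairs; payloads range from a few KB to many MB
    val images = sc.textFile("hdfs:///user/data/images.tsv").map { line =>
      val Array(id, payload) = line.split("\t", 2)
      (id, payload)
    }
    val labels = sc.textFile("hdfs:///user/data/labels.tsv").map { line =>
      val Array(id, label) = line.split("\t", 2)
      (id, label)
    }

    // RDD.join is implemented via cogroup/CoGroupedRDD, so the shuffle-read
    // side buffers all the values for a key in an ExternalAppendOnlyMap
    images.join(labels).saveAsTextFile("hdfs:///user/data/joined")
  }
}
{code}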
The job failed with out of memory. It originally failed on YARN from using too
much overhead memory; after setting spark.shuffle.io.preferDirectBufs to false
it then failed with out of heap memory. I debugged it down to the shuffle:
when CoGroupedRDD puts things into the ExternalAppendOnlyMap, it computes an
estimated size to determine when to spill. SizeEstimator handles arrays such
that if an array has more than 400 elements, it samples 100 of them. In this
case the estimate is coming back gigabytes different from the actual size: it
claims 1GB when the map is actually using close to 5GB.
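To illustrate the failure mode (the element counts and sizes below are made
up, and this simulates only the sample-and-extrapolate step, not
SizeEstimator's full object walk): when element sizes are heavily skewed, a
100-element sample frequently misses the large elements entirely, so the
extrapolated total comes back gigabytes too low.

{code:scala}
import scala.util.Random

object SamplingSkew {
  def main(args: Array[String]): Unit = {
    val rng = new Random()

    // 1,000,000 records: most are ~1KB, but 0.1% are ~5MB (e.g. large images)
    val sizes = Array.fill(1000000) {
      if (rng.nextDouble() < 0.001) 5L * 1024 * 1024 else 1024L
    }
    val actual = sizes.map(_.toDouble).sum

    // Extrapolate from 100 random samples, mimicking how SizeEstimator
    // samples arrays longer than 400 elements
    val sample = Array.fill(100)(sizes(rng.nextInt(sizes.length)))
    val estimated = sample.map(_.toDouble).sum / sample.length * sizes.length

    println(f"actual:    ${actual / 1e9}%.2f GB")
    println(f"estimated: ${estimated / 1e9}%.2f GB")
    // Roughly 90% of the time the sample contains no 5MB outlier at all,
    // so the estimate reads ~1GB while the real footprint is ~6GB
  }
}
{code}

An underestimate here means the map believes it has plenty of headroom, never
spills, and the executor eventually runs out of memory.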
A temporary workaround is to increase the memory to be very large (10GB
executors), but that isn't really acceptable here. The user did the same thing
in Pig and it easily handled the data with 1.5GB of memory.
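For the record, the knobs tried so far (the values are the ones from this
report, shown as spark-submit options):

{code}
spark-submit \
  --conf spark.shuffle.io.preferDirectBufs=false \
  --executor-memory 10g \
  ...
{code}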
It seems risky to rely on an estimate for something this critical: if the
estimate is wrong you are going to run out of memory and fail the job.
I'm still looking closer at the user's data to get more insights.