jin xing created SPARK-19659:
--------------------------------
Summary: Fetch big blocks to disk when shuffle-read
Key: SPARK-19659
URL: https://issues.apache.org/jira/browse/SPARK-19659
Project: Spark
Issue Type: Improvement
Components: Shuffle
Affects Versions: 2.1.0
Reporter: jin xing
Currently the whole block is fetched into memory(offheap by default) when
shuffle-read. A block is defined by (shuffleId, mapId, reduceId). Thus it can
be large when skew situations. If OOM happens during shuffle read, job will be
killed and users will be notified to "Consider boosting
spark.yarn.executor.memoryOverhead". Adjusting parameter and allocating more
memory can resolve the OOM. However the approach is not perfectly suitable for
production environment, especially for data warehouse.
Using Spark SQL as data engine in warehouse, users hope to have a unified
parameter(e.g. memory) but less resource wasted(resource is allocated but not
used),
It's not always easy to predict skew situations, when happen, it make sense to
fetch remote blocks to disk for shuffle-read, rather than
kill the job because of OOM. This approach is mentioned during the discussing
in SPARK-3019, by [~sandyr] and [~mridulm80]
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]