[
https://issues.apache.org/jira/browse/SPARK-19659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891999#comment-15891999
]
jin xing edited comment on SPARK-19659 at 3/19/17 3:16 PM:
-----------------------------------------------------------
[~rxin] [~davies] [~andrewor14] [~joshrosen] [~kayousterhout]
I've uploaded a design doc, It's great if you could help comment on it when
have time :) Actually, my data warehouse suffers a lot on this issue. Please
take a look if possible. Sorry if this is bothering.
was (Author: [email protected]):
[~rxin] [~davies] [~andrewor14] [~joshrosen]
I've uploaded a design doc, It's great if you could help comment on it when
have time :) Actually, my data warehouse suffers a lot on this issue. Please
take a look if possible. Sorry if this is bothering.
> Fetch big blocks to disk when shuffle-read
> ------------------------------------------
>
> Key: SPARK-19659
> URL: https://issues.apache.org/jira/browse/SPARK-19659
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle
> Affects Versions: 2.1.0
> Reporter: jin xing
> Attachments: SPARK-19659-design-v1.pdf, SPARK-19659-design-v2.pdf
>
>
> Currently the whole block is fetched into memory(offheap by default) when
> shuffle-read. A block is defined by (shuffleId, mapId, reduceId). Thus it can
> be large when skew situations. If OOM happens during shuffle read, job will
> be killed and users will be notified to "Consider boosting
> spark.yarn.executor.memoryOverhead". Adjusting parameter and allocating more
> memory can resolve the OOM. However the approach is not perfectly suitable
> for production environment, especially for data warehouse.
> Using Spark SQL as data engine in warehouse, users hope to have a unified
> parameter(e.g. memory) but less resource wasted(resource is allocated but not
> used),
> It's not always easy to predict skew situations, when happen, it make sense
> to fetch remote blocks to disk for shuffle-read, rather than
> kill the job because of OOM. This approach is mentioned during the discussion
> in SPARK-3019, by [~sandyr] and [~mridulm80]
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]