[
https://issues.apache.org/jira/browse/SPARK-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Wendell updated SPARK-2532:
-----------------------------------
Target Version/s: 1.2.0 (was: 1.1.0)
> Fix issues with consolidated shuffle
> ------------------------------------
>
> Key: SPARK-2532
> URL: https://issues.apache.org/jira/browse/SPARK-2532
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.1.0
> Environment: All
> Reporter: Mridul Muralidharan
> Assignee: Mridul Muralidharan
> Priority: Critical
>
> Will file a PR with the changes as soon as the merge is done (the earlier
> merge became outdated within 2 weeks, unfortunately :) ).
> Consolidated shuffle is broken in multiple ways in Spark:
> a) Task failure(s) can cause the state to become inconsistent.
> b) Multiple revert's or combination of close/revert/close can cause the state
> to be inconsistent.
> (As part of exception/error handling).
> c) Some of the block writer API causes implementation issues. For example, a
> revert is always followed by a close, but the implementation keeps them as
> separate operations, which widens the surface for errors (see the first
> sketch after this list).
> d) Fetching data from consolidated shuffle files can go badly wrong if the
> file is still being actively written to: the segment length is computed by
> subtracting the current offset from the next offset (or by using the file
> length for the last segment), and the latter fails when a fetch happens in
> parallel with a write (see the second sketch after this list).
> Note that this happens even when there are no task failures of any kind!
> This usually results in stream corruption or decompression errors.
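> A minimal sketch of the API concern in (c), with illustrative names rather
> than the actual BlockObjectWriter methods: keeping revert and close as
> independent operations forces every error path to invoke both, in the right
> order, while a single combined operation removes that class of mistakes.
> {code:scala}
> // Illustrative writer interface; not the real Spark block writer API.
> trait PartialWriter {
>   def revertPartialWrites(): Unit  // discard bytes written since the last commit
>   def close(): Unit                // release the underlying file handle/stream
> }
>
> // With separate operations, every failure path must remember both calls:
> def onFailureSeparate(w: PartialWriter): Unit = {
>   w.revertPartialWrites()
>   w.close()
> }
>
> // Collapsing them into one call shrinks the surface for errors:
> def revertAndClose(w: PartialWriter): Unit = {
>   try w.revertPartialWrites()
>   finally w.close()
> }
> {code}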
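> And a sketch of the offset arithmetic in (d), again with hypothetical names:
> per-reducer segment lengths are derived from recorded start offsets, with the
> last segment bounded by the current file length; if the file is still being
> appended to, that length includes a partially written segment and the reader
> picks up trailing garbage.
> {code:scala}
> // Hypothetical illustration of the length computation for consolidated files.
> // offsets(i) is the byte offset where reducer i's segment starts.
> case class FileSegment(offset: Long, length: Long)
>
> def segmentFor(offsets: Array[Long], fileLength: Long, reducerId: Int): FileSegment = {
>   val start = offsets(reducerId)
>   val end =
>     if (reducerId == offsets.length - 1) fileLength  // last segment: bounded by file length
>     else offsets(reducerId + 1)                      // otherwise: next segment's start
>   // If a writer is still appending, fileLength already covers bytes of an
>   // uncommitted segment, so end - start overshoots and the fetched stream is
>   // corrupt (typically surfacing as decompression errors).
>   FileSegment(start, end - start)
> }
> {code}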
--
This message was sent by Atlassian JIRA
(v6.2#6252)