[ https://issues.apache.org/jira/browse/SPARK-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kay Ousterhout resolved SPARK-2571.
-----------------------------------
Resolution: Fixed
> Shuffle read bytes are reported incorrectly for stages with multiple shuffle dependencies
> -----------------------------------------------------------------------------------------
>
> Key: SPARK-2571
> URL: https://issues.apache.org/jira/browse/SPARK-2571
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 1.0.1, 0.9.3
> Reporter: Kay Ousterhout
> Assignee: Kay Ousterhout
>
> In BlockStoreShuffleFetcher, we set a task's shuffle metrics based on the
> data fetched by a single BlockFetcherIterator. When a task has multiple
> shuffle dependencies (e.g., a stage that joins two datasets), the metrics
> end up reflecting only the data fetched by the last BlockFetcherIterator
> to complete, rather than the sum of the data fetched by all
> BlockFetcherIterators. As a result, shuffle read bytes can be dramatically
> underreported.
> Thanks [~andrewor14] and [~rxin] for helping to diagnose this issue.
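The difference between overwriting and summing per-fetcher metrics can be illustrated with a minimal Python sketch. It is not Spark code: the names FetchResult, TaskShuffleMetrics, record_overwrite, record_accumulate and run_task are hypothetical stand-ins, and the byte counts are made up. It only models a task that reads from two shuffle dependencies (as in a join) and compares the two bookkeeping strategies; the actual fix in Spark may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class FetchResult:
    """Hypothetical summary of what one block-fetcher iterator read."""
    remote_bytes_read: int
    remote_blocks_fetched: int


@dataclass
class TaskShuffleMetrics:
    """Hypothetical per-task shuffle-read metrics."""
    remote_bytes_read: int = 0
    remote_blocks_fetched: int = 0


def record_overwrite(metrics: TaskShuffleMetrics, result: FetchResult) -> None:
    # Buggy pattern: each completing fetcher replaces the task's metrics,
    # so only the last fetcher to finish is reflected.
    metrics.remote_bytes_read = result.remote_bytes_read
    metrics.remote_blocks_fetched = result.remote_blocks_fetched


def record_accumulate(metrics: TaskShuffleMetrics, result: FetchResult) -> None:
    # Fixed pattern: each completing fetcher adds to the running totals.
    metrics.remote_bytes_read += result.remote_bytes_read
    metrics.remote_blocks_fetched += result.remote_blocks_fetched


def run_task(
    results: Iterable[FetchResult],
    record: Callable[[TaskShuffleMetrics, FetchResult], None],
) -> TaskShuffleMetrics:
    metrics = TaskShuffleMetrics()
    # One result per shuffle dependency, in the order the fetchers complete.
    for result in results:
        record(metrics, result)
    return metrics


# A stage joining two datasets reads from two shuffle dependencies;
# here the large input happens to finish first and the small one last.
join_inputs = [
    FetchResult(remote_bytes_read=900_000, remote_blocks_fetched=30),
    FetchResult(remote_bytes_read=1_000, remote_blocks_fetched=1),
]

buggy = run_task(join_inputs, record_overwrite)
fixed = run_task(join_inputs, record_accumulate)
print(buggy.remote_bytes_read)  # 1000
print(fixed.remote_bytes_read)  # 901000
```

With the overwrite strategy the task reports only the 1,000 bytes from the last fetcher to finish, while accumulating reports the full 901,000 bytes, which is the kind of underreporting described above.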
--
This message was sent by Atlassian JIRA
(v6.2#6252)