Kay Ousterhout created SPARK-2571:
-------------------------------------

             Summary: Shuffle read bytes are reported incorrectly for stages with multiple shuffle dependencies
                 Key: SPARK-2571
                 URL: https://issues.apache.org/jira/browse/SPARK-2571
             Project: Spark
          Issue Type: Bug
          Components: Web UI
    Affects Versions: 1.0.1, 0.9.3
            Reporter: Kay Ousterhout
            Assignee: Kay Ousterhout

In BlockStoreShuffleFetcher, we set a task's shuffle metrics from the data fetched by a single BlockFetcherIterator. When a task has multiple shuffle dependencies (e.g., a stage that joins two datasets), the metrics end up reflecting only the data fetched by the last BlockFetcherIterator to complete, rather than the sum of the data fetched by all BlockFetcherIterators. This can lead to dramatically underreporting the shuffle read bytes.
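
To illustrate the failure mode, here is a minimal, self-contained sketch in Python. It is not Spark's code: the class and function names (ShuffleReadMetrics, TaskMetrics, report_overwrite, report_accumulate) and the byte counts are made up for illustration, and the accumulating variant is only one possible way to address the problem, not necessarily the fix that was adopted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShuffleReadMetrics:
    """Hypothetical stand-in for a task's shuffle read metrics."""
    remote_bytes_read: int = 0


@dataclass
class TaskMetrics:
    """Hypothetical stand-in for per-task metrics (None = nothing reported yet)."""
    shuffle_read: Optional[ShuffleReadMetrics] = None


def report_overwrite(task: TaskMetrics, bytes_fetched: int) -> None:
    # Behaviour described in the report: each iterator that completes
    # replaces whatever an earlier iterator had recorded.
    task.shuffle_read = ShuffleReadMetrics(bytes_fetched)


def report_accumulate(task: TaskMetrics, bytes_fetched: int) -> None:
    # One possible alternative (an assumption): add to the existing metrics.
    if task.shuffle_read is None:
        task.shuffle_read = ShuffleReadMetrics()
    task.shuffle_read.remote_bytes_read += bytes_fetched


# Made-up byte counts for the two shuffle dependencies of a join,
# listed in the order in which their iterators complete.
fetched_per_dependency = [700, 300]

overwritten = TaskMetrics()
accumulated = TaskMetrics()
for n in fetched_per_dependency:
    report_overwrite(overwritten, n)
    report_accumulate(accumulated, n)

print(overwritten.shuffle_read.remote_bytes_read)  # 300
print(accumulated.shuffle_read.remote_bytes_read)  # 1000
```

With these numbers the overwriting variant reports only the 300 bytes from the last iterator to complete, while the task actually read 1000 bytes in total.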