Kay Ousterhout created SPARK-2571:
-------------------------------------

             Summary: Shuffle read bytes are reported incorrectly for stages with multiple shuffle dependencies
                 Key: SPARK-2571
                 URL: https://issues.apache.org/jira/browse/SPARK-2571
             Project: Spark
          Issue Type: Bug
          Components: Web UI
    Affects Versions: 1.0.1, 0.9.3
            Reporter: Kay Ousterhout
            Assignee: Kay Ousterhout


In BlockStoreShuffleFetcher, we set the shuffle metrics for a task from the 
data fetched by a single BlockFetcherIterator.  When a task has multiple 
shuffle dependencies (e.g., a stage that joins two datasets together), the 
metrics reflect only the last BlockFetcherIterator to complete, rather than 
the sum of the data fetched by all BlockFetcherIterators.  As a result, the 
reported shuffle read bytes can be dramatically lower than the bytes the 
task actually read.
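
To illustrate the failure mode described above, here is a minimal sketch in 
Python.  It is not Spark's code: ShuffleReadMetrics, TaskMetrics, 
report_overwriting, and report_summing are hypothetical names made up for 
this example, and summing is shown only as one plausible way to get a correct 
total, not necessarily the fix that was adopted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ShuffleReadMetrics:
    """Hypothetical stand-in for a task's shuffle read metrics."""
    remote_bytes_read: int = 0


@dataclass
class TaskMetrics:
    """Hypothetical stand-in for the per-task metrics object."""
    shuffle_read_metrics: Optional[ShuffleReadMetrics] = None


def report_overwriting(task: TaskMetrics, bytes_per_fetcher: Iterable[int]) -> None:
    # Pattern described in the report: each fetcher that completes replaces
    # the task's metrics, so only the last fetcher's bytes survive.
    for fetched in bytes_per_fetcher:
        task.shuffle_read_metrics = ShuffleReadMetrics(remote_bytes_read=fetched)


def report_summing(task: TaskMetrics, bytes_per_fetcher: Iterable[int]) -> None:
    # Assumed alternative: create the metrics once, then accumulate the
    # bytes from every fetcher into the same object.
    for fetched in bytes_per_fetcher:
        if task.shuffle_read_metrics is None:
            task.shuffle_read_metrics = ShuffleReadMetrics()
        task.shuffle_read_metrics.remote_bytes_read += fetched


# A join with two shuffle dependencies: one fetcher reads 900 bytes,
# the other reads 100 bytes and happens to complete last.
fetched_bytes = [900, 100]

overwritten = TaskMetrics()
report_overwriting(overwritten, fetched_bytes)

summed = TaskMetrics()
report_summing(summed, fetched_bytes)

print(overwritten.shuffle_read_metrics.remote_bytes_read)
print(summed.shuffle_read_metrics.remote_bytes_read)
```

With these made-up numbers, the overwriting variant reports only the bytes 
from the fetcher that finished last, while the summing variant reports the 
total across both dependencies.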



