Kay Ousterhout created SPARK-3570:
-------------------------------------

             Summary: Shuffle write time does not include time to open shuffle files
                 Key: SPARK-3570
                 URL: https://issues.apache.org/jira/browse/SPARK-3570
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.1.0, 1.0.2, 0.9.2
            Reporter: Kay Ousterhout
            Assignee: Kay Ousterhout


Currently, the reported shuffle write time does not include the time taken to 
open the shuffle files.  This time can be very significant when the disk is 
highly utilized and many shuffle files exist on the machine.  (I'm not sure how 
severe this is in 1.0 onward -- since shuffle files are automatically deleted, 
it may be less of an issue because fewer old files are sitting around.)  In 
my experiments, in extreme cases, including the time to open files increased 
the reported shuffle write time from 5ms (of a 2 second task) to 1 second.  We 
should fix this so that the metric is more useful for performance debugging.
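
To illustrate the measurement gap (this is not Spark's code, just a minimal Python sketch under stated assumptions): the hypothetical TimedWriter class below times the open call separately from the write calls, so the two numbers can be compared or summed. The class name, its fields, and the file name in the usage section are all made up for illustration.

```python
import os
import tempfile
import time


class TimedWriter:
    """Hypothetical writer that tracks open time and write time separately."""

    def __init__(self, path):
        self.path = path
        self.open_ns = 0   # nanoseconds spent opening the file
        self.write_ns = 0  # nanoseconds spent in write() and close()
        self._fh = None

    def open(self):
        start = time.perf_counter_ns()
        self._fh = open(self.path, "wb")
        self.open_ns += time.perf_counter_ns() - start
        return self

    def write(self, data):
        start = time.perf_counter_ns()
        self._fh.write(data)
        self.write_ns += time.perf_counter_ns() - start

    def close(self):
        # Closing flushes buffered data, so it is counted as write time here.
        start = time.perf_counter_ns()
        self._fh.close()
        self.write_ns += time.perf_counter_ns() - start
        self._fh = None

    @property
    def reported_ns(self):
        # A metric that counts open time as part of the write time.
        return self.open_ns + self.write_ns


def write_partition(directory, name, records):
    """Write all records to one file and return the writer with its timings."""
    writer = TimedWriter(os.path.join(directory, name)).open()
    for record in records:
        writer.write(record)
    writer.close()
    return writer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        w = write_partition(tmp, "shuffle_example", [b"x" * 1024] * 100)
        print("open only :", w.open_ns, "ns")
        print("write only:", w.write_ns, "ns")
        print("combined  :", w.reported_ns, "ns")
```

On an idle disk the open time is usually small; the point of the sketch is only that a metric built from write_ns alone cannot show a slow open, whereas reported_ns can.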

Thanks [~shivaram] for helping to diagnose this problem.  cc [~pwendell]



