Kay Ousterhout created SPARK-5845:
-------------------------------------

             Summary: Time to cleanup intermediate shuffle files not included in shuffle write time
                 Key: SPARK-5845
                 URL: https://issues.apache.org/jira/browse/SPARK-5845
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 1.2.1, 1.3.0
            Reporter: Kay Ousterhout
            Priority: Minor


When the disk is contended, I've observed cases where it takes as long as 7 
seconds to clean up all of the intermediate spill files for a shuffle (when 
using the sort-based shuffle, but bypassing merging because there are <=200 
shuffle partitions).  This happens even when the shuffle data is not large 
(152MB written by one of the tasks where I observed this).  The cleanup is 
effectively part of the shuffle write time (because it's a necessary side 
effect of writing data to disk), so it should be added to the shuffle write 
time to facilitate debugging.
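
To illustrate the accounting being proposed, here is a minimal sketch in Python. It is not Spark code: ShuffleWriteMetrics, inc_write_time, and write_partitions are hypothetical names invented for this example, and the per-partition temp files only loosely mimic the bypass-merge path described above. The point is just that the deletion of the intermediate files is timed and added to the same write-time counter as the writes themselves.

```python
import os
import tempfile
import time


class ShuffleWriteMetrics:
    """Hypothetical stand-in for a per-task metrics object (not Spark's class)."""

    def __init__(self):
        self.write_time_ns = 0

    def inc_write_time(self, delta_ns):
        self.write_time_ns += delta_ns


def write_partitions(records, num_partitions, out_path, metrics):
    """Write one temp file per partition, concatenate them into out_path,
    then delete the temp files. Returns the time spent on cleanup (ns)."""
    tmp_dir = tempfile.mkdtemp()
    paths = [os.path.join(tmp_dir, f"part-{i}") for i in range(num_partitions)]

    # Phase 1: write records to per-partition files and concatenate them.
    start = time.perf_counter_ns()
    files = [open(p, "wb") for p in paths]
    try:
        for key, value in records:
            # Assumes integer keys so the partition choice is deterministic.
            files[key % num_partitions].write(value)
    finally:
        for f in files:
            f.close()
    with open(out_path, "wb") as out:
        for p in paths:
            with open(p, "rb") as f:
                out.write(f.read())
    metrics.inc_write_time(time.perf_counter_ns() - start)

    # Phase 2: clean up the intermediate files. This is the step the issue
    # says is missing from the reported write time, so it is timed and
    # added to the same counter.
    start = time.perf_counter_ns()
    for p in paths:
        os.remove(p)
    os.rmdir(tmp_dir)
    cleanup_ns = time.perf_counter_ns() - start
    metrics.inc_write_time(cleanup_ns)
    return cleanup_ns
```

With this accounting, a slow cleanup on a contended disk shows up in the reported write time instead of appearing as unexplained task time.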



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
