[ 
https://issues.apache.org/jira/browse/SPARK-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335646#comment-14335646
 ] 

Ilya Ganelin edited comment on SPARK-5845 at 2/24/15 11:19 PM:
---------------------------------------------------------------

If I understand correctly, the file cleanup happens in 
IndexShuffleBlockManager.removeDataByMap(), which is called from either the 
SortShuffleManager or the SortShuffleWriter. The problem is that these classes 
have no knowledge of the currently collected metrics. Furthermore, unless 
configured otherwise in the SparkConf, the disk cleanup is triggered 
asynchronously via the RemoveShuffle message, so there doesn't appear to be a 
straightforward way to provide a set of metrics to be updated. 
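To make the problem concrete, here is a minimal sketch of what charging the 
cleanup duration to the write-time metric might look like. WriteMetrics and 
timedCleanup are hypothetical stand-ins for Spark's ShuffleWriteMetrics and 
the removeDataByMap() call path, not actual Spark API; the open question is 
how a reference to the live metrics object would reach the cleanup site.

```java
// Hypothetical sketch: wrap a cleanup action in a timer and charge the
// elapsed time to a write-time metric. Names here are illustrative only.
public class CleanupTimingSketch {
    // Minimal stand-in for ShuffleWriteMetrics: accumulates nanoseconds.
    static final class WriteMetrics {
        long writeTimeNanos;
        void incWriteTime(long nanos) { writeTimeNanos += nanos; }
    }

    // Run an arbitrary cleanup action and add its duration to the metric,
    // even if the cleanup throws.
    static void timedCleanup(WriteMetrics metrics, Runnable cleanup) {
        long start = System.nanoTime();
        try {
            cleanup.run();
        } finally {
            metrics.incWriteTime(System.nanoTime() - start);
        }
    }

    public static void main(String[] args) {
        WriteMetrics metrics = new WriteMetrics();
        // In the real code path this would be something like
        // removeDataByMap(shuffleId, mapId); here we just simulate work.
        timedCleanup(metrics, () -> {
            try { Thread.sleep(5); } catch (InterruptedException ignored) { }
        });
        System.out.println("cleanup charged: " + metrics.writeTimeNanos + " ns");
    }
}
```

The wrapper itself is trivial; the hard part, as noted above, is that the 
asynchronous RemoveShuffle path has no handle on a WriteMetrics-like object 
to pass in.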

Do you have any suggestions for getting around this? Please let me know; thank 
you. 


> Time to cleanup intermediate shuffle files not included in shuffle write time
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-5845
>                 URL: https://issues.apache.org/jira/browse/SPARK-5845
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 1.3.0, 1.2.1
>            Reporter: Kay Ousterhout
>            Assignee: Ilya Ganelin
>            Priority: Minor
>
> When the disk is contended, I've observed cases when it takes as long as 7 
> seconds to clean up all of the intermediate spill files for a shuffle (when 
> using the sort based shuffle, but bypassing merging because there are <=200 
> shuffle partitions).  This is even when the shuffle data is non-huge (152MB 
> written from one of the tasks where I observed this).  This is effectively 
> part of the shuffle write time (because it's a necessary side effect of 
> writing data to disk) so should be added to the shuffle write time to 
> facilitate debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
