[ https://issues.apache.org/jira/browse/SPARK-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335751#comment-14335751 ]
Ilya Ganelin commented on SPARK-5845:
-------------------------------------
My mistake, I missed your comment about the spill files in the detailed
description. Given that we're interested in the time taken to clean up the
spill files, which appear to be deleted in ExternalSorter.stop() (please
correct me if I'm wrong), I would like to do one of the following:
a) Pass the context to the stop() method. This is possible since the
SortShuffleWriter has visibility of the TaskContext (which in turn stores the
metrics we're interested in).
b) (My preference, since it won't break the existing interface) Surround
sorter.stop() on line 91 of SortShuffleWriter.scala with a timer. The only
downside to this second approach is that the measured time will also include
the cleanup of the partition writers. I'm not sure whether that should be
included in the shuffle write time.
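
To make option (b) concrete, here is a minimal sketch of wrapping a stop() call
with a timer. It does not use Spark's real classes: WriteMetricsSketch,
StoppableSorter, TimedStop and stopAndRecord are hypothetical stand-ins
introduced only for illustration, and the actual change in
SortShuffleWriter.scala may well look different.

```scala
// Hypothetical stand-in for the task's shuffle write metrics (NOT Spark's class).
final class WriteMetricsSketch {
  private var writeTimeNanos = 0L

  // Adds elapsed nanoseconds to the accumulated shuffle write time.
  def incWriteTime(nanos: Long): Unit = writeTimeNanos += nanos

  def writeTime: Long = writeTimeNanos
}

// Hypothetical stand-in for anything with a stop() that cleans up files.
trait StoppableSorter {
  def stop(): Unit
}

object TimedStop {
  // Runs sorter.stop() and adds the elapsed time to the metrics.
  // The finally block records the time even if stop() throws.
  def stopAndRecord(sorter: StoppableSorter, metrics: WriteMetricsSketch): Unit = {
    val start = System.nanoTime()
    try {
      sorter.stop()
    } finally {
      metrics.incWriteTime(System.nanoTime() - start)
    }
  }
}
```

Because the timer surrounds the whole stop() call, anything else that stop()
cleans up is counted as well, which is the downside mentioned in (b). The
stop() signature stays unchanged, unlike in (a).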
> Time to cleanup spilled shuffle files not included in shuffle write time
> ------------------------------------------------------------------------
>
> Key: SPARK-5845
> URL: https://issues.apache.org/jira/browse/SPARK-5845
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 1.3.0, 1.2.1
> Reporter: Kay Ousterhout
> Assignee: Ilya Ganelin
> Priority: Minor
>
> When the disk is contended, I've observed cases where it takes as long as 7
> seconds to clean up all of the intermediate spill files for a shuffle (when
> using the sort-based shuffle, but bypassing merging because there are <=200
> shuffle partitions). This happens even when the shuffle data is not large
> (152MB written from one of the tasks where I observed this). This cleanup is
> effectively part of the shuffle write time (because it's a necessary side
> effect of writing data to disk), so it should be added to the shuffle write
> time to facilitate debugging.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)