[
https://issues.apache.org/jira/browse/SPARK-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335757#comment-14335757
]
Kay Ousterhout commented on SPARK-5845:
---------------------------------------
I'd go with (b) -- it's fine (and good, I think!) to include the time to clean
up the partition writers, since this also involves interacting with the disk.
Thanks!
> Time to cleanup spilled shuffle files not included in shuffle write time
> ------------------------------------------------------------------------
>
> Key: SPARK-5845
> URL: https://issues.apache.org/jira/browse/SPARK-5845
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 1.3.0, 1.2.1
> Reporter: Kay Ousterhout
> Assignee: Ilya Ganelin
> Priority: Minor
>
> When the disk is contended, I've observed cases where it takes as long as 7
> seconds to clean up all of the intermediate spill files for a shuffle (when
> using the sort-based shuffle, but bypassing merging because there are <=200
> shuffle partitions). This happens even when the shuffle data is not large
> (152MB written from one of the tasks where I observed this). The cleanup is
> effectively part of the shuffle write time (because it's a necessary side
> effect of writing data to disk), so it should be added to the shuffle write
> time to facilitate debugging.
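
As a rough illustration of the change requested above, here is a minimal Python sketch, not Spark's actual code. `WriteMetrics`, `inc_write_time`, and `write_and_cleanup` are hypothetical names invented for this example. The only point is that the time spent deleting the spill files is measured and added to the same counter as the time spent writing them.

```python
import os
import tempfile
import time


class WriteMetrics:
    """Hypothetical stand-in for a task's shuffle write metrics."""

    def __init__(self):
        self.shuffle_write_time_ns = 0

    def inc_write_time(self, delta_ns):
        self.shuffle_write_time_ns += delta_ns


def write_and_cleanup(records_by_partition, metrics, spill_dir):
    """Write one spill file per partition, then delete them.

    Both phases are timed and charged to the write-time metric.
    Returns the cleanup duration in nanoseconds.
    """
    paths = []
    write_start = time.perf_counter_ns()
    for partition_id, records in records_by_partition.items():
        path = os.path.join(spill_dir, f"spill-{partition_id}.tmp")
        with open(path, "wb") as f:
            for record in records:
                f.write(record)
        paths.append(path)
    metrics.inc_write_time(time.perf_counter_ns() - write_start)

    # Cleanup also touches the disk, so its duration counts as write time.
    cleanup_start = time.perf_counter_ns()
    for path in paths:
        os.remove(path)
    cleanup_ns = time.perf_counter_ns() - cleanup_start
    metrics.inc_write_time(cleanup_ns)
    return cleanup_ns


if __name__ == "__main__":
    metrics = WriteMetrics()
    with tempfile.TemporaryDirectory() as spill_dir:
        cleanup_ns = write_and_cleanup({0: [b"a"], 1: [b"b", b"c"]}, metrics, spill_dir)
    print("cleanup ns:", cleanup_ns)
    print("total write ns:", metrics.shuffle_write_time_ns)
```

With the cleanup charged to the same metric, a slow delete on a contended disk shows up in the reported write time instead of appearing as unexplained task time.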