[
https://issues.apache.org/jira/browse/SPARK-7413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539067#comment-14539067
]
Josh Rosen commented on SPARK-7413:
-----------------------------------
Actually, it looks like we sort of try to do this in a very confusing block
of code at the end of writePartitionedFile:
{code}
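    context.taskMetrics.incMemoryBytesSpilled(memoryBytesSpilled)
    context.taskMetrics.incDiskBytesSpilled(diskBytesSpilled)
    // Runs only if shuffle write metrics are present AND bypassMergeSort is true;
    // only then are the values in curWriteMetrics added to the task-level metrics.
    context.taskMetrics.shuffleWriteMetrics.filter(_ => bypassMergeSort).foreach { m =>
      if (curWriteMetrics != null) {
        m.incShuffleBytesWritten(curWriteMetrics.shuffleBytesWritten)
        m.incShuffleWriteTime(curWriteMetrics.shuffleWriteTime)
        m.incShuffleRecordsWritten(curWriteMetrics.shuffleRecordsWritten)
      }
    }

    lengths  // return value: the length in bytes of each partition in the output file
  }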
{code}
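Part of what makes that block hard to read is that `Option.filter` is used to gate a side effect on an unrelated Boolean. The simplified sketch below shows that `opt.filter(_ => flag).foreach(f)` runs `f` only when the option is non-empty and the flag is true, which is the same as a plain `if`. It assumes `shuffleWriteMetrics` is an `Option` (which the `.filter` call suggests); the `Metrics` class and both helper functions are hypothetical stand-ins, not Spark's API.

```scala
object BypassMetricsSketch {
  // Hypothetical stand-in for ShuffleWriteMetrics, with only the fields used above.
  final class Metrics(var bytes: Long = 0L, var timeNanos: Long = 0L, var records: Long = 0L) {
    def add(other: Metrics): Unit = {
      bytes += other.bytes
      timeNanos += other.timeNanos
      records += other.records
    }
  }

  // Style of the original block: Option.filter gates the update on bypassMergeSort.
  def copyWithFilter(target: Option[Metrics], bypassMergeSort: Boolean, cur: Metrics): Unit =
    target.filter(_ => bypassMergeSort).foreach { m =>
      if (cur != null) m.add(cur)
    }

  // Equivalent form with the same three conditions written explicitly.
  def copyExplicit(target: Option[Metrics], bypassMergeSort: Boolean, cur: Metrics): Unit =
    if (bypassMergeSort && cur != null) target.foreach(_.add(cur))
}
```

Both helpers update the target only when metrics are present, `bypassMergeSort` is true, and `cur` is non-null; the explicit form simply states the `bypassMergeSort` condition up front instead of hiding it inside a filter predicate that ignores its argument.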
In spillToPartitionFiles, curWriteMetrics appears to be assigned only once, so
we do capture the correct write metrics on that path. In spillToMergeableFile,
however, curWriteMetrics is reassigned repeatedly but its value doesn't seem to
be read anywhere, which suggests that we may not be counting metrics properly
on that path.
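The concern about spillToMergeableFile can be modeled in isolation. In the sketch below (a simplified model, not Spark's code; `WriteMetrics`, `lastOnly`, and `accumulated` are hypothetical names), a variable that is reassigned once per spill can yield at most the last spill's numbers, whereas folding every spill's metrics into a running total keeps all of the IO time. A test of the kind proposed here would assert that the reported totals equal the sum over all spills.

```scala
object SpillMetricsSketch {
  // Hypothetical, simplified stand-in for ShuffleWriteMetrics.
  final case class WriteMetrics(bytesWritten: Long, writeTimeNanos: Long)

  // The worrying pattern: each spill's metrics are assigned to the same var,
  // so earlier values are overwritten and only the last spill can be recovered.
  def lastOnly(spills: Seq[WriteMetrics]): Option[WriteMetrics] = {
    var curWriteMetrics: WriteMetrics = null
    for (s <- spills) {
      curWriteMetrics = s
    }
    Option(curWriteMetrics)
  }

  // Alternative: accumulate every spill into a running total so no IO time is dropped.
  def accumulated(spills: Seq[WriteMetrics]): WriteMetrics =
    spills.foldLeft(WriteMetrics(0L, 0L)) { (acc, s) =>
      WriteMetrics(acc.bytesWritten + s.bytesWritten, acc.writeTimeNanos + s.writeTimeNanos)
    }
}
```

For two spills of (100 bytes, 7 ns) and (50 bytes, 3 ns), `lastOnly` can report only the second spill, while `accumulated` reports 150 bytes and 10 ns.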
It's possible that the current code is correct and that I'm just
misinterpreting it, but I find it extremely convoluted and hard to
understand. We should strongly consider writing proper tests for this and
refactoring it early in Spark 1.5.
> Time to write shuffle spill files is not captured in ShuffleWriteMetrics
> ------------------------------------------------------------------------
>
> Key: SPARK-7413
> URL: https://issues.apache.org/jira/browse/SPARK-7413
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Reporter: Josh Rosen
>
> In ExternalSorter's {{spillToMergeableFile()}} method, we pass
> ShuffleWriteMetrics instances to the disk writers but discard the
> {{shuffleWriteTime}} values they capture. I think that we should account
> for this IO time, possibly by introducing new metrics that distinguish time
> spent writing spills from time spent writing the final shuffle output, and by
> extending the UI to break down the overall IO write time into these two components.
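One possible shape for the proposed breakdown is sketched below: two separate counters, one for spill writes and one for final shuffle output, plus a derived total that a UI could display next to the two components. This is only an illustration under assumptions; `ShuffleIoTime`, its method names, and the `timed` helper are hypothetical and are not existing Spark metrics.

```scala
object WriteTimeBreakdownSketch {
  // Hypothetical metrics holder; field and method names are illustrative only.
  final class ShuffleIoTime {
    private var spillNanos = 0L
    private var outputNanos = 0L

    def incSpillWriteTime(nanos: Long): Unit = spillNanos += nanos
    def incOutputWriteTime(nanos: Long): Unit = outputNanos += nanos

    def spillWriteTime: Long = spillNanos
    def outputWriteTime: Long = outputNanos
    // Overall IO write time = spill writes + final shuffle output writes.
    def totalWriteTime: Long = spillNanos + outputNanos
  }

  // Times `body` and charges the elapsed nanoseconds to the given counter,
  // even if `body` throws.
  def timed[T](record: Long => Unit)(body: => T): T = {
    val start = System.nanoTime()
    try body finally record(System.nanoTime() - start)
  }
}
```

Keeping the two counters separate, rather than folding spill time into the existing write time, means the UI can show the breakdown and still derive the overall figure by summing them.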
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)