[
https://issues.apache.org/jira/browse/SPARK-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584774#comment-14584774
]
Josh Rosen commented on SPARK-5581:
-----------------------------------
Now that I've thought about this a bit more, we might want to roll this issue
into a slightly broader redesign of the lower-level shuffle block writer
interfaces.
I think that the current interface has too many responsibilities: there are
places in our serialized shuffle code where we end up calling OutputStream
methods on BlockObjectWriter instead of calling the record-oriented interfaces.
In these cases, we end up passing in a dummy Serializer to construct the
BlockObjectWriter, which is messy. This suggests that the current interface
has too many responsibilities and should be split into a lower-level interface
that deals with bytes and a higher-level interface that handles record streams
/ serialization.
When performing any cleanup / splitting / refactoring of this interface, it
would be nice to rewrite this class in Java so that we can benefit from checked
exceptions.
> When writing sorted map output file, avoid open / close between each partition
> ------------------------------------------------------------------------------
>
> Key: SPARK-5581
> URL: https://issues.apache.org/jira/browse/SPARK-5581
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle
> Affects Versions: 1.3.0
> Reporter: Sandy Ryza
>
> {code}
> // Bypassing merge-sort; get an iterator by partition and just write
> everything directly.
> for ((id, elements) <- this.partitionedIterator) {
> if (elements.hasNext) {
> val writer = blockManager.getDiskWriter(
> blockId, outputFile, ser, fileBufferSize,
> context.taskMetrics.shuffleWriteMetrics.get)
> for (elem <- elements) {
> writer.write(elem)
> }
> writer.commitAndClose()
> val segment = writer.fileSegment()
> lengths(id) = segment.length
> }
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]