[ https://issues.apache.org/jira/browse/SPARK-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187538#comment-15187538 ]
Josh Rosen commented on SPARK-5581:
-----------------------------------

As long as we're not fsyncing then hopefully the number of partitions won't matter as much since all of the writes will be going to the same file, right?

> When writing sorted map output file, avoid open / close between each partition
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-5581
>                 URL: https://issues.apache.org/jira/browse/SPARK-5581
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 1.3.0
>            Reporter: Sandy Ryza
>
> {code}
> // Bypassing merge-sort; get an iterator by partition and just write everything directly.
> for ((id, elements) <- this.partitionedIterator) {
>   if (elements.hasNext) {
>     val writer = blockManager.getDiskWriter(
>       blockId, outputFile, ser, fileBufferSize,
>       context.taskMetrics.shuffleWriteMetrics.get)
>     for (elem <- elements) {
>       writer.write(elem)
>     }
>     writer.commitAndClose()
>     val segment = writer.fileSegment()
>     lengths(id) = segment.length
>   }
> }
> {code}
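
For illustration only, here is a minimal sketch of the idea in the comment: keep one stream open for the whole output file and record each partition's length by counting bytes, instead of opening and closing a writer per partition. It uses plain `java.io` rather than Spark's `blockManager` / disk writer API, and `SingleWriterSketch`, `writePartitions`, and the newline-delimited string encoding are hypothetical names and choices, not anything from the Spark code base.

```scala
import java.io.{BufferedOutputStream, File, FileOutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for the shuffle write path; not Spark's actual API.
object SingleWriterSketch {

  /** Writes all partitions through one open stream and returns the
    * number of bytes written for each partition id. */
  def writePartitions(
      outputFile: File,
      partitions: Seq[(Int, Iterator[String])],
      numPartitions: Int): Array[Long] = {
    val lengths = new Array[Long](numPartitions)
    // Opened once for the whole file, not once per partition.
    val out = new BufferedOutputStream(new FileOutputStream(outputFile))
    try {
      for ((id, elements) <- partitions) {
        var written = 0L
        for (elem <- elements) {
          // Assumed encoding: one UTF-8 line per element.
          val bytes = (elem + "\n").getBytes(StandardCharsets.UTF_8)
          out.write(bytes)
          written += bytes.length
        }
        // Empty partitions simply record a length of zero.
        lengths(id) = written
      }
    } finally {
      // Single flush and close at the end; no fsync is issued here.
      out.close()
    }
    lengths
  }
}
```

With this shape, the per-partition cost is only the byte counting, so the number of partitions should matter less than it does when each partition pays for its own open and close.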