[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291917#comment-15291917 ]

Bikas Saha commented on TEZ-2950:
---------------------------------

bq. 2. Rely on pipelined shuffle to avoid the final merge.
Per an old discussion with [~rajesh.balamohan], avoiding the final merge is 
independent of pipelined shuffle and could be enabled without it (this needs a 
code change, though). Perhaps this is what you allude to in 4.
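To illustrate what skipping the final merge without pipelined shuffle could look like, here is a minimal, hypothetical Java sketch. It is not Tez code: `SegmentIndexSketch`, `Segment`, and the in-memory byte arrays standing in for spill files are assumptions made for illustration, and the actual change discussed here may work differently. Each spill lays out its partitions contiguously; instead of re-reading and rewriting every spill into one final file (as `mergeAll` does in the report below), the writer keeps a per-partition list of (spill, offset, length) segments that a consumer could read in place.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Tez code): a partitioned writer that spills
 * buffered records and, instead of merging spills into one final file,
 * records per-partition segments so a consumer can read them in place.
 */
class SegmentIndexSketch {

    /** One contiguous byte range belonging to a partition inside a spill. */
    record Segment(int spill, int offset, int length) {}

    private final int numPartitions;
    private final int spillThreshold;                              // bytes buffered before a spill
    private final List<ByteArrayOutputStream> buffers = new ArrayList<>();
    private final List<byte[]> spills = new ArrayList<>();         // stand-ins for spill files
    private final List<List<Segment>> index = new ArrayList<>();   // partition -> its segments
    private int buffered = 0;

    SegmentIndexSketch(int numPartitions, int spillThreshold) {
        this.numPartitions = numPartitions;
        this.spillThreshold = spillThreshold;
        for (int p = 0; p < numPartitions; p++) {
            buffers.add(new ByteArrayOutputStream());
            index.add(new ArrayList<>());
        }
    }

    /** Buffers a record for its partition and spills once the threshold is reached. */
    void write(int partition, String record) {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        buffers.get(partition).write(bytes, 0, bytes.length);
        buffered += bytes.length;
        if (buffered >= spillThreshold) spill();
    }

    /** Writes all buffered partitions into one spill, partition by partition, and indexes them. */
    private void spill() {
        if (buffered == 0) return;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int spillId = spills.size();
        for (int p = 0; p < numPartitions; p++) {
            byte[] data = buffers.get(p).toByteArray();
            if (data.length > 0) {
                index.get(p).add(new Segment(spillId, out.size(), data.length));
                out.write(data, 0, data.length);
            }
            buffers.get(p).reset();
        }
        spills.add(out.toByteArray());
        buffered = 0;
    }

    /** Flushes the remaining buffer as a last spill; no final merge is performed. */
    void close() { spill(); }

    /** What a consumer of one partition would do: read its segments in spill order. */
    String readPartition(int partition) {
        StringBuilder sb = new StringBuilder();
        for (Segment s : index.get(partition)) {
            sb.append(new String(spills.get(s.spill()), s.offset(), s.length(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    int spillCount() { return spills.size(); }

    int segmentCount(int partition) { return index.get(partition).size(); }

    public static void main(String[] args) {
        SegmentIndexSketch w = new SegmentIndexSketch(2, 4);
        w.write(0, "a1"); w.write(1, "b1");   // 4 bytes buffered -> spill 0
        w.write(0, "a2"); w.write(1, "b2");   // 4 bytes buffered -> spill 1
        w.write(0, "a3");                     // flushed by close() -> spill 2
        w.close();
        System.out.println(w.spillCount());       // 3
        System.out.println(w.readPartition(0));   // a1a2a3
        System.out.println(w.readPartition(1));   // b1b2
    }
}
```

The trade-off in this design is that nothing is decompressed and rewritten at the end, but with many spills (8500 in the job reported below) a consumer would have to fetch many small segments per partition instead of one contiguous range.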

> Poor performance of UnorderedPartitionedKVWriter
> ------------------------------------------------
>
>                 Key: TEZ-2950
>                 URL: https://issues.apache.org/jira/browse/TEZ-2950
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-2950.001_prelim.patch
>
>
> Came across a job which was taking a long time in 
> UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data 
> from spill files (8500 spills) and then writing the final compressed merged 
> file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not 
> just buffer and keep writing directly to the final file, which would save a 
> lot of time?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
