[ 
https://issues.apache.org/jira/browse/TEZ-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641135#comment-14641135
 ] 

Saikat edited comment on TEZ-2643 at 7/24/15 9:56 PM:
------------------------------------------------------

[~rajesh.balamohan] the bug in TEZ-2602 occurred while trying to optimize the 
number of empty spills (I missed that test case scenario!).
I think the correct place to put the check is in the merger.ready() and spill() 
methods.
The idea is that if the merger heap is empty, we know the spill will be 
empty, so we can skip that spill.

For example:
Without this patch, testKVExceedsBuffer() spills out 9 files. With this patch, 
the pipelined sorter spills only 2.
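
A rough, standalone sketch of the empty-heap check described above. This is not the code from TEZ-2643.patch: the class EmptySpillSketch, its nested Merger, and the methods add(), drain(), numSpills() and spills() are hypothetical stand-ins, and the assumption that ready() returns false for an empty heap is mine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a sorter; names are illustrative, not Tez's classes.
class EmptySpillSketch {

    // Stand-in for the merger: a heap of records waiting to be written out.
    static class Merger {
        private final PriorityQueue<Integer> heap = new PriorityQueue<>();

        void add(int record) {
            heap.add(record);
        }

        // Assumed semantics: nothing to merge means the merger is not ready.
        boolean ready() {
            return !heap.isEmpty();
        }

        // Polls the heap until it is empty, yielding records in sorted order.
        List<Integer> drain() {
            List<Integer> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                out.add(heap.poll());
            }
            return out;
        }
    }

    // Each inner list stands in for one spill file.
    private final List<List<Integer>> spillFiles = new ArrayList<>();

    // Returns true if a spill was written, false if it was skipped as empty.
    boolean spill(Merger merger) {
        if (!merger.ready()) {
            return false; // empty heap means an empty spill, so skip it
        }
        spillFiles.add(merger.drain());
        return true;
    }

    int numSpills() {
        return spillFiles.size();
    }

    List<List<Integer>> spills() {
        return spillFiles;
    }

    public static void main(String[] args) {
        EmptySpillSketch sorter = new EmptySpillSketch();

        Merger empty = new Merger();
        sorter.spill(empty); // skipped, nothing written

        Merger full = new Merger();
        full.add(3);
        full.add(1);
        full.add(2);
        sorter.spill(full); // written as one spill

        System.out.println("spills written: " + sorter.numSpills());
    }
}
```

In this sketch the check sits at the top of spill(), so callers do not need to know whether the buffer happened to produce any records before asking for a spill.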




> Minimize number of empty spills in Pipelined Sorter
> ---------------------------------------------------
>
>                 Key: TEZ-2643
>                 URL: https://issues.apache.org/jira/browse/TEZ-2643
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Saikat
>         Attachments: TEZ-2643.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
