[
https://issues.apache.org/jira/browse/TEZ-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641135#comment-14641135
]
Saikat edited comment on TEZ-2643 at 7/27/15 8:33 PM:
------------------------------------------------------
[~rajesh.balamohan] the bug in TEZ-2602 occurred while trying to optimize the
number of empty spills (I missed that testcase scenario!).
I think the correct place to put the check is in the merger.ready() and
spill() methods.
The idea is that if the merger heap is empty, we know the spill will be
empty, and hence we can skip that spill.
For example:
Without this patch, testKVExceedsBuffer() spills out 9 files. With this patch,
the pipelined sorter spills only 2 (roughly a 4x improvement in the worst-case
scenario, where all KVs are larger than the buffer allotted to the sorter).
The patch also passes all the testcases in pipelinedsorter.
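To make the empty-heap check concrete, here is a minimal standalone sketch. It
is not the code from the attached patch: the Merger and Sorter classes, their
fields, and drain() are hypothetical stand-ins, and only the idea of skipping a
spill when the merger heap is empty comes from the comment above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class EmptySpillSketch {

    /** Hypothetical stand-in for the merger: a min-heap over buffered keys. */
    static class Merger {
        private final PriorityQueue<String> heap = new PriorityQueue<>();

        void add(String key) {
            heap.add(key);
        }

        /** Assumed semantics: ready only when there is something to merge. */
        boolean ready() {
            return !heap.isEmpty();
        }

        /** Empties the heap and returns its keys in sorted order. */
        List<String> drain() {
            List<String> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                out.add(heap.poll());
            }
            return out;
        }
    }

    /** Hypothetical sorter that records each spill as an in-memory list. */
    static class Sorter {
        final Merger merger = new Merger();
        final List<List<String>> spillFiles = new ArrayList<>();

        /** Returns true if a spill was written, false if it was skipped. */
        boolean spill() {
            if (!merger.ready()) {
                // Empty heap means the spill would be empty, so skip it.
                return false;
            }
            spillFiles.add(merger.drain());
            return true;
        }
    }

    public static void main(String[] args) {
        Sorter sorter = new Sorter();
        sorter.spill();            // nothing buffered: skipped
        sorter.merger.add("b");
        sorter.merger.add("a");
        sorter.spill();            // writes one sorted spill
        sorter.spill();            // heap drained again: skipped
        System.out.println(sorter.spillFiles.size() + " spill file(s): "
                + sorter.spillFiles);
    }
}
```

In this sketch, three spill() calls produce a single spill, because the two
calls made against an empty heap return early without writing anything.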
> Minimize number of empty spills in Pipelined Sorter
> ---------------------------------------------------
>
> Key: TEZ-2643
> URL: https://issues.apache.org/jira/browse/TEZ-2643
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Saikat
> Assignee: Saikat
> Attachments: TEZ-2643.patch
>
>