[ 
https://issues.apache.org/jira/browse/TEZ-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734071#comment-14734071
 ] 

Rajesh Balamohan commented on TEZ-2643:
---------------------------------------


Sorry about the delay, [~saikatr]. The patch optimizes for the case where the 
SpanHeap size is 0 and avoids creating empty files. Minor comments:
- Rename ignoreSpillIfNeeded to ignoreEmptySpills?
- Should the sendPipelinedShuffleEvents() call below be moved from sort() to 
spill()? If so, spill() would not need to return a flag.
{noformat}
      if (pipelinedShuffle) {
        sendPipelinedShuffleEvents();
      }
{noformat}
- In spill(), should the creation of spillRec and filename, and the addition to 
spillFilePaths, be moved after the ignoreSpillIfNeeded check? Otherwise a spill 
file path is reserved even when the spill is skipped.
{noformat}
      // create spill file
      final long size = capacity
          + (partitions * APPROX_HEADER_LENGTH);
      final TezSpillRecord spillRec = new TezSpillRecord(partitions);
      final Path filename =
          mapOutputFile.getSpillFileForWrite(numSpills, size);
      spillFilePaths.put(numSpills, filename);
{noformat}
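
A minimal sketch of how the last two suggestions might fit together, assuming the empty-spill check can run before any spill state is created. EmptySpillSketch, spanRecordCount, and the placeholder file name are hypothetical stand-ins for illustration, not Tez code; only the identifiers quoted above come from the patch. Whether downstream consumers can tolerate no event for a skipped spill is not addressed here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the sorter; this is not the Tez implementation.
class EmptySpillSketch {
    final Map<Integer, String> spillFilePaths = new LinkedHashMap<>();
    final List<Integer> pipelinedEvents = new ArrayList<>();
    final boolean pipelinedShuffle;
    int numSpills = 0;

    EmptySpillSketch(boolean pipelinedShuffle) {
        this.pipelinedShuffle = pipelinedShuffle;
    }

    // spanRecordCount is an assumed stand-in for the SpanHeap size.
    void spill(int spanRecordCount) {
        // Check first, so no spill record or file path exists for an empty spill.
        if (spanRecordCount == 0) {
            return;
        }

        // Only now reserve the spill file (placeholder name, not a real Tez path).
        String filename = "spill_" + numSpills + ".out";
        spillFilePaths.put(numSpills, filename);

        // ... records would be written to the spill file here ...

        // Sending events from spill() means the caller needs no return flag.
        if (pipelinedShuffle) {
            sendPipelinedShuffleEvents();
        }
        numSpills++;
    }

    private void sendPipelinedShuffleEvents() {
        // Record which spill the event refers to.
        pipelinedEvents.add(numSpills);
    }
}
```

With this ordering, sort() can call spill() unconditionally, and an empty spill leaves spillFilePaths and numSpills untouched.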

> Minimize number of empty spills in Pipelined Sorter
> ---------------------------------------------------
>
>                 Key: TEZ-2643
>                 URL: https://issues.apache.org/jira/browse/TEZ-2643
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Saikat
>            Assignee: Saikat
>         Attachments: TEZ-2643.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
