Ming Ma created TEZ-3577:
----------------------------

             Summary: DefaultSorter doesn't compute RLE properly
                 Key: TEZ-3577
                 URL: https://issues.apache.org/jira/browse/TEZ-3577
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Ming Ma


RLE is enabled if sameKeyCount is above a certain threshold. However, 
sameKeyCount is computed during sorter.sort. Thus, when the following function 
is invoked by flush for the only spill, the sameKeyCount parameter passed in is 
0 because no sort has happened yet. After sorter.sort is called, 
DefaultSorter#sameKey is updated, and that value should be passed to the spill 
function instead.

{noformat}
  protected void sortAndSpill(long sameKeyCount, long totalKeysCount)
      throws IOException, InterruptedException {
    final int mstart = getMetaStart();
    final int mend = getMetaEnd();
    sorter.sort(this, mstart, mend, progressable);
    spill(mstart, mend, sameKeyCount, totalKeysCount);
  }
{noformat}
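
To make the ordering problem concrete, here is a minimal, self-contained sketch. It is not Tez code: the class RleSpillSketch, the RLE_THRESHOLD value, the lastSpillUsedRle flag and the method sortAndSpillUsingField are all hypothetical, and the counter is filled in by scanning adjacent keys after sorting rather than inside a comparator. It only contrasts forwarding the caller's pre-sort count with reading the counter after the sort, which is one possible direction for a fix.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for the sorter; not Tez's actual classes.
class RleSpillSketch {
    // Assumed threshold: fraction of records whose key equals the previous key.
    static final double RLE_THRESHOLD = 0.1;

    private final int[] keys;
    // Updated only while sorting, mirroring the role of DefaultSorter#sameKey.
    private long sameKey = 0;
    // Records the RLE decision made by the most recent spill.
    boolean lastSpillUsedRle = false;

    RleSpillSketch(int[] keys) {
        this.keys = keys.clone();
    }

    // Stand-in for sorter.sort: sorts the keys and counts adjacent duplicates.
    private void sort() {
        Arrays.sort(keys);
        sameKey = 0;
        for (int i = 1; i < keys.length; i++) {
            if (keys[i] == keys[i - 1]) {
                sameKey++;
            }
        }
    }

    // Stand-in for spill: decides whether RLE is worthwhile for this spill.
    private void spill(long sameKeyCount, long totalKeysCount) {
        lastSpillUsedRle = totalKeysCount > 0
                && (double) sameKeyCount / totalKeysCount > RLE_THRESHOLD;
    }

    // Behaviour described in the report: the argument was captured before
    // the sort ran, so a stale value (0 for the only spill) reaches spill.
    void sortAndSpillAsReported(long sameKeyCount, long totalKeysCount) {
        sort();
        spill(sameKeyCount, totalKeysCount);
    }

    // One possible fix (an assumption, not the committed patch): read the
    // counter after the sort has updated it.
    void sortAndSpillUsingField(long totalKeysCount) {
        sort();
        spill(sameKey, totalKeysCount);
    }
}
```

With keys such as {5, 5, 5, 5, 1, 1, 2, 3}, the first variant called with a count of 0 leaves RLE off, while the second sees the duplicates found during the sort and turns it on.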



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
