Ming Ma created TEZ-3577:
----------------------------
Summary: DefaultSorter doesn't compute RLE properly
Key: TEZ-3577
URL: https://issues.apache.org/jira/browse/TEZ-3577
Project: Apache Tez
Issue Type: Bug
Reporter: Ming Ma
RLE is enabled if sameKeyCount is above a certain threshold. However, the
same-key count is only computed during sorter.sort. Thus, when flush invokes
the following function for the only spill, the sameKeyCount argument is 0,
because no sort has happened yet. After sorter.sort is called,
DefaultSorter#sameKey has been updated, and that value should be passed to the
spill function instead.
{noformat}
protected void sortAndSpill(long sameKeyCount, long totalKeysCount)
    throws IOException, InterruptedException {
  final int mstart = getMetaStart();
  final int mend = getMetaEnd();
  // sorter.sort() is what updates DefaultSorter#sameKey ...
  sorter.sort(this, mstart, mend, progressable);
  // ... but sameKeyCount was evaluated before the sort, so it is stale here.
  spill(mstart, mend, sameKeyCount, totalKeysCount);
}
{noformat}
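A minimal, hypothetical model of the ordering problem is sketched below. It is not Tez code. SorterModel, sortAndSpillBuggy, sortAndSpillFixed, flushBuggy, flushFixed and RLE_THRESHOLD_PERCENT are illustrative names, and the 10% threshold is an assumed value rather than the one DefaultSorter actually uses. The model also assumes that the caller passes the current value of the sameKey field as the sameKeyCount argument. Under those assumptions it shows that an argument captured before the sort stays 0 and keeps RLE off. Reading the field after the sort has run turns RLE on for a key-heavy buffer.

```java
import java.util.Arrays;

// Hypothetical, simplified model of the TEZ-3577 ordering bug; not the Tez API.
class SorterModel {
    // ASSUMPTION: RLE is enabled when at least this percent of keys repeat.
    static final int RLE_THRESHOLD_PERCENT = 10;

    private final int[] keys;
    // Mirrors DefaultSorter#sameKey: only meaningful after sort() has run.
    private long sameKey = 0;
    // Records whether the most recent spill decided to use RLE.
    private boolean lastSpillUsedRle = false;

    SorterModel(int[] keys) {
        this.keys = keys.clone();
    }

    // Sorts the buffer and counts keys equal to their sorted predecessor.
    private void sort() {
        Arrays.sort(keys);
        long count = 0;
        for (int i = 1; i < keys.length; i++) {
            if (keys[i] == keys[i - 1]) {
                count++;
            }
        }
        sameKey = count;
    }

    // Decides RLE from whatever same-key count it is handed.
    private void spill(long sameKeyCount, long totalKeysCount) {
        lastSpillUsedRle = totalKeysCount > 0
            && sameKeyCount * 100 >= RLE_THRESHOLD_PERCENT * totalKeysCount;
    }

    // Reported shape: the count argument was computed before sort() ran.
    private void sortAndSpillBuggy(long sameKeyCount, long totalKeysCount) {
        sort();
        spill(sameKeyCount, totalKeysCount);
    }

    // Suggested shape: read the field only after sort() has updated it.
    private void sortAndSpillFixed(long totalKeysCount) {
        sort();
        spill(sameKey, totalKeysCount);
    }

    boolean flushBuggy() {
        // sameKey is evaluated here, before any sort, so 0 is passed along.
        sortAndSpillBuggy(sameKey, keys.length);
        return lastSpillUsedRle;
    }

    boolean flushFixed() {
        sortAndSpillFixed(keys.length);
        return lastSpillUsedRle;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 3, 2, 3, 1, 3, 3};
        System.out.println("buggy RLE: " + new SorterModel(data).flushBuggy());
        System.out.println("fixed RLE: " + new SorterModel(data).flushFixed());
    }
}
```

With the sample keys, sorting yields 1, 1, 2, 3, 3, 3, 3, 3, which gives 5 repeated keys out of 8. The fixed path therefore enables RLE, while the buggy path still sees 0 and does not.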
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)