Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-3354: bad sorter pivot selection on some inputs
......................................................................

IMPALA-3354: bad sorter pivot selection on some inputs

Switch to a median of three random tuples, which should be very robust
across a range of inputs. It may be slightly worse than the existing
pivot selection on some inputs where the original algorithm is close to
optimal (e.g. already-sorted inputs), but it should typically be better
overall.

Always recurse on the smaller partition: this prevents stack overflow
even with bad pivot selection.

The overhead is minimal: in profiles of small sorts, I see pivot
selection take at most 0.5% of CPU time.

The improved pivot selection gives modest improvements of 2-5% on the
targeted ORDER BY perf benchmarks in a single-node run with TPC-H scale
factor 20.

Change-Id: Iae50112b6deca3d6268e18b6f4daae1af279b452
---
M be/src/runtime/sorter.cc
M tests/query_test/test_sort.py
2 files changed, 127 insertions(+), 8 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/24/2824/4
--
To view, visit http://gerrit.cloudera.org:8080/2824

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iae50112b6deca3d6268e18b6f4daae1af279b452
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
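
The two ideas in this change, taking the pivot as the median of three randomly sampled elements and recursing only on the smaller partition, can be illustrated with a minimal C++ sketch. This is not the code in be/src/runtime/sorter.cc: it assumes a std::vector<T> ordered by operator< instead of Impala tuples with a tuple comparator, it uses a Hoare-style partition, and the names MedianOfThreeRandomIndex, HoarePartition, QuickSort and SortWithRandomPivots are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: samples three random indices in [lo, hi] and returns
// the index holding the median of the three values.
template <typename T>
size_t MedianOfThreeRandomIndex(const std::vector<T>& v, size_t lo, size_t hi,
                                std::mt19937& rng) {
  std::uniform_int_distribution<size_t> dist(lo, hi);
  size_t a = dist(rng), b = dist(rng), c = dist(rng);
  // Three-element sorting network on the indices: afterwards v[a] <= v[b] <= v[c].
  if (v[b] < v[a]) std::swap(a, b);
  if (v[c] < v[b]) std::swap(b, c);
  if (v[b] < v[a]) std::swap(a, b);
  return b;
}

// Hypothetical Hoare partition of v[lo, hi] around pivot p, where p equals the
// value at v[lo]. Returns j with lo <= j < hi such that every element of
// v[lo, j] is <= p and every element of v[j + 1, hi] is >= p.
template <typename T>
size_t HoarePartition(std::vector<T>& v, size_t lo, size_t hi, const T& p) {
  size_t i = lo, j = hi;
  while (true) {
    while (v[i] < p) ++i;
    while (p < v[j]) --j;
    if (i >= j) return j;
    std::swap(v[i], v[j]);
    ++i;
    --j;
  }
}

// Sorts v[lo, hi] (inclusive). Only the smaller partition is sorted by a
// recursive call; the larger one is handled by the next loop iteration.
template <typename T>
void QuickSort(std::vector<T>& v, size_t lo, size_t hi, std::mt19937& rng) {
  while (lo < hi) {
    // Move the median-of-three pivot to v[lo] so the partition invariant holds.
    std::swap(v[lo], v[MedianOfThreeRandomIndex(v, lo, hi, rng)]);
    const T pivot = v[lo];  // Copy: the partition swaps v[lo].
    size_t j = HoarePartition(v, lo, hi, pivot);
    assert(lo <= j && j < hi);
    if (j - lo < hi - j) {   // Left part [lo, j] is no larger than [j + 1, hi].
      QuickSort(v, lo, j, rng);
      lo = j + 1;
    } else {
      QuickSort(v, j + 1, hi, rng);
      hi = j;
    }
  }
}

// Hypothetical entry point; the fixed default seed keeps runs reproducible.
template <typename T>
void SortWithRandomPivots(std::vector<T>& v, unsigned seed = 42) {
  if (v.size() < 2) return;
  std::mt19937 rng(seed);
  QuickSort(v, 0, v.size() - 1, rng);
}
```

Because each recursive call covers at most half of the current range, the recursion depth in this sketch stays around log2(n) even if every pivot is poor; a run of bad pivots can still cost extra time, but not stack space. Random sampling, rather than fixed positions such as first/middle/last, makes it unlikely that a particular input pattern consistently defeats the pivot choice.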
