[
https://issues.apache.org/jira/browse/IMPALA-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong resolved IMPALA-3357.
-----------------------------------
Resolution: Later
Doesn't seem that promising based on Norbert's comments, so let's close for now.
> Sorter's quicksort implementation is very suboptimal for duplicate keys
> -----------------------------------------------------------------------
>
> Key: IMPALA-3357
> URL: https://issues.apache.org/jira/browse/IMPALA-3357
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.6.0
> Reporter: Tim Armstrong
> Assignee: Norbert Luksa
> Priority: Minor
> Labels: perf, ramp-up
>
> {code}
> void Sorter::TupleSorter::SortHelper(TupleIterator first, TupleIterator last)
> {
> if (UNLIKELY(state_->is_cancelled())) return;
> // Use insertion sort for smaller sequences.
> while (last.index_ - first.index_ > INSERTION_THRESHOLD) {
> TupleIterator iter(this, first.index_ + (last.index_ - first.index_) / 2);
> DCHECK(iter.current_tuple_ != NULL);
> // Partition() splits the tuples in [first, last) into two groups (<=
> pivot
> // and >= pivot) in-place. 'cut' is the index of the first tuple in the
> second group.
> TupleIterator cut = Partition(first, last,
> reinterpret_cast<Tuple*>(iter.current_tuple_));
> SortHelper(cut, last);
> last = cut;
> if (UNLIKELY(state_->is_cancelled())) return;
> }
> InsertionSort(first, last);
> }
> {code}
> The quicksort implementation in the sorter is based on dividing the input
> into two partitions: <= pivot and >= pivot.
> If all of the input values in a partition are equal, then it will still
> recursively divide and do insertion sort on the values. We could change the
> sorter to partition the input into three partitions: <, == and >. Then it
> doesn't need to recurse on the middle partition. This would mean it could
> sort a partition full of duplicate values in a single pass over the input.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]