On Thursday, 22 January 2015 at 21:40:57 UTC, Andrei Alexandrescu wrote:
There's this classic patter on Unix: |sort|uniq, i.e. sort some data and only display the unique elements.

What would be a better integrated version - one that does sorting and uniq in one shot? I suspect the combination could be quite a bit better than doing the two in sequence.

A few google searches didn't yield much. Ideas?


Thanks,

Andrei

My thought is that uniq on a sorted list is only an O(n) operation, so it's not an expensive operation by any means. If there's to be any benefit to a sortUniq, it has to be capable of reducing work incurred during the sorting process; else you're just going to end up with something less efficient.

One solution may be RedBlackTree, which has an option to disallow duplicate elements. This container has three useful properties:

(1) The tree grows one element at a time. This is in opposition to other algorithms like quicksort or heapsort in which you must operate on the entire set of elements at once.

(2) Removing duplicates only requires a single comparison per element, thus retaining the worst-case of |sort|uniq.

(3) Duplicate elements are removed immediately. Inserting an element into a balanced tree with N elements is an O(lg n) operation, so the smaller n is, the less work that is required.

The shortcoming of RedBlackTree is that it's not very fast. However, I don't know any other O(n lg n) sorting algorithm which has these properties.

Reply via email to