Re: sortUniq

Xinok via Digitalmars-d Thu, 22 Jan 2015 17:00:33 -0800

On Thursday, 22 January 2015 at 21:40:57 UTC, Andrei Alexandrescu wrote:

There's this classic pattern on Unix: |sort|uniq, i.e. sort some data and only display the unique elements.
What would be a better integrated version - one that does sorting and uniq in one shot? I suspect the combination could be quite a bit better than doing the two in sequence.
A few google searches didn't yield much. Ideas?


Thanks,

Andrei

My thought is that uniq on a sorted list is only an O(n) operation, so it's not expensive by any means. If there's to be any benefit to a sortUniq, it has to be capable of reducing the work incurred during the sorting process; otherwise you're just going to end up with something less efficient.
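For reference, here is a minimal Python sketch of the plain |sort|uniq baseline (Python is used only for illustration; the helper names `uniq_sorted` and `sort_then_uniq` are hypothetical). Because equal elements end up adjacent after sorting, the uniq step is one linear pass that compares each element with the last one kept, so nearly all of the cost sits in the O(n lg n) sort.

```python
def uniq_sorted(xs):
    """Drop adjacent duplicates from an already-sorted list in one O(n) pass."""
    out = []
    for x in xs:
        # In sorted input, equal elements are adjacent, so comparing
        # against the last kept element is enough.
        if not out or out[-1] != x:
            out.append(x)
    return out


def sort_then_uniq(xs):
    # O(n lg n) sort followed by the O(n) uniq pass, like `sort | uniq`.
    return uniq_sorted(sorted(xs))


print(sort_then_uniq([3, 1, 3, 2, 1, 3]))  # [1, 2, 3]
```

Note that the sort still processes every duplicate before uniq discards it, which is exactly the work a combined sortUniq would have to avoid.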

One solution may be RedBlackTree, which has an option to disallow duplicate elements. This container has three useful properties:

(1) The tree grows one element at a time. This is in contrast to other algorithms like quicksort or heapsort, in which you must operate on the entire set of elements at once.

(2) Removing duplicates requires only a single comparison per element, so the worst case stays the same as that of |sort|uniq.

(3) Duplicate elements are removed immediately. Inserting an element into a balanced tree with n elements is an O(lg n) operation, so the smaller n is, the less work is required.
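To make the three properties concrete, here is a minimal Python sketch under stated assumptions: it uses a left-leaning red-black tree (Sedgewick's variant) as a stand-in for D's RedBlackTree, and the names `Node`, `insert` and `sort_uniq_tree` are hypothetical, not Phobos API. Elements are inserted one at a time, the two `<` tests made at each node double as the equality check, and an element equal to one already in the tree is discarded on the spot, so the tree only ever holds distinct keys.

```python
class Node:
    __slots__ = ("key", "left", "right", "red")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.red = True  # new nodes are red in a left-leaning red-black tree


def is_red(node):
    return node is not None and node.red


def rotate_left(h):
    x = h.right
    h.right = x.left
    x.left = h
    x.red = h.red
    h.red = True
    return x


def rotate_right(h):
    x = h.left
    h.left = x.right
    x.right = h
    x.red = h.red
    h.red = True
    return x


def flip_colors(h):
    h.red = True
    h.left.red = False
    h.right.red = False


def insert(h, key):
    """Insert key below h; a key equal to an existing one is dropped."""
    if h is None:
        return Node(key)
    if key < h.key:
        h.left = insert(h.left, key)
    elif h.key < key:
        h.right = insert(h.right, key)
    else:
        return h  # duplicate: neither is less, so discard it immediately
    # Restore the left-leaning red-black invariants on the way back up.
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)
    return h


def in_order(h, out):
    # Recursion depth is bounded by the tree height, O(lg n).
    if h is not None:
        in_order(h.left, out)
        out.append(h.key)
        in_order(h.right, out)


def sort_uniq_tree(items):
    root = None
    for x in items:  # the tree grows one element at a time
        root = insert(root, x)
        root.red = False  # the root is always black
    out = []
    in_order(root, out)
    return out


print(sort_uniq_tree([3, 1, 3, 2, 1, 3]))  # [1, 2, 3]
```

Each insert costs O(lg k), where k is the number of distinct keys seen so far rather than the number of elements read, which is where heavy duplication would pay off.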

The shortcoming of RedBlackTree is that it's not very fast. However, I don't know of any other O(n lg n) sorting algorithm which has these properties.
