On Fri, Jan 23, 2015 at 12:59:51AM +0000, Xinok via Digitalmars-d wrote: > On Thursday, 22 January 2015 at 21:40:57 UTC, Andrei Alexandrescu wrote: > >There's this classic patter on Unix: |sort|uniq, i.e. sort some data > >and only display the unique elements. > > > >What would be a better integrated version - one that does sorting and > >uniq in one shot? I suspect the combination could be quite a bit > >better than doing the two in sequence. [...] > My thought is that uniq on a sorted list is only an O(n) operation, so > it's not an expensive operation by any means. If there's to be any > benefit to a sortUniq, it has to be capable of reducing work incurred > during the sorting process; else you're just going to end up with > something less efficient. > > One solution may be RedBlackTree, which has an option to disallow > duplicate elements. This container has three useful properties:
The problem with RedBlackTree is that it's cache-unfriendly due to the memory allocation and per-node pointers. One reason quicksort performs so well is because it works in-place, and the partition operation accesses elements bilinearly (i.e., two linear traversals over the same stretch of memory interleaved with each other), so it's very likely that most of the accesses will hit the cache (possibly even all, if hardware prefetching is present). Whereas with RedBlackTree, it's jumping all over memory, so there will be a higher rate of cache misses and the resulting slow RAM fetches. Plus, memory allocation is slooow, and is best avoided where possible. T -- What's a "hot crossed bun"? An angry rabbit.
