On 12/01/2015 11:48 PM, Andrei Alexandrescu wrote:
On 12/01/2015 12:13 AM, Andrei Alexandrescu wrote:
On 11/30/15 9:47 PM, Timon Gehr wrote:
On 12/01/2015 03:33 AM, Timon Gehr wrote:
On 11/30/2015 09:57 PM, Andrei Alexandrescu wrote:
So now consider my square heaps. We have O(n) build time (just a bunch
of heapifications) and O(sqrt n) search.

How do you build in O(n)? (The initial array is assumed to be
completely unordered, afaict.)

(I meant to say: There aren't any assumptions on the initial ordering of
the array elements.)

That's quite challenging. (My O(n) estimate was off the cuff and
possibly wrong.) Creating the structure entails simultaneously solving
the selection problem (find the k smallest elements) for several values
of k. I'll post here if I come up with something. -- Andrei

OK, I think I have an answer to this in the form of an efficient algorithm.

First off: sizes 1+3+5+7+... seem a great choice, I'll use that for the
initial implementation (thanks Titus!).

Second: the whole max-heap business is a red herring - min-heaps are
just as good, and in fact better. When searching, just overshoot by one
heap, then go back one heap to the left and do the final linear search
in there.

So the structure we're looking at is an array of adjacent min-heaps of
sizes 1, 3, 5, etc. The heaps are ordered (the maximum of heap k is
less than or equal to the minimum of heap k+1). The question is how to
build such an array of heaps in place, starting from an unstructured
array of size n.
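
For concreteness, here's a quick Python sketch of the layout (the
helper names are mine). Note that 1 + 3 + 5 + ... + (2m - 1) = m², so m
such heaps hold exactly m² elements, and the head (minimum) of heap k
sits at index (k - 1)²:

    import math

    def heap_bounds(k):
        """Half-open slice [start, stop) of heap k (1-based).
        Heap k has size 2*k - 1: heap 1 is a[0:1], heap 2 is a[1:4], etc.
        The first k heaps together occupy exactly k * k slots."""
        return (k - 1) ** 2, k ** 2

    def heap_of(g):
        """Index of the heap containing global array position g."""
        return math.isqrt(g) + 1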

One simple approach is to just sort the array in O(n log n). This
satisfies all properties - adjacent subarrays are obviously ordered
relative to each other, and any contiguous subarray of a sorted array
has the min-heap property. As an engineering approach we may as well
stop here - sorting is a widely studied and well-implemented algorithm.
However, we hope to get away with less work because we don't quite need
full sorting.
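
As a sanity check, here's a sketch of a checker for the two invariants
(again my own code, using the layout above); a sorted array passes
trivially because every inequality it tests compares an earlier element
with a later one:

    def check_invariants(a):
        """Assert the min-heap property inside each heap of sizes
        1, 3, 5, ... and max(heap k) <= min(heap k+1) between heaps."""
        n, k = len(a), 1
        while (k - 1) ** 2 < n:
            start, stop = (k - 1) ** 2, min(k ** 2, n)
            for l in range(stop - start):       # within-heap property
                for c in (2 * l + 1, 2 * l + 2):
                    if start + c < stop:
                        assert a[start + l] <= a[start + c]
            if stop < n:                        # the next heap's minimum
                assert max(a[start:stop]) <= a[stop]   # is its root
            k += 1

    check_invariants(sorted([5, 3, 8, 1, 9, 2, 7, 4, 6]))  # passes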

Here's the intuition: the collection of heaps can be seen as one large
heap that has a DAG structure (as opposed to a tree). In the DAG, the
root of heap k+1 is the child of all leaves of heap k (see
http://imgur.com/I366GYS which shows the DAG for the 1, 3, 7, and 7 heaps).

Clearly getting this structure to respect the heap property is all
that's needed for everything to work - so we simply apply the classic
heapify algorithm to it. It seems it can be applied almost unchanged -
starting from the end, sift each element down the DAG.
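
Here's a rough Python sketch of that, as I understand it (a sketch
under the layout assumptions above, not a vetted implementation;
dag_children, sift_down, and build are my names):

    import math

    def dag_children(a, g):
        """Children of position g in the DAG: the usual two heap
        children while inside heap k, plus - for the leaves of
        heap k - the root of heap k+1."""
        n = len(a)
        k = math.isqrt(g) + 1              # heap k spans [(k-1)**2, k**2)
        start, size = (k - 1) ** 2, 2 * k - 1
        l = g - start                      # local index inside heap k
        kids = [start + c for c in (2 * l + 1, 2 * l + 2)
                if c < size and start + c < n]
        if not kids and start + size < n:  # a leaf of heap k: its sole
            kids = [start + size]          # child is the root of heap k+1
        return kids

    def sift_down(a, g):
        """Classic sift-down, following DAG edges across heap boundaries."""
        while True:
            kids = dag_children(a, g)
            if not kids:
                return
            c = min(kids, key=lambda i: a[i])  # smallest child (min-heaps)
            if a[g] <= a[c]:
                return
            a[g], a[c] = a[c], a[g]
            g = c

    def build(a):
        """Heapify the DAG: from the end, sift each element down."""
        for g in reversed(range(len(a))):
            sift_down(a, g)

The cross edges mean a single sift can travel through several heaps,
which is where the extra work beyond classic heapify comes from.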

This looks efficient and minimal; I doubt there's any redundant work.
However, getting complexity bounds for this will be tricky. Classic
heapify is tricky, too - it seems to have complexity O(n log n) but in
fact runs in O(n) - see the nice discussion at
http://stackoverflow.com/questions/9755721/how-can-building-a-heap-be-on-time-complexity.
When applying heapify to the DAG there are more restrictions and the
paths are longer, so a sliver more than O(n) is to be expected.
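
(For reference, the standard argument behind the O(n) bound: a binary
heap on n nodes has at most ⌈n/2^(h+1)⌉ nodes of height h, and sifting
one of them down costs O(h), so the total is O(n·∑ₕ h/2^(h+1)) = O(n),
since that series converges.)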

Anyway, this looks ready for a preliminary implementation and some more
serious calculations.


Assume the initial array is sorted from largest to smallest. This is
the worst case for this construction algorithm, as each element will be
sifted all the way down to the leaf level of the last heap.

For simplicity, let's assume the heap sizes are 1, 3, …, m for some odd
m, so that n = ((m+1)/2)² and hence m = Θ(√n). Sifting an element all
the way down one heap of size i takes Θ(log i) steps.

Ignoring the work performed in the heap each element starts in (which
is at most O(n) in total, as for classic heapify), we can lower-bound
the rest of the work. For the heap of size j, we sift j elements down
all the following heaps, of sizes j+2, j+4, …, m; per element this
costs, up to a constant factor, ∑ᵢ[j+1≤i≤m]·log(i). Using Iverson
brackets ([P] is 1 if P holds and 0 otherwise), the total is therefore,
up to constant factors:

∑ⱼ[j∈{1,3,…,m}]·j·∑ᵢ[j+1≤i≤m]·log(i)
=
∑ᵢ∑ⱼ[j∈{1,3,…,m}]·[j+1≤i≤m]·j·log(i)
=
∑ᵢ[2≤i≤m]·log(i)·∑ⱼ[j∈{1,3,…,m}]·[j+1≤i] j
=
∑ᵢ[2≤i≤m]·log(i)·∑ⱼ∑ₖ[j=2·k+1]·[1≤j≤m]·[j≤i-1] j
=
∑ᵢ[2≤i≤m]·log(i)·∑ⱼ∑ₖ[j=2·k+1]·[1≤2·k+1≤i-1]·(2·k+1)
=
∑ᵢ[2≤i≤m]·log(i)·∑ₖ[1≤2·k+1≤i-1]·(2·k+1)
=
∑ᵢ[2≤i≤m]·log(i)·∑ₖ[0≤k≤⌊i/2⌋-1]·(2·k+1)
=
∑ᵢ[2≤i≤m]·log(i)·⌊i/2⌋²
≥
Ω(m³)
=
Ω(n^(3/2)).

I.e. sorting is asymptotically faster, and probably also faster in
practice. Moreover, the construction algorithm would be slower even if
sifting down one entire heap took only constant time.

However, for your adapted data structure with min-heaps, implementing
insert and remove in O(√n · log n) is now easy using your "sifting in a
DAG" approach.

One way to prove a useful lower bound on the time it must take to build
might be to observe that building recursively can be used for sorting:
after one build the heaps are mutually ordered, so recursively building
the structure inside each heap (of size O(√n)) eventually sorts the
array, and comparison-sorting lower bounds then constrain the build
time.

One more interesting thing: the heap heads are sorted, so when
searching, the heap corresponding to the searched item can be found
using binary search. That makes that part of the search essentially
negligible - the lion's share will be the linear search on the last
mile. In turn, that suggests that more, smaller heaps would be a better
choice. (At an extreme, if we have an array of heaps each of size
proportional to log(n), we get O(log n) search time even though the
array is not entirely sorted.)
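
A quick sketch of that search, under the same layout as before
(assuming for simplicity that n = m², and with my own untested code):

    import math

    def search(a, x):
        """Binary search over the heap heads a[(k-1)**2], then a linear
        scan of the one heap that can contain x: O(log n + sqrt n)."""
        n = len(a)
        m = math.isqrt(n)              # number of heaps, assuming n == m * m
        if m == 0 or x < a[0]:
            return -1
        lo, hi = 1, m                  # find the last heap whose head is <= x
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if a[(mid - 1) ** 2] <= x:
                lo = mid
            else:
                hi = mid - 1
        start, stop = (lo - 1) ** 2, lo ** 2
        for i in range(start, stop):   # the O(sqrt n) "last mile"
            if a[i] == x:
                return i
        return -1

With heaps of size Θ(log n) instead, both phases are O(log n).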


Andrei


The linear search can have much better locality, so it is likely that
the binary search actually dominates the cost for some data sets.
