Re: An interesting data structure with search time O(sqrt n)

Timon Gehr via Digitalmars-d Thu, 03 Dec 2015 17:42:13 -0800

On 12/03/2015 11:44 PM, Andrei Alexandrescu wrote:

On 12/02/2015 06:26 AM, Timon Gehr wrote:

Assume the initial array is sorted from largest to smallest. This will
be the worst case for this construction algorithm, as each element will
be sifted all the way down to leaf level of the last heap.

[...]

Ω(n^(3/2)).


Thanks. My math-fu is not good enough to properly follow the argument,

There was actually an (asymptotically inconsequential) error in theinitial expression. Fixed version:

Ignoring the work performed in the heaps each elements starts in (whichis obviously at most O(n)), we can lower bound the performed work; forthe heap of size j, we sift j elements down all the following heaps ofsizes i=j+2,i=j+4,…,i=m).



∑ⱼ[j∈{1,3,…,m}]·j·∑ᵢ[i∈{j+2,j+4,…,m}]·log(i)
=
∑ᵢ∑ⱼ[j∈{1,3,…,m}]·[i∈{j+2,j+4,…,m}]·j·log(i)
=
∑ᵢ∑ⱼ∑ₖ∑ₗ[i=2·k+1]·[j=2·l+1]·[1≤j≤m]·[j+2≤i≤m]·j·log(i)
=
∑ᵢ∑ₖ[i=2·k+1]·[3≤i≤m]·log(i)·∑ⱼ∑ₗ[j=2·l+1]·[1≤j≤m]·[j+2≤i]·j
=
∑ᵢ∑ₖ[i=2·k+1]·[3≤i≤m]·log(i)·∑ⱼ∑ₗ[j=2·l+1]·[1≤j≤i-2]·j
=
∑ᵢ∑ₖ[i=2·k+1]·[3≤i≤m]·log(i)·∑ⱼ∑ₗ[j=2·l+1]·[1≤2·l+1≤i-2]·(2·l+1)
=
∑ᵢ∑ₖ[i=2·k+1]·[3≤i≤m]·log(i)·∑ₗ[1≤2·l+1≤i-2]·(2·l+1)
=
∑ᵢ∑ₖ[i=2·k+1]·[3≤i≤m]·log(i)·∑ₗ[0≤l≤(i-3)/2]·(2·l+1)
=
∑ᵢ∑ₖ[i=2·k+1]·[3≤i≤m]·log(i)·∑ₗ[0≤l≤(i-3)/2]·(2·l+1)
=
∑ᵢ∑ₖ[i=2·k+1]·[3≤i≤m]·log(i)·((i-1)/2)²
=
∑ₖ[3≤2·k+1≤m]·log(2·k+1)·k²
=
∑ₖ[1≤k≤(m-1)/2]·log(2·k+1)·k²
≥
Ω(m³)
=
Ω(n^(3/2)).



Less formally:

The heap of size 3 is sifted through by 1=1² element from the first heap.

The heap of size 5 is sifted through by 1+3=2² elements from the firstand second heapsThe heap of size 7 is sifted through by 1+3+5=3² elements from the firstthree heaps.

...
The heap of size 2·k+1 is sifted through by k² elements.
...
The heap of size m is sifted through by ((m-1)/2)² elements.

This gives the last sum in the above derivation:

     sifting through heap of size 2·k+1 costs at least log(2·k+1)
                                v
        ∑ₖ[1≤k≤(m-1)/2]·log(2·k+1)·k²
             ^
k ranges from 1 to (m-1)/2.

The factor k² is the number of elements sifting through the heap of size2·k+1.


The sum can be lower bounded by
1+2²+3²+…+((m-1)/2)²
=
(m+1)·(m-1)·(m-3)/24+(m+1)·(m-1)/8
=
(m³-m)/24
≥
Ω(m³).

but that bound sounds high; one intuition is that there shouldn't be
more work done than in sorting - fewer swaps, fewer comparisons, less
resulting structure.
...

That intuition is spot on, but the sorting algorithm that theconstruction algorithm imitates most closely is a version of insertion sort.

For example, heapifying the array 1, 2, 3, 4, 5, 6, 7, 8, 9 in 1+3+5 max
heaps results in the array 9, 8, 6, 7, 5, 4, 3, 2, 1 which is "less
sorted" than the fully sorted array. (I know, that doesn't prove anything.)
...

It provides motivation to try and find another construction algorithmthat runs asymptotically faster than n log n, or to prove that n log nis asymptotically optimal.

At this point I need to either get to the bottom of the math or

The basic argument is not too involved: If each element needs to besifted all the way down to the last heap, then for each of √n̅ heaps, youneed to sift all its ~√n̅ elements down ~√n̅ other heaps, which requiresmore than ≈n^(3/2) operations. The above derivation makes this rigorous.

put together an experimental rig that counts comparisons and swaps.

(BTW I figured one reason for which these DAG heaps work at all is that
there's never a need to sift an element up between heaps; that would be
inefficient because the root of a subheap has multiple parents. Sifting
down is efficient because each node has at most two children.)
...


Good point. I had declared victory on the "insert" front too early.

Re: An interesting data structure with search time O(sqrt n)

Reply via email to