Okasaki's book is a continued inspiration of data structures and algorithms.

I was thinking along the following lines: typical collections are searchable in linear time. Then we have highly structured collections that feature logarithmic search time. But there seems to be nothing in between. So I was thinking, what would be a data structure that allows O(sqrt n) search times?

After a little tinkering, here's what I came up with.

Start with a simple array of data. Then mentally decompose that array into a concatenation of smaller arrays: first has size 1, second has size 4, third has size 9, then 16, 25, 36, ... Generally the size of these imaginary subarrays grows quadratically. And they're adjacent to each other. The last array may be incomplete.

Example: we decompose an array of size 35 into: an array of size 1 followed by arrays of sizes 4, 9, 16, 5 (last being an incomplete fragment of an array of size 25).

Now each of these arrays we organize as a max heap. Moreover, we arrange data such that the maximums of these heaps are in INCREASING order. That means the smallest element of the entire (initial) array is at the first position, then followed by the next 4 smallest organized in a max heap, followed by the next 9 smallest organized in a max heap, ... etc. There are a total of O(sqrt n) heaps, and each has O(sqrt n) elements.

That's the layout! Now, to search we look at one heap at a time. Whenever the maximum element (first element in the subarray) is smaller than the searched value, we skip over that entire heap and go to the next one. In the worst case we'll skip O(sqrt n) heaps. When the max value in a heap is less than the searched element, we found the heap and we run a linear search among O(sqrt n) elements.

The structure is nice for sorting and in particular parallel sorting because each subarray can be sorted independently - there's no migration into or out of each subarray. Inside each subarray, of course heapsort would be a good choice because it takes advantage of the existing max heap.

I haven't worked out the math for insertion and deletion yet, but they seem to also be around O(sqrt n) or O(log(n) * sqrt(n)). Just wanted to share with you and discuss what seems to be an interesting data structure.

Please share any thoughts!


Andrei

Reply via email to