On Monday, 30 November 2015 at 21:33:31 UTC, Andrei Alexandrescu
wrote:
Now that we got talking about searching in arrays, allow me to
also share an idea I've had a short while ago.
(Again, we're in the "I'd prefer to use an array if at all
possible" mindset. So let's see how we can help searching an
array with as little work as possible.)
One well-known search strategy is "Bring to front" (described
by Knuth in TAoCP). A BtF-organized linear data structure is
searched with the classic linear algorithm. The difference is
what happens after the search: whenever the search is
successful, the found element is brought to the front of the
structure. If we're looking most often for a handful of
elements, in time these will be near the front of the searched
structure.
For a linked list, bringing an element to the front is O(1)
(just rewire the pointers). For an array, things are not so
pleasant - rotating the found element to the front of the array
is O(n).
So let's see how we can implement a successful BtF for arrays.
The first idea is to just swap the found element with the first
element of the array. That's O(1) but has many disadvantages -
if you search e.g. for two elements, they'll compete for the
front of the array and they'll go back and forth without making
progress.
Another idea is to just swap the found element with the one
just before it. The logic is, each successful find will shift
the element closer to the front, in a bubble sort manner. In
time, the frequently searched elements will slowly creep toward
the front. The resulting performance is not appealing - you
need O(n) searches to bring a given element to the front, for a
total of O(n * n) steps spent in the n searches. Meh.
So let's improve on that: whenever an element is found in
position k, pick a random number i in the range 0, 1, 2, ..., k
inclusive. Then swap the array elements at indexes i and k.
This is the Randomized Slide to Front strategy.
With RStF, worst case search time remains O(n), as is the
unsuccessful search. However, frequently searched elements
migrate quickly to the front - it only takes O(log n) searches
to bring a given value at the front of the array.
Insertion and removal are both a sweet O(1), owing to the light
structuring: to insert just append the element (and perhaps
swap it in a random position of the array to prime searching
for it). Removal by position simply swaps the last element into
the position to be removed and then reduces the size of the
array.
So the RStF is suitable in all cases where BtF would be
recommended, but allows an array layout without considerable
penalty.
Related work: Theodoulos Garefalakis' Master's thesis "A Family
of Randomized Algorithms for List Accessing" describes Markov
Move to Front, which brings the searched element to front
according to a Markov chain schedule; and also Randomized Move
to Front, which decides whether a found element is brought to
front depending on a random choice. These approaches are
similar in that they both use randomization, but different
because neither has good complexity on array storage.
Andrei
What is the advantage compared to let's say a ringbuffer ? On
find you can put the element to the front, and swap the old
element with the new element ?
I guess randomizing would avoid hitting pathological cases too
often, but would converge more slowly ?