An efficient persistent, immutable deque for Clojure with amortized O(1) peek and pop at both ends

Ken Wesson Sun, 23 Jan 2011 02:24:50 -0800

I got to thinking how one might implement a persistent, immutable
deque that was efficient.


At first I considered modifying my queue implementation. Peeking at
both ends could easily be done in constant time, and popping from the
left. Popping from the right presented a problem: when it ate the
whole vector it would have to hit the seq, so you'd have to move the
seq to the right, veccing it, and replace the left chunk with nil.
This would be O(n) in the current length of the queue. It would also
only happen sometimes, but a worst case access pattern where
right-pops are done until the vector is gone and then right-pop,
left-pop, right-pop, etc. was done would be O(n) overall, as the
vector would keep being seqd, nexted, vecced, popped, seqd, nexted,
vecced, popped, etc. -- it's easy to see the time taken to do this
with n items until all are gone is triangular in n/2, which makes it
quadratic in n, which makes the *per-item* cost linear in n.

So I thought, at first, "how do we make removing at both ends
efficient"? Trees suggested an answer, and I quickly came up with a
sorted-map-based implementation (but it is NOT the best I eventually
developed):

(defprotocol LPop
  (lpeek [this])
  (lpop [this])
  (lconj [this object]))

(deftype LogNDeque [content lowest highest metadata]
  clojure.lang.IObj
    (withMeta [this m] (LogNDeque. content lowest highest m))
    (meta [this] metadata)
  clojure.lang.IPersistentStack
    (cons [this object]
      (if content
        (let [h2 (inc highest)]
          (LogNDeque. (assoc content h2 object) lowest h2 metadata))
        (LogNDeque. (sorted-map 0 object) 0 0 metadata)))
    (count [this] (if content (inc (- highest lowest)) 0))
    (empty [this] (LogNDeque. nil 0 0 metadata))
    (equiv [this other] (= (seq this) other))
    (seq [this] (if content (seq (vals content))))
    (peek [this] (if content (content highest)))
    (pop [this]
      (if content
        (if (= lowest highest)
          (empty this)
          (LogNDeque. (dissoc content highest) lowest (dec highest) metadata))
        (throw (IllegalStateException.
                 "Can't pop empty deque"))))
  LPop
    (lpeek [this] (if content (content lowest)))
    (lpop [this]
      (if content
        (if (= lowest highest)
          (empty this)
          (LogNDeque. (dissoc content lowest) (inc lowest) highest metadata))
        (throw (IllegalStateException.
                 "Can't pop empty deque"))))
    (lconj [this object]
      (if content
        (let [l2 (dec lowest)]
          (LogNDeque. (assoc content l2 object) l2 highest metadata))
        (LogNDeque. (sorted-map 0 object) 0 0 metadata)))
  java.lang.Object
    (toString [this] (if content (str (seq this)) "()"))

(defn deque [& things]
  (apply conj (LogNDeque. nil 0 0 {}) things))

Obviously, this has logarithmic performance in general, and for large
deques could be up to 32x slower than ideal. Furthermore, there's a
problem with it and Clojure 1.3 alpha: if you used it as a queue, so
mainly popped at one end and conjed at the other, the numeric indices
used internally would crawl towards Long/MAX_VALUE or Long/MIN_VALUE
and then it would blow up. On 1.2 and earlier it would simply slow
down drastically when it started to use bignum arithmetic.

Wondering if there was a way to improve on that I revisited my earlier
queue implementation.

I found a bug or two and fixed them:

(deftype Queue [chunk1 chunk2 size metadata]
  clojure.lang.IObj
    (withMeta [this m] (Queue. chunk1 chunk2 size m))
    (meta [this] metadata)
  clojure.lang.IPersistentStack
    (cons [this object] (Queue. chunk1 (conj chunk2 object) (inc size)
metadata))
    (count [this] size)
    (empty [this] (Queue. nil [] 0 metadata))
    (equiv [this other] (= (seq this) other))
    (seq [this] (seq (lazy-cat chunk1 chunk2)))
    (peek [this] (if chunk1 (first chunk1) (first chunk2)))
    (pop [this]
      (if chunk1
        (Queue. (next chunk1) chunk2 (dec size) metadata)
        (if-let [nc1 (seq chunk2)]
          (Queue. (next nc1) [] (dec size) metadata)
          (throw (IllegalStateException.
                   "Can't pop empty queue")))))
  java.lang.Object
    (toString [this] (if (zero? size) "()" (str (seq this)))))

(defn queue [& things]
  (Queue. (seq things) [] (count things) {}))

and thought about whether there was a way to make it pop efficiently
from the other end after all. I was thinking of things like how
java.util.ArrayList will sometimes have to copy and double the size of
an internal array, which is O(n), but also because it doubles the size
it has to do so at ever diminishing intervals, so in the worst case of
constantly adding on the end without ever removing it does one add,
then one add and doubles, then two adds and doubles, four adds and
doubles, eight adds and doubles, etc.; the time taken in a simplistic
model where it takes 1 to add without doubling and 1 per element to
double, is 1 for the first add, 2 for the next, 1 for the next two and
then 4 for the next, etc.; the elements from 2^n to 2^(n+1) cost 2^n
for the individual adds plus another 2^(n+1) for the copying of the
whole vector = 3x2^n. So, amortized constant time, with (given the
simplistic model) a factor-of-3 overhead.

Adding lpop to the queue above has it work efficiently much of the
time but copy a vector occasionally. Could something be done to
guarantee amortized constant time, by getting rid of that worst-case
behavior with alternating left and right pops?

It turns out something could:

(defprotocol LPop
  (lpeek [this])
  (lpop [this])
  (lconj [this object]))

(deftype Deque [chunk1 chunk2 size metadata]
  clojure.lang.IObj
    (withMeta [this m] (Deque. chunk1 chunk2 size m))
    (meta [this] metadata)
  clojure.lang.IPersistentStack
    (cons [this object] (Deque. chunk1 (conj chunk2 object) (inc size)
metadata))
    (count [this] size)
    (empty [this] (Deque. nil [] 0 metadata))
    (equiv [this other] (= (seq this) other))
    (seq [this] (seq (lazy-cat chunk1 chunk2)))
    (peek [this] (peek chunk2))
    (pop [this]
      (if (seq chunk2)
        (let [nc2 (pop chunk2)]
          (if (seq nc2)
            (Deque. chunk1 nc2 (dec size) metadata)
            (if chunk1
              (let [split-idx (quot size 2)]
                (Deque.
                  (seq (take split-idx chunk1))
                  (vec (drop split-idx chunk1))
                  (dec size)
                  metadata))
              (Deque. nil [] 0 metadata))))
        (throw (IllegalStateException.
               "Can't pop empty deque"))))
  LPop
    (lpeek [this] (if chunk1 (first chunk1) (first chunk2)))
    (lpop [this]
      (if chunk1
        (Deque. (next chunk1) chunk2 (dec size) metadata)
        (if-let [nc1 (seq chunk2)]
          (pop (Deque. (next nc1) ['dummy] (dec size) metadata))
          (throw (IllegalStateException.
                   "Can't pop empty queue")))))
    (lconj [this object]
      (Deque. (cons object chunk1) chunk2 (inc size) metadata))
  java.lang.Object
    (toString [this] (if (zero? size) "()" (str (seq this)))))

(defn deque [& things]
  (let [v (vec things)]
    (Deque. nil v (count v) {})))

Whenever a pop has to vec the left chunk, it does only HALF of it. It
still has to walk the whole deque, but it keeps half as a seq as the
new left chunk and copies the other half to a vector that becomes the
new right chunk. The minimum number of additional pops until this
procedure must be repeated is about n/2 since that's roughly the size
of the new right chunk. So, the O(n) copy is done roughly every n/2
operations in the worst case, for O(3n)ish performance -- same as for
ArrayList expansion.

Here's the torture-test:

user=> (nth (iterate #(lpop (pop %)) (apply deque (range 50))) 18)
#<Deque (18 19 20 21 22 23 24 25 26 27 28 29 30 31)>

So far, so good. That performs that worst-case access pattern and it
gets the correct result.

user=> (time (do (nth (iterate #(lpop (pop %)) (apply deque (range
5000))) 2400) nil))
"Elapsed time: 4.39072 msecs"

This is after having run the loop a few times to let JIT do its thing.
That's non-horrible for five thousand items.

user=> (time (do (nth (iterate #(lpop (pop %)) (apply deque (range
50000))) 24000) nil))
"Elapsed time: 41.92676 msecs"

Increasing the size by a factor of ten increases the job by a factor
of about ten and a lot less than 100.

user=> (time (do (nth (iterate #(lpop (pop %)) (apply deque (range
500000))) 240000) nil))
"Elapsed time: 485.64368 msecs"

Ditto. Obviously not quadratic behavior.

And it seems to work correctly for the edge case where the whole deque
is very small when pop is called.

One thing you may notice is that it splits chunk1 anytime chunk2 would
become empty and chunk1 is not empty, rather than wait until pop is
called with an empty chunk2. Indeed, if lpop exhausts chunk1 it
doesn't just make the new chunk1 (next chunk2) and the new chunk2 an
empty vector; it sticks a dummy object into the new chunk2 vector and
pops it right away to force chunk1 to be split.

The reason for this is, actually, peek: (peek [this] (peek chunk2)).

If it weren't invariant that if chunk2 was empty the deque was empty
this would have to be, instead,

(peek [this] (if (seq (chunk2)) (peek chunk2) (last chunk1)))

and would be O(n) whenever chunk2 was empty. A worst-case access
pattern of (get deque into a state where chunk2 is empty) (peek deque)
(peek deque) (peek deque)... would have amortized O(n). Instead, both
peek and lpeek are O(1) and both pop and lpop are amortized O(1) --
O(n) if the chunk being popped from empties and O(1) otherwise, with
the chunk-splitting ensuring that it averages to O(1) for arbitrary
long chains of pops and lpops:

user=> (time (do (nth (iterate #(if (zero? (rand-int 2)) (lpop %) (pop
%)) (apply deque (range 500000))) 480000) nil))
"Elapsed time: 628.7856 msecs"
nil

The logarithmic performance of the LogNDeque is noticeably worse:

user=> (time (do (nth (iterate #(if (zero? (rand-int 2)) (lpop %) (pop
%)) (apply conj (LogNDeque. nil 0 0 {}) (range 500000))) 480000) nil))
"Elapsed time: 4413.452321 msecs"

This remains the case even if the time needed to construct the deque
is factored out:

user=> (def d (apply deque (range 500000)))
#'user/d
user=> (def lnd (apply conj (LogNDeque. nil 0 0 {}) (range 500000)))
#'user/lnd

[let numbers settle down]

user=> (time (do (nth (iterate #(if (zero? (rand-int 2)) (lpop %) (pop
%)) d) 480000) nil))
"Elapsed time: 383.90944 msecs"
nil

[let numbers settle down]

user=> (time (do (nth (iterate #(if (zero? (rand-int 2)) (lpop %) (pop
%)) lnd) 480000) nil))
"Elapsed time: 1253.61784 msecs"
nil

For that size of input the LogNDeque is more than three times slower.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

An efficient persistent, immutable deque for Clojure with amortized O(1) peek and pop at both ends

Reply via email to