I got to thinking how one might implement a persistent, immutable deque that was efficient.
At first I considered modifying my queue implementation. Peeking at both ends could easily be done in constant time, and popping from the left. Popping from the right presented a problem: when it ate the whole vector it would have to hit the seq, so you'd have to move the seq to the right, veccing it, and replace the left chunk with nil. This would be O(n) in the current length of the queue. It would also only happen sometimes, but a worst case access pattern where right-pops are done until the vector is gone and then right-pop, left-pop, right-pop, etc. was done would be O(n) overall, as the vector would keep being seqd, nexted, vecced, popped, seqd, nexted, vecced, popped, etc. -- it's easy to see the time taken to do this with n items until all are gone is triangular in n/2, which makes it quadratic in n, which makes the *per-item* cost linear in n. So I thought, at first, "how do we make removing at both ends efficient"? Trees suggested an answer, and I quickly came up with a sorted-map-based implementation (but it is NOT the best I eventually developed): (defprotocol LPop (lpeek [this]) (lpop [this]) (lconj [this object])) (deftype LogNDeque [content lowest highest metadata] clojure.lang.IObj (withMeta [this m] (LogNDeque. content lowest highest m)) (meta [this] metadata) clojure.lang.IPersistentStack (cons [this object] (if content (let [h2 (inc highest)] (LogNDeque. (assoc content h2 object) lowest h2 metadata)) (LogNDeque. (sorted-map 0 object) 0 0 metadata))) (count [this] (if content (inc (- highest lowest)) 0)) (empty [this] (LogNDeque. nil 0 0 metadata)) (equiv [this other] (= (seq this) other)) (seq [this] (if content (seq (vals content)))) (peek [this] (if content (content highest))) (pop [this] (if content (if (= lowest highest) (empty this) (LogNDeque. (dissoc content highest) lowest (dec highest) metadata)) (throw (IllegalStateException. "Can't pop empty deque")))) LPop (lpeek [this] (if content (content lowest))) (lpop [this] (if content (if (= lowest highest) (empty this) (LogNDeque. (dissoc content lowest) (inc lowest) highest metadata)) (throw (IllegalStateException. "Can't pop empty deque")))) (lconj [this object] (if content (let [l2 (dec lowest)] (LogNDeque. (assoc content l2 object) l2 highest metadata)) (LogNDeque. (sorted-map 0 object) 0 0 metadata))) java.lang.Object (toString [this] (if content (str (seq this)) "()")) (defn deque [& things] (apply conj (LogNDeque. nil 0 0 {}) things)) Obviously, this has logarithmic performance in general, and for large deques could be up to 32x slower than ideal. Furthermore, there's a problem with it and Clojure 1.3 alpha: if you used it as a queue, so mainly popped at one end and conjed at the other, the numeric indices used internally would crawl towards Long/MAX_VALUE or Long/MIN_VALUE and then it would blow up. On 1.2 and earlier it would simply slow down drastically when it started to use bignum arithmetic. Wondering if there was a way to improve on that I revisited my earlier queue implementation. I found a bug or two and fixed them: (deftype Queue [chunk1 chunk2 size metadata] clojure.lang.IObj (withMeta [this m] (Queue. chunk1 chunk2 size m)) (meta [this] metadata) clojure.lang.IPersistentStack (cons [this object] (Queue. chunk1 (conj chunk2 object) (inc size) metadata)) (count [this] size) (empty [this] (Queue. nil [] 0 metadata)) (equiv [this other] (= (seq this) other)) (seq [this] (seq (lazy-cat chunk1 chunk2))) (peek [this] (if chunk1 (first chunk1) (first chunk2))) (pop [this] (if chunk1 (Queue. (next chunk1) chunk2 (dec size) metadata) (if-let [nc1 (seq chunk2)] (Queue. (next nc1) [] (dec size) metadata) (throw (IllegalStateException. "Can't pop empty queue"))))) java.lang.Object (toString [this] (if (zero? size) "()" (str (seq this))))) (defn queue [& things] (Queue. (seq things) [] (count things) {})) and thought about whether there was a way to make it pop efficiently from the other end after all. I was thinking of things like how java.util.ArrayList will sometimes have to copy and double the size of an internal array, which is O(n), but also because it doubles the size it has to do so at ever diminishing intervals, so in the worst case of constantly adding on the end without ever removing it does one add, then one add and doubles, then two adds and doubles, four adds and doubles, eight adds and doubles, etc.; the time taken in a simplistic model where it takes 1 to add without doubling and 1 per element to double, is 1 for the first add, 2 for the next, 1 for the next two and then 4 for the next, etc.; the elements from 2^n to 2^(n+1) cost 2^n for the individual adds plus another 2^(n+1) for the copying of the whole vector = 3x2^n. So, amortized constant time, with (given the simplistic model) a factor-of-3 overhead. Adding lpop to the queue above has it work efficiently much of the time but copy a vector occasionally. Could something be done to guarantee amortized constant time, by getting rid of that worst-case behavior with alternating left and right pops? It turns out something could: (defprotocol LPop (lpeek [this]) (lpop [this]) (lconj [this object])) (deftype Deque [chunk1 chunk2 size metadata] clojure.lang.IObj (withMeta [this m] (Deque. chunk1 chunk2 size m)) (meta [this] metadata) clojure.lang.IPersistentStack (cons [this object] (Deque. chunk1 (conj chunk2 object) (inc size) metadata)) (count [this] size) (empty [this] (Deque. nil [] 0 metadata)) (equiv [this other] (= (seq this) other)) (seq [this] (seq (lazy-cat chunk1 chunk2))) (peek [this] (peek chunk2)) (pop [this] (if (seq chunk2) (let [nc2 (pop chunk2)] (if (seq nc2) (Deque. chunk1 nc2 (dec size) metadata) (if chunk1 (let [split-idx (quot size 2)] (Deque. (seq (take split-idx chunk1)) (vec (drop split-idx chunk1)) (dec size) metadata)) (Deque. nil [] 0 metadata)))) (throw (IllegalStateException. "Can't pop empty deque")))) LPop (lpeek [this] (if chunk1 (first chunk1) (first chunk2))) (lpop [this] (if chunk1 (Deque. (next chunk1) chunk2 (dec size) metadata) (if-let [nc1 (seq chunk2)] (pop (Deque. (next nc1) ['dummy] (dec size) metadata)) (throw (IllegalStateException. "Can't pop empty queue"))))) (lconj [this object] (Deque. (cons object chunk1) chunk2 (inc size) metadata)) java.lang.Object (toString [this] (if (zero? size) "()" (str (seq this))))) (defn deque [& things] (let [v (vec things)] (Deque. nil v (count v) {}))) Whenever a pop has to vec the left chunk, it does only HALF of it. It still has to walk the whole deque, but it keeps half as a seq as the new left chunk and copies the other half to a vector that becomes the new right chunk. The minimum number of additional pops until this procedure must be repeated is about n/2 since that's roughly the size of the new right chunk. So, the O(n) copy is done roughly every n/2 operations in the worst case, for O(3n)ish performance -- same as for ArrayList expansion. Here's the torture-test: user=> (nth (iterate #(lpop (pop %)) (apply deque (range 50))) 18) #<Deque (18 19 20 21 22 23 24 25 26 27 28 29 30 31)> So far, so good. That performs that worst-case access pattern and it gets the correct result. user=> (time (do (nth (iterate #(lpop (pop %)) (apply deque (range 5000))) 2400) nil)) "Elapsed time: 4.39072 msecs" This is after having run the loop a few times to let JIT do its thing. That's non-horrible for five thousand items. user=> (time (do (nth (iterate #(lpop (pop %)) (apply deque (range 50000))) 24000) nil)) "Elapsed time: 41.92676 msecs" Increasing the size by a factor of ten increases the job by a factor of about ten and a lot less than 100. user=> (time (do (nth (iterate #(lpop (pop %)) (apply deque (range 500000))) 240000) nil)) "Elapsed time: 485.64368 msecs" Ditto. Obviously not quadratic behavior. And it seems to work correctly for the edge case where the whole deque is very small when pop is called. One thing you may notice is that it splits chunk1 anytime chunk2 would become empty and chunk1 is not empty, rather than wait until pop is called with an empty chunk2. Indeed, if lpop exhausts chunk1 it doesn't just make the new chunk1 (next chunk2) and the new chunk2 an empty vector; it sticks a dummy object into the new chunk2 vector and pops it right away to force chunk1 to be split. The reason for this is, actually, peek: (peek [this] (peek chunk2)). If it weren't invariant that if chunk2 was empty the deque was empty this would have to be, instead, (peek [this] (if (seq (chunk2)) (peek chunk2) (last chunk1))) and would be O(n) whenever chunk2 was empty. A worst-case access pattern of (get deque into a state where chunk2 is empty) (peek deque) (peek deque) (peek deque)... would have amortized O(n). Instead, both peek and lpeek are O(1) and both pop and lpop are amortized O(1) -- O(n) if the chunk being popped from empties and O(1) otherwise, with the chunk-splitting ensuring that it averages to O(1) for arbitrary long chains of pops and lpops: user=> (time (do (nth (iterate #(if (zero? (rand-int 2)) (lpop %) (pop %)) (apply deque (range 500000))) 480000) nil)) "Elapsed time: 628.7856 msecs" nil The logarithmic performance of the LogNDeque is noticeably worse: user=> (time (do (nth (iterate #(if (zero? (rand-int 2)) (lpop %) (pop %)) (apply conj (LogNDeque. nil 0 0 {}) (range 500000))) 480000) nil)) "Elapsed time: 4413.452321 msecs" This remains the case even if the time needed to construct the deque is factored out: user=> (def d (apply deque (range 500000))) #'user/d user=> (def lnd (apply conj (LogNDeque. nil 0 0 {}) (range 500000))) #'user/lnd [let numbers settle down] user=> (time (do (nth (iterate #(if (zero? (rand-int 2)) (lpop %) (pop %)) d) 480000) nil)) "Elapsed time: 383.90944 msecs" nil [let numbers settle down] user=> (time (do (nth (iterate #(if (zero? (rand-int 2)) (lpop %) (pop %)) lnd) 480000) nil)) "Elapsed time: 1253.61784 msecs" nil For that size of input the LogNDeque is more than three times slower. -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en