On Tue, Apr 5, 2011 at 2:27 AM, Meikel Brandmeyer <m...@kotka.de> wrote: > Hi, > > On 5 Apr., 01:27, Ken Wesson <kwess...@gmail.com> wrote: > >> Oh, and it also doesn't hold onto the head: > > Your version of partition-by is basically equivalent to: > > (defn partition-by > [f coll] > (lazy-seq > (when-let [s (seq coll)] > (let [fst (first s) > value (f fst) > [group tail] (split-with #(-> % f (= value)) (rest s))] > (cons (cons fst group) (partition-by f tail))))))
I don't think so. That one applies f to some elements twice: Expensive computation would have been run on: 0 Expensive computation would have been run on: 1 Expensive computation would have been run on: 2 Expensive computation would have been run on: 3 Expensive computation would have been run on: 3 Expensive computation would have been run on: 4 Expensive computation would have been run on: 5 Expensive computation would have been run on: 6 Expensive computation would have been run on: 6 Expensive computation would have been run on: 7 Expensive computation would have been run on: 8 Expensive computation would have been run on: 9 Expensive computation would have been run on: 9 Expensive computation would have been run on: 10 Expensive computation would have been run on: 11 Expensive computation would have been run on: 12 Expensive computation would have been run on: 12 Expensive computation would have been run on: 13 > It traverses the input sequence twice: once for the take-while, once > for the drop-while. Mine only does an = comparison on the second traversal. If f is expensive, the extra = comparison gets dominated. If the sequence is short, the extra traversal doesn't matter much. Only when f is very cheap and the sequence quite long is it potentially an issue. > And it does "kind of" hold onto the head. It doesn't really hold the head, > but it still holds references to the items of the first group, because the > (drop-while first ..) is not executed immediately. Neither version has that problem: user=> (defn heavy-structure [n] [n (byte-array 100000000)]) user=> (defn memrem [] (doall (map heavy-structure (map print (iterate inc 0))))) #'user/memrem user=> (memrem) 012345678910#<CompilerException java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:84)> ; was about 1 GB of heap available user=> (def q (second (doall (my-partition-by alength (map byte-array [100000000 200000000]))))) #'user/q user=> (memrem) 012345678#<CompilerException java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:90)> ; 800 MB free user=> (def q (second (doall (meikel-partition-by alength (map byte-array [100000000 200000000]))))) #'user/q user=> (memrem) 012345678#<CompilerException java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:92)> ; 800 MB free Both of them are holding onto the second, 200MB byte array but not the first (or memrem would stop at 7). The second array was realized to compare its length with the first when finding the new group, and is still referenced as it is a member of the second group, but the first group and first array are not held onto. user=> (def q (nth (first (my-partition-by #(quot (first %) 3) (map heavy-structure (iterate inc 0)))) 2)) #'user/q user=> (memrem) 0123456789#<CompilerException java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:99)> user=> (def q (nth (first (meikel-partition-by #(quot (first %) 3) (map heavy-structure (iterate inc 0)))) 2)) #'user/q user=> (memrem) 0123456789#<CompilerException java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:101)> This time we generate groups of three 100MB structures and save the third item of the first group. The memory use this causes is 100MB, as expected if only that third item is being held onto, and not the head of the first group. If the head of the first group were being held while we're deeper into the first group but haven't realized the start of the second, we'd have 300MB of memory use instead of 100 and memrem would have crapped out at 7. And this: user=> (def q (nth (first (my-partition-by #(quot (first %) 3) (map heavy-structure (map #(doto % println) (iterate inc 0))))) 2)) 0 1 2 #'user/q user=> (def q (nth (first (x-partition-by #(quot (first %) 3) (map heavy-structure (map #(doto % println) (iterate inc 0))))) 2)) 0 1 2 #'user/q proves that it's only realizing up to the third item in this scenario; it really is not realizing the fourth, and the start of the second group. Yet it's already let go of the head of the first group. If I understand your claim correctly, it's therefore wrong. Certainly these tests seem to show it not holding onto extravagantly-large objects any longer than necessary. The only awkwardness is that it does eagerly realize the first element of each group when that group itself is realized as an element of the outer seq, and I don't really see any easy way to avoid that; and on that score, both of our partition-bys are vastly superior to the clojure.core 1.2 version: user=> (def q (second (partition-by #(quot % 3) (report-seq)))) ">0<" ">1<" ">2<" ">3<" ">4<" ">5<" ">6<" #'user/q As you can see, that one realizes the entire second group as soon as the second group itself is realized as an element of the outer seq. The inner seqs are completely eager. Which I believe is the issue that motivated this in the first place. :) So, yours appears to beat clojure.core 1.2's by a large margin, and mine yours by a smaller one (as it holds onto and realizes nothing longer or sooner than yours does, and avoids one extra application of f per group). :) -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en