Re: (take-by f coll), (drop-by f coll): Problem Comparing Adjacent Items Of A Collection

Ken Wesson Tue, 05 Apr 2011 00:44:49 -0700

On Tue, Apr 5, 2011 at 2:27 AM, Meikel Brandmeyer <m...@kotka.de> wrote:
> Hi,
>
> On 5 Apr., 01:27, Ken Wesson <kwess...@gmail.com> wrote:
>
>> Oh, and it also doesn't hold onto the head:
>
> Your version of partition-by is basically equivalent to:
>
> (defn partition-by
>  [f coll]
>  (lazy-seq
>    (when-let [s (seq coll)]
>      (let [fst          (first s)
>            value        (f fst)
>            [group tail] (split-with #(-> % f (= value)) (rest s))]
>        (cons (cons fst group) (partition-by f tail))))))


I don't think so. That one applies f to some elements twice:

Expensive computation would have been run on:  0
Expensive computation would have been run on:  1
Expensive computation would have been run on:  2
Expensive computation would have been run on:  3
Expensive computation would have been run on:  3
Expensive computation would have been run on:  4
Expensive computation would have been run on:  5
Expensive computation would have been run on:  6
Expensive computation would have been run on:  6
Expensive computation would have been run on:  7
Expensive computation would have been run on:  8
Expensive computation would have been run on:  9
Expensive computation would have been run on:  9
Expensive computation would have been run on:  10
Expensive computation would have been run on:  11
Expensive computation would have been run on:  12
Expensive computation would have been run on:  12
Expensive computation would have been run on:  13

> It traverses the input sequence twice: once for the take-while, once
> for the drop-while.

Mine only does an = comparison on the second traversal. If f is
expensive, the extra = comparison gets dominated. If the sequence is
short, the extra traversal doesn't matter much. Only when f is very
cheap and the sequence quite long is it potentially an issue.

> And it does "kind of" hold onto the head. It doesn't really hold the head,
> but it still holds references to the items of the first group, because the
> (drop-while first ..) is not executed immediately.

Neither version has that problem:

user=> (defn heavy-structure [n]
         [n (byte-array 100000000)])
user=> (defn memrem [] (doall (map heavy-structure (map print (iterate
inc 0)))))
#'user/memrem
user=> (memrem)
012345678910#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:84)>
; was about 1 GB of heap available
user=> (def q (second (doall (my-partition-by alength (map byte-array
[100000000 200000000])))))
#'user/q
user=> (memrem)
012345678#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:90)>
; 800 MB free
user=> (def q (second (doall (meikel-partition-by alength (map
byte-array [100000000 200000000])))))
#'user/q
user=> (memrem)
012345678#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:92)>
; 800 MB free

Both of them are holding onto the second, 200MB byte array but not the
first (or memrem would stop at 7). The second array was realized to
compare its length with the first when finding the new group, and is
still referenced as it is a member of the second group, but the first
group and first array are not held onto.

user=> (def q (nth (first (my-partition-by #(quot (first %) 3) (map
heavy-structure (iterate inc 0)))) 2))
#'user/q
user=> (memrem)
0123456789#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:99)>
user=> (def q (nth (first (meikel-partition-by #(quot (first %) 3)
(map heavy-structure (iterate inc 0)))) 2))
#'user/q
user=> (memrem)
0123456789#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:101)>

This time we generate groups of three 100MB structures and save the
third item of the first group. The memory use this causes is 100MB, as
expected if only that third item is being held onto, and not the head
of the first group. If the head of the first group were being held
while we're deeper into the first group but haven't realized the start
of the second, we'd have 300MB of memory use instead of 100 and memrem
would have crapped out at 7.

And this:

user=> (def q (nth (first (my-partition-by #(quot (first %) 3) (map
heavy-structure (map #(doto % println) (iterate inc 0))))) 2))
0
1
2
#'user/q
user=> (def q (nth (first (x-partition-by #(quot (first %) 3) (map
heavy-structure (map #(doto % println) (iterate inc 0))))) 2))
0
1
2
#'user/q

proves that it's only realizing up to the third item in this scenario;
it really is not realizing the fourth, and the start of the second
group. Yet it's already let go of the head of the first group. If I
understand your claim correctly, it's therefore wrong. Certainly these
tests seem to show it not holding onto extravagantly-large objects any
longer than necessary.

The only awkwardness is that it does eagerly realize the first element
of each group when that group itself is realized as an element of the
outer seq, and I don't really see any easy way to avoid that; and on
that score, both of our partition-bys are vastly superior to the
clojure.core 1.2 version:

user=> (def q (second (partition-by #(quot % 3) (report-seq))))
">0<"
">1<"
">2<"
">3<"
">4<"
">5<"
">6<"
#'user/q

As you can see, that one realizes the entire second group as soon as
the second group itself is realized as an element of the outer seq.
The inner seqs are completely eager. Which I believe is the issue that
motivated this in the first place. :)

So, yours appears to beat clojure.core 1.2's by a large margin, and
mine yours by a smaller one (as it holds onto and realizes nothing
longer or sooner than yours does, and avoids one extra application of
f per group). :)

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: (take-by f coll), (drop-by f coll): Problem Comparing Adjacent Items Of A Collection

Reply via email to