Thanks! I was playing around with similar variants last night and came up with 
some that seemed to work for one element, but not for many.

I was seeing a similar result from your version:

=> (map2 count [(repeat 1e8 "stuff") (repeat 1e8 "stuff") (repeat 1e8 "stuff")])
OutOfMemoryError GC overhead limit exceeded
java.lang.Double.valueOf (Double.java:521)
clojure.lang.Numbers$DoubleOps.dec (Numbers.java:628)
clojure.lang.Numbers.dec (Numbers.java:118)
clojure.core/take/fn--4270 (core.clj:2627)
clojure.lang.LazySeq.sval (LazySeq.java:40)
clojure.lang.LazySeq.seq (LazySeq.java:49)
clojure.lang.Cons.next (Cons.java:39)
clojure.lang.RT.countFrom (RT.java:540)
clojure.lang.RT.count (RT.java:530)
clojure.core/count (core.clj:839)
pigpen.runtime/map2/fn--1344 (NO_SOURCE_FILE:5)
clojure.lang.LazySeq.sval (LazySeq.java:40)

But then it dawned on us to try a list instead of a vector for the outer collection:

=> (map2 count (list (repeat 1e8 "stuff") (repeat 1e8 "stuff") (repeat 1e8 "stuff")))
(100000000 100000000 100000000)


It does end up thrashing the GC a bit, but in the end it does the job. It 
seems like the vector was holding on to something that the list doesn't.
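A possible explanation, offered as an assumption rather than something confirmed in this thread: a seq over a vector, and each `rest` of that seq, keeps a reference to the whole vector, so the vector's first element (the head of the lazy seq currently being counted) stays reachable while count walks it. A list's `rest` is just the next cell, so the earlier element can be collected. With a one-element vector the `rest` is empty, which would fit the "works for one element, but not many" observation. One way to sidestep the container question, sketched below with the standard `map`, is to store zero-argument functions (thunks) that build each seq rather than the seqs themselves, so no collection ever holds a head. `count-fresh` is a hypothetical name, and the sizes are scaled down from 1e8 so the sketch runs quickly; it has not been measured at the original size.

```clojure
;; Hypothetical helper: make-seq is a zero-arg fn that builds a fresh lazy seq.
(defn count-fresh [make-seq]
  ;; The new seq is passed straight to count and never bound to a local,
  ;; so nothing outside count keeps a reference to its head.
  (count (make-seq)))

;; The vector (and map's seq over it) only retains the thunks, never a
;; realized seq, so the built-in map can be used here.
(doall
  (map count-fresh
       [#(repeat 1000 "stuff")
        #(repeat 1000 "stuff")
        #(repeat 1000 "stuff")]))
;; => (1000 1000 1000)
```

This assumes each seq can be rebuilt on demand; if a seq comes from a one-shot source, the thunk would have to open that source itself.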

It's unfortunate that I would need a custom map implementation to make this 
work. Elsewhere in the code I (somewhat accidentally) ended up using async 
channels to solve the same problem. Instead of modeling the large collection as 
a lazy seq, I have a producer put items onto a channel and a consumer take them 
from the channel and transform them (count them, in this example).
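For reference, a minimal sketch of that producer/consumer shape, assuming `org.clojure/core.async` is on the classpath. This is not the code described above; `count-via-channel` and the buffer size of 1024 are hypothetical choices. Because the producer blocks when the buffer is full, the number of items in flight is bounded by the buffer rather than by the length of the sequence.

```clojure
(require '[clojure.core.async :refer [chan thread >!! <!! close!]])

;; Hypothetical helper: a producer thread puts n copies of item onto a
;; channel while the calling thread consumes and counts them.
;; item must be non-nil, since nil cannot be put on a channel.
(defn count-via-channel [n item]
  (let [ch (chan 1024)]          ; bounded buffer gives back-pressure
    ;; Producer: >!! blocks whenever the buffer is full; closing the
    ;; channel signals that no more items are coming.
    (thread
      (dotimes [_ n]
        (>!! ch item))
      (close! ch))
    ;; Consumer: <!! returns nil once the channel is closed and drained.
    (loop [c 0]
      (if (nil? (<!! ch))
        c
        (recur (inc c))))))

(count-via-channel 1000 "stuff")
;; => 1000
```

The trade-off compared with a lazy seq is that each item can be consumed only once and the consumer drives the loop itself; taken items are removed from the channel, so there is no head to retain.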

In general, would it be better to use channels instead of lazy seqs for very 
large sequences? Lazy seqs seem to have a couple of disadvantages at scale: you 
have to be really careful not to hold on to the head (and where the head is 
being retained is frequently hidden), and realizing the lazy seq seems to put a 
lot of pressure on the GC.

Are there other alternatives for large seqs that are better than either of 
these options?

Thanks,
Matt




On Monday, November 10, 2014 at 10:47 PM, Andy Fingerhut wrote:

> At least in your particular case, replacing map with map2, defined below as a 
> small modification to a subset of map, seems to do the trick:
> 
> (defn map2 [f coll]
>   (lazy-seq
>    (when-let [s (seq coll)]
>      (let [r (rest s)]
>        (cons (f (first s)) (map2 f r))))))
> 
> 
> (map2 count [(repeat 1e8 "stuff")])
> 
> I believe this is because the original definition of map, or the subset of it 
> below:
> 
> (defn map [f coll]
>   (lazy-seq
>    (when-let [s (seq coll)]
>      (cons (f (first s)) (map f (rest s))))))
> 
> 
> holds onto the head because it needs to keep the value of "s" around for 
> the entire call to (f (first s)), so that it can later make the call 
> (map f (rest s)).  In map2, (rest s) is computed first, so the value of s is 
> no longer needed by the time f is called.
> 
> Andy
> 
> On Mon, Nov 10, 2014 at 7:48 PM, 'Matt Bossenbroek' via Clojure 
> <clojure@googlegroups.com> wrote:
> > Ran into an interesting problem today. In short, this works: 
> > 
> > (count (repeat 1e8 "stuff")) 
> > 
> > But this doesn't:
> > 
> > (map count [(repeat 1e8 "stuff")])
> > 
> > To be fair, given sufficient memory, it would eventually complete. (If the 
> > second example does work for you, change it to 1e10 or something higher.)
> > 
> > The first one works because nothing is holding on to the head of the seq. 
> > My assumption is that the second is eating memory because map still has a 
> > reference to the item being processed, while the call to count is causing 
> > it to be evaluated. Thus the whole seq is retained and we run out of memory.
> > 
> > Is my guess correct? If so, is there a workaround for this?
> > 
> > Thanks,
> > Matt
> > 
> > -- 
> > You received this message because you are subscribed to the Google
> > Groups "Clojure" group.
> > To post to this group, send email to clojure@googlegroups.com 
> > (mailto:clojure@googlegroups.com)
> > Note that posts from new members are moderated - please be patient with 
> > your first post.
> > To unsubscribe from this group, send email to
> > clojure+unsubscr...@googlegroups.com 
> > (mailto:clojure%2bunsubscr...@googlegroups.com)
> > For more options, visit this group at
> > http://groups.google.com/group/clojure?hl=en
> > --- 
> > You received this message because you are subscribed to the Google Groups 
> > "Clojure" group.
> > To unsubscribe from this group and stop receiving emails from it, send an 
> > email to clojure+unsubscr...@googlegroups.com 
> > (mailto:clojure+unsubscr...@googlegroups.com).
> > For more options, visit https://groups.google.com/d/optout.
> 
