Re: Running out of memory when using loop/recur and destructuring

Brian Hurt Tue, 03 Nov 2009 18:13:18 -0800

On Tue, Nov 3, 2009 at 5:19 PM, Paul Mooser <taron...@gmail.com> wrote:


>
> I understand the pragmatism of your approach, but it's really
> unfortunate. Seqs are a really convenient abstraction, and the ability
> to model arbitrarily large or infinite ones (with laziness) is really
> useful. In my opinion, only using seqs when all of the data can be fit
> into memory really undermines the value of the abstraction (by
> narrowing its usages so severely), and also makes laziness far less
> useful (except possibly as a way to amortize costs over time, rather
> than as a way to model infinite things).
>
>
I agree.  I don't like having to ditch seqs.  And producers bring their own
downsides: for example, being imperative constructs, they open the door to
race conditions in multi-threaded code in a way that seqs don't.  If
anything, producers have a more limited range of applicability than seqs, or
even iterators, do.  Polluting the meme-space with three constructs
that are very similar, but subtly different, is also a problem I'm not
happy with.

But here's an example of the sorts of problems we were hitting.  OK, we all
know that doseq doesn't hold on to the head of the seq.  But what if I
write:
(defn print-seq [s]
  (doseq [x s]
    (println x)))
Does this code hold on to the head of the seq (in the argument to the
function)?  I'm honestly not sure, and I strongly suspect that the answer
depends upon (among other things) which JVM you run the code on (and which
optimizations it will perform), and how long the code has been running (and
thus what optimizations have been performed on the code).

And even if it doesn't, I have no doubt that with a little
complication I can write code that does (or at least might) hold on to the
head of the seq unnecessarily.  Which means this is an issue not only for
the original writer of the code, but also for the maintainer.
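
A minimal sketch of the kind of "little complication" meant here.  The
function names are hypothetical and not from this thread.  The second
function definitely keeps the head reachable, because s is used again after
the first full walk.  Whether the first one really lets go of the head
depends on the compiler clearing the argument before the call, which is
assumed here rather than guaranteed.

```clojure
;; Hypothetical names, for illustration only.

;; s is walked once and never used again, so nothing in this function
;; needs the head of the seq once the walk has started.
(defn sum-once [s]
  (reduce + 0 s))

;; s is needed again after the first full walk.  Because lazy seqs cache
;; their realized elements, everything realized by the first walk stays
;; reachable through s until the second walk has finished.
(defn sum-and-count [s]
  (let [total (reduce + 0 s)]
    [total (count s)]))

;; One possible restructuring: compute both results in a single pass,
;; so s is only ever walked once.
(defn sum-and-count-once [s]
  (reduce (fn [[total n] x] [(+ total x) (inc n)])
          [0 0]
          s))
```

The second and third functions return the same result; the difference is
only in how long the whole seq has to stay in memory, which is exactly the
sort of thing a later maintainer can change without noticing.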



> This path has been well-tread, but the danger of hanging on to the
> head of the list is due to the caching behavior of lazy seqs, which is
> important for consistency - otherwise, walking the same seq twice
> might result in different results.
>
> As with most engineering efforts, there are trade-offs, but I've been
> willing to accept the extra caution I need to employ when dealing with
> lazy seqs. I've run into a few of these kinds of bugs over time, and
> I'm guessing it's generally because in my uses, I'm dealing with
> millions of records, and far more data than I can fit in memory. I'm
> not sure that this indicates that seqs are the wrong tool in this
> instance (as you seem to say), but the answer isn't clear to me.
>

It's not clear.  If you know that millions of records are as large as you're
going to see, then seqs are the right tool, and if you end up loading
everything into memory, oh well.  If the number of records might creep into
the billions or trillions, then seqs (with their risk of wanting to keep
everything in memory) are a bad choice IMHO.

Brian
