PS. Of course the other key detail is that lazy sequences are realized
little by little: one element at a time if they are unchunked, or up to
32 at a time if they are chunked. The original sequence here is
chunked, so the arrays underlying the chunks are filled 32 elements at
a time, but the iteration then proceeds one step at a time, with a
separate wrapper around the array allocated at every step. Thus at any
given point during the execution of (count d), what's stored in memory
is the current value of the counter, a reference to some subsequence of
d and, in the subsequence, a handful of actual elements and a function
which can be called to generate more.
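
To make the chunking visible, here is a minimal sketch. The helper name
realized-after-first is made up for illustration, and the figure of 32
assumes that range returns a chunked seq, as it does in the Clojure
versions I am aware of.

```clojure
;; Hypothetical helper: counts how many elements of a mapped range are
;; actually computed when only the first element is requested.
(defn realized-after-first [n]
  (let [calls (atom 0)
        ;; map processes a chunked input one whole chunk at a time
        s (map (fn [x] (swap! calls inc) x) (range n))]
    (first s)   ; ask for a single element
    @calls))    ; how many elements were computed to serve that request

(realized-after-first 100) ; 32 with a chunked range: one full chunk
(realized-after-first 10)  ; 10: the only chunk is shorter than 32
```

So asking for one element fills one chunk, and nothing beyond it is
computed until iteration walks past the end of that chunk.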


On 21 April 2013 00:51, Michał Marczyk <michal.marc...@gmail.com> wrote:
> On 20 April 2013 23:41, Tonino Jankov <tyaa...@gmail.com> wrote:
>> I mean, I think that in both cases the original sequence must, at one
>> point in time, be entirely realized in memory.
>
> Well no, it doesn't.
>
> The original sequence is lazy and chunked, so it looks like a chain of
> links holding 32 elements each. It so happens that here it is iterated
> over in a chunk-oblivious manner, so it's not terribly inaccurate to
> simply think about it as a singly linked list. Calculating the length
> of such a list involves walking along it while keeping a running
> counter; clearly storing a reference to the head of the list
> throughout the process is not necessary, and indeed Clojure doesn't do
> it.
>
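The walk described above can be sketched in plain Clojure. This is only
an illustration of the idea, not the actual code of
clojure.lang.RT.countFrom; count-walk is a made-up name.

```clojure
;; Hypothetical re-implementation of counting by walking a seq.
;; Each iteration keeps only the running counter and the remaining
;; part of the seq; nothing here holds on to the head.
(defn count-walk [coll]
  (loop [s (seq coll)   ; current position, nil once exhausted
         n 0]           ; running counter
    (if s
      (recur (next s) (inc n))
      n)))

(count-walk (range 100)) ; 100
(count-walk nil)         ; 0
```

Since recur rebinds s on every step, the previous position becomes
unreachable from this loop as soon as the next one is taken.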
> Thus in the OOME-free case, the reference to the original sequence is
> thrown out almost immediately, followed by a reference to its "rest"
> part, followed by the reference to the "rest" of that etc. The
> throwing out happens inside drop-while at first and then inside the
> clojure.lang.RT.countFrom method.
>
> The key detail here is the way in which all references to d held by
> methods in the chain leading up to the call to countFrom are cleared
> before control is handed off to countFrom. The trick involved is known
> as "locals clearing"; I've hinted at how it works in the SO answer,
> see also the methods relevant here -- clojure.lang.RT.count,
> clojure.lang.Util.ret1, clojure.lang.RT.countFrom.
>
> A further clarification: t and d refer to two different lazy
> sequences, which are constructed by applying different transformations
> to a third sequence, which we have been referring to as "the original
> sequence". This is the huge sequence which doesn't fit in available
> memory. As it happens, while d is not the same object as the original
> sequence, it is a subsequence of the original sequence (from where the
> split-with predicate fails to the end), so it does share structure
> with it, so there is no "doubling".
>
> So, as mentioned previously, the key difference between the working
> and the non-working version is in when the reference to the original
> sequence hiding inside t gets cleared, as (count d) by itself doesn't
> require a live reference to either the original sequence or even d
> itself.
>
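One way to act on this, sketched under the assumption that forcing t
first is enough to release its reference to the original sequence (the
(dorun t) idea from earlier in the thread). The function name
counts-with-forced-head is made up, and the input is kept small so the
example runs comfortably in a default heap.

```clojure
;; Hypothetical helper: realizes t before counting d, so that by the
;; time (count d) walks the long tail, t already holds its 12 cached
;; elements rather than an unrealized lazy seq.
(defn counts-with-forced-head [n]
  (let [[t d] (split-with #(< % 12) (range n))]
    (dorun t)                 ; force t up front
    [(count d) (count t)]))   ; same order as the problematic version

(counts-with-forced-head 1000000) ; [999988 12]
```

With a million elements either order works; the point is only to show
the shape of the workaround, not to reproduce the OOME.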
> Cheers,
> Michał
>
>>
>> And if there is no doubling of it in the critical case, what is critical?
>>
>> If in the non-problematic (count t) (count d) case the original sequence
>> is also, at one point, present in memory in its entirety, it means that
>> memory can handle the whole collection.
>>
>> Maybe my questions sound a bit dubious, but anyway, I'm a bit sold out on
>> this lisp, so I want to get it right.
>>
>>
>> On 20 April 2013 23:33, Tonino Jankov <tyaa...@gmail.com> wrote:
>>>
>>> Marko, you say "There is no doubling: t and d share the same underlying
>>> lazy sequence and will refer to the same objects. The trouble is only that
>>> you force the evaluation of (count d) while (count t) still waits to be
>>> evaluated, so t must definitely stay bound to the head of the shared
>>> sequence.".
>>>
>>> But if there is no doubling, and a single lazy sequence is in memory in
>>> both cases, how can memory have a problem with one case and not with the
>>> other, if both t and d refer to the same (realized) object in memory?
>>>
>>> In both cases, to spit out t or d, the program must have it at one point
>>> in its memory.
>>>
>>> So what consumes the EXTRA, critical, OOME-causing memory in the
>>> (count d) (count t) case?
>>>
>>> Or does it get garbage-collected the moment it gets realized in the
>>> (count t) (count d) case?
>>>
>>> Anyway, thanks for the exhaustive discussion, Marko & Michal
>>>
>>>
>>>
>>> On 18 April 2013 00:01, Michał Marczyk <michal.marc...@gmail.com> wrote:
>>>>
>>>> Note that the problem is not that t needs to hang around; it's that t
>>>> holds a lazy sequence which hangs around in unrealized state. That
>>>> lazy sequence internally holds a thunk -- a nullary function --
>>>> capable of producing the actual sequence elements on request. It is
>>>> this thunk that holds a reference to the underlying huge sequence.
>>>> Once t is realized, the actual sequence gets cached and the thunk
>>>> becomes eligible for GC (the field holding it is set to null). If it
>>>> then needs to stay around for some other purpose, that is no problem:
>>>>
>>>> user=> (let [[t d] (split-with #(< % 12) (range 1e8))]
>>>>          [(count t) (count d) (count t)])
>>>> [12 99999988 12]
>>>>
>>>> (Or I suppose you could return [(count d) (count t)], but (dorun t)
>>>> before that.)
>>>>
>>>> Also, just to be explicit about this, calling (let [x
>>>> (produce-huge-seq)] (count x)) is not a problem, because x gets
>>>> cleared prior to control being handed off to count.
>>>>
>>>> I've also discussed the details of what's going on on SO, which is
>>>> where I first noticed this question:
>>>>
>>>> http://stackoverflow.com/questions/15994316/clojure-head-retention
>>>>
>>>> Cheers,
>>>> Michał
>>>>
>>>>
>>>> On 17 April 2013 22:53, Marko Topolnik <marko.topol...@gmail.com> wrote:
>>>> > On Monday, April 15, 2013 1:50:37 AM UTC+2, tyaakow wrote:
>>>> >>
>>>> >> Thank you for your response, Marko.
>>>> >> I want to clarify one more thing:
>>>> >>
>>>> >> (let [[t d] (split-with #(< % 12) (range 1e8))]
>>>> >>     [(count d) (count t)])
>>>> >>
>>>> >>
>>>> >> Does this mean that while (count d) is realizing the (range 1e8) seq,
>>>> >> it also becomes realized within t, and therefore doubles (range 1e8)
>>>> >> in memory, causing an OOME while (count d) is still not finished?
>>>> >
>>>> >
>>>> > There is no doubling: t and d share the same underlying lazy sequence
>>>> > and will refer to the same objects. The trouble is only that you force
>>>> > the evaluation of (count d) while (count t) still waits to be
>>>> > evaluated, so t must definitely stay bound to the head of the shared
>>>> > sequence.
>>>> >
>>>> >>
>>>> >> Also, you say "As count realizes one element after another, it doesn't
>>>> >> on its own retain a reference to the past elements."
>>>> >>
>>>> >> Does this mean that, e.g. in the REPL, when I do some (count xyz) and
>>>> >> it realizes xyz, it will later need to be reevaluated (realized again)
>>>> >> if I require xyz within the REPL? (I presume that if I require xyz
>>>> >> later within a file, it won't be GC'd because of that, and Clojure
>>>> >> will know it shouldn't be GC'd.)
>>>> >
>>>> >
>>>> > Be careful to observe that I say "doesn't on its own retain a reference
>>>> > to the past elements". If you have xyz bound to the head of your
>>>> > sequence, it will force the entire sequence to stay in memory for as
>>>> > long as xyz is within scope (if it's a local) or indefinitely (if it's
>>>> > a global def'd var). Generally, a lazy sequence never gets un-realized
>>>> > once it got realized---the only option is for it to disappear entirely
>>>> > (turn into garbage).
>>>> >
>>>> > -marko
>>>> >
>>>> > --
>>>> > --
>>>> > You received this message because you are subscribed to the Google
>>>> > Groups "Clojure" group.
>>>> > To post to this group, send email to clojure@googlegroups.com
>>>> > Note that posts from new members are moderated - please be patient with
>>>> > your
>>>> > first post.
>>>> > To unsubscribe from this group, send email to
>>>> > clojure+unsubscr...@googlegroups.com
>>>> > For more options, visit this group at
>>>> > http://groups.google.com/group/clojure?hl=en
>>>> > ---
>>>> > You received this message because you are subscribed to the Google
>>>> > Groups
>>>> > "Clojure" group.
>>>> > To unsubscribe from this group and stop receiving emails from it, send
>>>> > an
>>>> > email to clojure+unsubscr...@googlegroups.com.
>>>> > For more options, visit https://groups.google.com/groups/opt_out.
>>>> >
>>>> >
>>>>
>>>
>>
