Thanks a lot Michal! 在 2013-4-21,上午6:51,Michał Marczyk <michal.marc...@gmail.com> 写道:
> On 20 April 2013 23:41, Tonino Jankov <tyaa...@gmail.com> wrote: >> I mean, I think that in both cases the original sequence at one point in >> time must be, entirely realized, in memory. > > Well no, it doesn't. > > The original sequence is lazy and chunked, so it looks like a chain of > links holding 32 elements each. It so happens that here it is iterated > over in a chunk-oblivious manner, so it's not terribly inaccurate to > simply think about it as a singly linked list. Calculating the length > of such a list involves walking along it while keeping a running > counter; clearly storing a reference to the head of the list > throughout the process is not necessary, and indeed Clojure doesn't do > it. > > Thus in the OOME-free case, the reference to the original sequence is > thrown out almost immediately, followed by a reference to its "rest" > part, followed by the reference to the "rest" of that etc. The > throwing out happens inside drop-while at first and then inside the > clojure.lang.RT.countFrom method. > > The key detail here is the way in which all references to d held by > methods in the chain leading up to the call to countFrom are cleared > before control is handed of to countFrom. The trick involved is known > as "locals clearing"; I've hinted at how it works in the SO answer, > see also the methods relevant here -- clojure.lang.RT.count, > clojure.lang.Util.ret1, clojure.lang.RT.countFrom. > > A further clarification: t and d refer to two different lazy > sequences, which are constructed by applying different transformations > to a third sequence, which we have been referring to as "the original > sequence". This is the huge sequence which doesn't fit in available > memory. As it happens, while d is not the same object as the original > sequence, it is a subsequence of the original sequence (from where the > split-with predicate fails to the end), so it does share structure > with it, so there is no "doubling". > > So, as mentioned previously, the key difference between the working > and the non-working version is in when the reference to the original > sequence hiding inside t gets cleared, as (count d) by itself doesn't > require a live reference to either the original sequence or even d > itself. > > Cheers, > Michał > >> >> And if there is no doubling of it in critical case, what is critical? >> >> If in (count t) (count d) - non.problematic- case original sequence also, at >> one point, is, actually, in its entirety present in memory, it means that >> memory can handle the whole collection. >> >> Maybe my questions sound a bit dubious, but anyway, I'm a bit sold out on >> this lisp, so I want to get it right. >> >> >> On 20 April 2013 23:33, Tonino Jankov <tyaa...@gmail.com> wrote: >>> >>> Marko, you say "There is no doubling: t and d share the same underlying >>> lazy sequence and will refer to the same objects. The trouble is only that >>> you force the evaluation of (count d) while (count t) still waits to be >>> evaluated, so t must definitely stay bound to the head of the shared >>> sequence.". >>> >>> But if there is no doubling, and single lazy sequence is in the memory in >>> both cases, how does then memory have problem with one case and not with the >>> other? >>> If both t and d refer to the same (realized) object in memory. >>> >>> In both cases, to spit out t or d, the program must have it at one point >>> in its memory. >>> >>> So what spends the EXTRA, critical, OOME memory in (count d) (count t) >>> case? >>> >>> Or does it get instantly garbaged the moment it gets realized in (count t) >>> (count d) case? >>> >>> Anyway, thanks for the exhaustive discussion, Marko & Michal >>> >>> >>> >>> On 18 April 2013 00:01, Michał Marczyk <michal.marc...@gmail.com> wrote: >>>> >>>> Note that the problem is not that t needs to hang around; it's that t >>>> holds a lazy sequence which hangs around in unrealized state. That >>>> lazy sequence internally holds a thunk -- a nullary function -- >>>> capable of producing the actual sequence elements on request. It is >>>> this thunk that holds a reference to the underlying huge sequence. >>>> Once t is realized, the actual sequence gets cached and the thunk >>>> becomes eligible for GC (the field holding it is set to null). If it >>>> then needs to stay around for some other purpose, that is no problem: >>>> >>>> user=> (let [[t d] (split-with #(< % 12) (range 1e8))] [(count t) >>>> (count d) (count t)]) >>>> [12 99999988 12] >>>> >>>> (Or I suppose you could return [(count d) (count t)], but (dorun t) >>>> before that.) >>>> >>>> Also, just to be explicit about this, calling (let [x >>>> (produce-huge-seq)] (count x)) is not a problem, because x gets >>>> cleared prior to control being handed off to count. >>>> >>>> I've also discussed the details of what's going on on SO, which is >>>> where I first noticed this question: >>>> >>>> http://stackoverflow.com/questions/15994316/clojure-head-retention >>>> >>>> Cheers, >>>> Michał >>>> >>>> >>>> On 17 April 2013 22:53, Marko Topolnik <marko.topol...@gmail.com> wrote: >>>>> On Monday, April 15, 2013 1:50:37 AM UTC+2, tyaakow wrote: >>>>>> >>>>>> Thank you for your response, Marko. >>>>>> I want to clarify one more thing: >>>>>> >>>>>> (let [[t d] (split-with #(< % 12) (range 1e8))] >>>>>> [(count d) (count t)]) >>>>>> >>>>>> >>>>>> does this mean that while (count d) is realizing (range 1e8) seq, it >>>>>> becomes (also) realized within t, therefore >>>>>> it doubles (range 1e8) in memory causing OOME while (count d) is still >>>>>> not >>>>>> finished? >>>>> >>>>> >>>>> There is no doubling: t and d share the same underlying lazy sequence >>>>> and >>>>> will refer to the same objects. The trouble is only that you force the >>>>> evaluation of (count d) while (count t) still waits to be evaluated, so >>>>> t >>>>> must definitely stay bound to the head of the shared sequence. >>>>> >>>>>> >>>>>> Also, you say "As count realizes one element after another, it doesn't >>>>>> on >>>>>> its own retain a reference to the past elements." >>>>>> >>>>>> Does this mean that, eg. in repl, when I do some (count xyz) and it >>>>>> realizes xyz, It will later need to be reevaluated (realized again) if >>>>>> I >>>>>> require xyz within repl (I presume that if I require xyz later within >>>>>> file, >>>>>> it wont be GC due to it and clojure will know it shouldnt be GC) >>>>> >>>>> >>>>> Be careful to observe that I say "doesn't on its own retain a reference >>>>> to >>>>> the past elements". If you have xyz bound to the head of your sequence, >>>>> it >>>>> will force the entire sequence to stay in memory for as long as xyz is >>>>> within scope (if it's a local) or indefinitely (if it's a global def'd >>>>> var). >>>>> Generally, a lazy sequence never gets un-realized once it got >>>>> realized---the >>>>> only option is for it to disappear entirely (turn into garbage). >>>>> >>>>> -marko >>>>> >>>>> -- >>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "Clojure" group. >>>>> To post to this group, send email to clojure@googlegroups.com >>>>> Note that posts from new members are moderated - please be patient with >>>>> your >>>>> first post. >>>>> To unsubscribe from this group, send email to >>>>> clojure+unsubscr...@googlegroups.com >>>>> For more options, visit this group at >>>>> http://groups.google.com/group/clojure?hl=en >>>>> --- >>>>> You received this message because you are subscribed to the Google >>>>> Groups >>>>> "Clojure" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an >>>>> email to clojure+unsubscr...@googlegroups.com. >>>>> For more options, visit https://groups.google.com/groups/opt_out. >>>>> >>>>> >>>> >>>> -- >>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "Clojure" group. >>>> To post to this group, send email to clojure@googlegroups.com >>>> Note that posts from new members are moderated - please be patient with >>>> your first post. >>>> To unsubscribe from this group, send email to >>>> clojure+unsubscr...@googlegroups.com >>>> For more options, visit this group at >>>> http://groups.google.com/group/clojure?hl=en >>>> --- >>>> You received this message because you are subscribed to the Google Groups >>>> "Clojure" group. >>>> To unsubscribe from this group and stop receiving emails from it, send an >>>> email to clojure+unsubscr...@googlegroups.com. >>>> For more options, visit https://groups.google.com/groups/opt_out. >>>> >>>> >>> >> >> -- >> -- >> You received this message because you are subscribed to the Google >> Groups "Clojure" group. >> To post to this group, send email to clojure@googlegroups.com >> Note that posts from new members are moderated - please be patient with your >> first post. >> To unsubscribe from this group, send email to >> clojure+unsubscr...@googlegroups.com >> For more options, visit this group at >> http://groups.google.com/group/clojure?hl=en >> --- >> You received this message because you are subscribed to the Google Groups >> "Clojure" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to clojure+unsubscr...@googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. >> >> > > -- > -- > You received this message because you are subscribed to the Google > Groups "Clojure" group. > To post to this group, send email to clojure@googlegroups.com > Note that posts from new members are moderated - please be patient with your > first post. > To unsubscribe from this group, send email to > clojure+unsubscr...@googlegroups.com > For more options, visit this group at > http://groups.google.com/group/clojure?hl=en > --- > You received this message because you are subscribed to the Google Groups > "Clojure" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to clojure+unsubscr...@googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > > -- -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.