On 20 April 2013 23:41, Tonino Jankov <tyaa...@gmail.com> wrote: > I mean, I think that in both cases the original sequence at one point in > time must be, entirely realized, in memory.
Well no, it doesn't. The original sequence is lazy and chunked, so it looks like a chain of links holding 32 elements each. It so happens that here it is iterated over in a chunk-oblivious manner, so it's not terribly inaccurate to simply think about it as a singly linked list. Calculating the length of such a list involves walking along it while keeping a running counter; clearly storing a reference to the head of the list throughout the process is not necessary, and indeed Clojure doesn't do it. Thus in the OOME-free case, the reference to the original sequence is thrown out almost immediately, followed by a reference to its "rest" part, followed by the reference to the "rest" of that etc. The throwing out happens inside drop-while at first and then inside the clojure.lang.RT.countFrom method. The key detail here is the way in which all references to d held by methods in the chain leading up to the call to countFrom are cleared before control is handed of to countFrom. The trick involved is known as "locals clearing"; I've hinted at how it works in the SO answer, see also the methods relevant here -- clojure.lang.RT.count, clojure.lang.Util.ret1, clojure.lang.RT.countFrom. A further clarification: t and d refer to two different lazy sequences, which are constructed by applying different transformations to a third sequence, which we have been referring to as "the original sequence". This is the huge sequence which doesn't fit in available memory. As it happens, while d is not the same object as the original sequence, it is a subsequence of the original sequence (from where the split-with predicate fails to the end), so it does share structure with it, so there is no "doubling". So, as mentioned previously, the key difference between the working and the non-working version is in when the reference to the original sequence hiding inside t gets cleared, as (count d) by itself doesn't require a live reference to either the original sequence or even d itself. Cheers, Michał > > And if there is no doubling of it in critical case, what is critical? > > If in (count t) (count d) - non.problematic- case original sequence also, at > one point, is, actually, in its entirety present in memory, it means that > memory can handle the whole collection. > > Maybe my questions sound a bit dubious, but anyway, I'm a bit sold out on > this lisp, so I want to get it right. > > > On 20 April 2013 23:33, Tonino Jankov <tyaa...@gmail.com> wrote: >> >> Marko, you say "There is no doubling: t and d share the same underlying >> lazy sequence and will refer to the same objects. The trouble is only that >> you force the evaluation of (count d) while (count t) still waits to be >> evaluated, so t must definitely stay bound to the head of the shared >> sequence.". >> >> But if there is no doubling, and single lazy sequence is in the memory in >> both cases, how does then memory have problem with one case and not with the >> other? >> If both t and d refer to the same (realized) object in memory. >> >> In both cases, to spit out t or d, the program must have it at one point >> in its memory. >> >> So what spends the EXTRA, critical, OOME memory in (count d) (count t) >> case? >> >> Or does it get instantly garbaged the moment it gets realized in (count t) >> (count d) case? >> >> Anyway, thanks for the exhaustive discussion, Marko & Michal >> >> >> >> On 18 April 2013 00:01, Michał Marczyk <michal.marc...@gmail.com> wrote: >>> >>> Note that the problem is not that t needs to hang around; it's that t >>> holds a lazy sequence which hangs around in unrealized state. That >>> lazy sequence internally holds a thunk -- a nullary function -- >>> capable of producing the actual sequence elements on request. It is >>> this thunk that holds a reference to the underlying huge sequence. >>> Once t is realized, the actual sequence gets cached and the thunk >>> becomes eligible for GC (the field holding it is set to null). If it >>> then needs to stay around for some other purpose, that is no problem: >>> >>> user=> (let [[t d] (split-with #(< % 12) (range 1e8))] [(count t) >>> (count d) (count t)]) >>> [12 99999988 12] >>> >>> (Or I suppose you could return [(count d) (count t)], but (dorun t) >>> before that.) >>> >>> Also, just to be explicit about this, calling (let [x >>> (produce-huge-seq)] (count x)) is not a problem, because x gets >>> cleared prior to control being handed off to count. >>> >>> I've also discussed the details of what's going on on SO, which is >>> where I first noticed this question: >>> >>> http://stackoverflow.com/questions/15994316/clojure-head-retention >>> >>> Cheers, >>> Michał >>> >>> >>> On 17 April 2013 22:53, Marko Topolnik <marko.topol...@gmail.com> wrote: >>> > On Monday, April 15, 2013 1:50:37 AM UTC+2, tyaakow wrote: >>> >> >>> >> Thank you for your response, Marko. >>> >> I want to clarify one more thing: >>> >> >>> >> (let [[t d] (split-with #(< % 12) (range 1e8))] >>> >> [(count d) (count t)]) >>> >> >>> >> >>> >> does this mean that while (count d) is realizing (range 1e8) seq, it >>> >> becomes (also) realized within t, therefore >>> >> it doubles (range 1e8) in memory causing OOME while (count d) is still >>> >> not >>> >> finished? >>> > >>> > >>> > There is no doubling: t and d share the same underlying lazy sequence >>> > and >>> > will refer to the same objects. The trouble is only that you force the >>> > evaluation of (count d) while (count t) still waits to be evaluated, so >>> > t >>> > must definitely stay bound to the head of the shared sequence. >>> > >>> >> >>> >> Also, you say "As count realizes one element after another, it doesn't >>> >> on >>> >> its own retain a reference to the past elements." >>> >> >>> >> Does this mean that, eg. in repl, when I do some (count xyz) and it >>> >> realizes xyz, It will later need to be reevaluated (realized again) if >>> >> I >>> >> require xyz within repl (I presume that if I require xyz later within >>> >> file, >>> >> it wont be GC due to it and clojure will know it shouldnt be GC) >>> > >>> > >>> > Be careful to observe that I say "doesn't on its own retain a reference >>> > to >>> > the past elements". If you have xyz bound to the head of your sequence, >>> > it >>> > will force the entire sequence to stay in memory for as long as xyz is >>> > within scope (if it's a local) or indefinitely (if it's a global def'd >>> > var). >>> > Generally, a lazy sequence never gets un-realized once it got >>> > realized---the >>> > only option is for it to disappear entirely (turn into garbage). >>> > >>> > -marko >>> > >>> > -- >>> > -- >>> > You received this message because you are subscribed to the Google >>> > Groups "Clojure" group. >>> > To post to this group, send email to clojure@googlegroups.com >>> > Note that posts from new members are moderated - please be patient with >>> > your >>> > first post. >>> > To unsubscribe from this group, send email to >>> > clojure+unsubscr...@googlegroups.com >>> > For more options, visit this group at >>> > http://groups.google.com/group/clojure?hl=en >>> > --- >>> > You received this message because you are subscribed to the Google >>> > Groups >>> > "Clojure" group. >>> > To unsubscribe from this group and stop receiving emails from it, send >>> > an >>> > email to clojure+unsubscr...@googlegroups.com. >>> > For more options, visit https://groups.google.com/groups/opt_out. >>> > >>> > >>> >>> -- >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "Clojure" group. >>> To post to this group, send email to clojure@googlegroups.com >>> Note that posts from new members are moderated - please be patient with >>> your first post. >>> To unsubscribe from this group, send email to >>> clojure+unsubscr...@googlegroups.com >>> For more options, visit this group at >>> http://groups.google.com/group/clojure?hl=en >>> --- >>> You received this message because you are subscribed to the Google Groups >>> "Clojure" group. >>> To unsubscribe from this group and stop receiving emails from it, send an >>> email to clojure+unsubscr...@googlegroups.com. >>> For more options, visit https://groups.google.com/groups/opt_out. >>> >>> >> > > -- > -- > You received this message because you are subscribed to the Google > Groups "Clojure" group. > To post to this group, send email to clojure@googlegroups.com > Note that posts from new members are moderated - please be patient with your > first post. > To unsubscribe from this group, send email to > clojure+unsubscr...@googlegroups.com > For more options, visit this group at > http://groups.google.com/group/clojure?hl=en > --- > You received this message because you are subscribed to the Google Groups > "Clojure" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to clojure+unsubscr...@googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > > -- -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.