To clarify, there were actually two issues. One thing that may not be clear is
that
> elapsed time: 0.23610454 seconds (16000096 bytes allocated)

tells you how many bytes were allocated, but it does not mention that most or all
of those were (or will be) freed. In other words, this was _not_ symptomatic
of a leak; it was just a message whose meaning could have been clearer. That
behavior just changed in https://github.com/JuliaLang/julia/pull/11186.
Hopefully it will be clearer going forward.
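
As a rough illustration of the difference between "bytes allocated" and a leak,
here is a minimal sketch, assuming a recent Julia (1.x) where `@allocated` and
`GC.gc()` are available. The function name `churn` is made up for this example
and does not come from the thread. Each iteration creates a short-lived array,
so the cumulative byte count should typically grow with `n` even though nothing
stays reachable afterwards:

```julia
# Hypothetical helper: allocates a small temporary array on every iteration.
function churn(n)
    total = 0
    for i in 1:n
        tmp = [i, i + 1]   # unreachable (garbage) once the iteration ends
        total += tmp[1]
    end
    return total
end

churn(10)                              # warm up so compilation is not measured
allocated = @allocated churn(100_000)  # cumulative bytes allocated by the call
GC.gc()                                # the temporaries can now be reclaimed
println("allocated during call: ", allocated, " bytes")
```

The number printed is a running total of allocations, not the amount of memory
still held once the call returns.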

The issue that Yichao mentioned was more subtle and a much bigger problem, but 
I don't think this is what you were noticing, Mohammed. That more serious 
issue seems to be fixed by https://github.com/JuliaLang/julia/pull/11314

--Tim

On Sunday, May 17, 2015 05:53:14 PM Yichao Yu wrote:
> On Sun, May 17, 2015 at 5:05 PM, Mohammed El-Beltagy <[email protected]> wrote:
> > Many thanks Milan and Yichao, this was very informative. I am also
> > delighted that I helped in a very small way to expose what appears to be a
> > problem with memory leakage.
> 
> It was actually much worse than a memory leak: it was freeing memory
> that is still in use. (AFAICT, given how a GC works, it
> usually won't leak anything when it fires, but it can free something
> by mistake if the code that uses it is badly written.)
> See the explanation in the comment on this issue[1] for why GC roots (and
> friends) are important.
> 
> [1] https://github.com/JuliaLang/julia/pull/11190#issuecomment-100066267
> 
> > I love this community!
> > 
> > On Sunday, May 17, 2015 at 7:51:59 PM UTC+2, Yichao Yu wrote:
> >> On Sun, May 17, 2015 at 12:52 PM, Milan Bouchet-Valat <[email protected]> wrote:
> >> > On Sunday, May 17, 2015 at 09:25 -0700, Mohammed El-Beltagy wrote:
> >> > 
> >> > You are quite right about the type assertions and that @inbounds would
> >> > certainly speed things up.
> >> > However, I am concerned here with how memory was being allocated. I wish
> >> > that somebody who is familiar with DataArray would explain this behavior.
> >> > 
> >> > That's a known design issue with DataArrays, and the reason why John Myles
> >> > White has started working on Nullable and NullableArrays to replace them. As
> >> 
> >> Didn't know about this part of the story.
> >> 
> >> P.S. your example leads me to hit
> >> https://github.com/JuliaLang/julia/issues/11313 . Thank you for
> >> exposing it....
> >> 
> >> > Yichao noted, []/getindex is type-unstable for DataArrays as it can return
> >> > NA, and this kills performance in Julia.
> >> > 
> >> > To improve performance, you can access the internals of the DataArray, doing
> >> > something like:
> >> > 
> >> > function countGT(x::DataArray{Float64,1})
> >> >     count = 0.0
> >> >     for i = 1:length(x)
> >> >         # isna(x, i) checks for NA without the type-unstable x[i]
> >> >         if !isna(x, i)
> >> >             count += (x.data[i] > 5.0) ? 1.0 : 0.0
> >> >         end
> >> >     end
> >> >     count
> >> > end
> >> > 
> >> > Always write isna(x, i) instead of isna(x[i]), since the latter suffers
> >> > from type instability.
> >> > 
> >> > Regards
> >> > 
> >> > 
> >> > 
> >> > On Sunday, May 17, 2015 at 6:12:11 PM UTC+2, Yichao Yu wrote:
> >> > 
> >> > On Sun, May 17, 2015 at 11:28 AM, Mohammed El-Beltagy <[email protected]> wrote:
> >> >> Today, while trying to optimize a piece of code, I came across some rather
> >> >> curious memory allocation behavior when accessing a DataArray.
> >> >> 
> >> >> x=rand(1:10,1000000);
> >> >> function countGT(x::Array{Int,1})
> >> > 
> >> > Since the algorithm is the same for both types, I think you don't need
> >> > the type assert here. Julia will automatically specialize on the type
> >> > you pass in.
> >> > 
> >> >>     count=0
> >> >>     for i=1:length(x)
> >> >>       count += (x[i] > 5) ? 1 : 0
> >> > 
> >> > Adding `@inbounds` here will improve the performance for `Array`. I'm
> >> > not sure yet whether it helps with `DataArray`, though.
> >> > 
> >> >>     end
> >> >>     count
> >> >> 
> >> >> end
> >> >> 
> >> >> Here is what you get after running @time (compilation excluded)
> >> >> 
> >> >> @time countGT(x);
> >> >> elapsed time: 0.00847156 seconds (96 bytes allocated)
> >> >> 
> >> >> That is not too bad. @time itself allocated at least 80 bytes, and the
> >> >> extra 16 bytes are for creating the variable "count", so far so good.
> >> >> Now let's see what happens if we do the same with a floating-point array.
> >> >> x=rand(1000000);
> >> >> function countGT(x::Array{Float64,1})
> >> >>     count = 0.0
> >> >>     for i = 1:length(x)
> >> >>         count += (x[i] > 5.0) ? 1.0 : 0.0
> >> >>     end
> >> >>     count
> >> >> end
> >> >> 
> >> >> countGT(x)
> >> >> @time countGT(x)
> >> >> 
> >> >> You get
> >> >> elapsed time: 0.00177126 seconds (96 bytes allocated)
> >> >> which is still pretty good. Now, the problem starts to show up when I
> >> >> have a DataArray:
> >> >> x=@data rand(1000000);
> >> >> function countGT(x::DataArray{Float64,1})
> >> >>     count = 0.0
> >> >>     for i = 1:length(x)
> >> >>         count += (x[i] > 5.0) ? 1.0 : 0.0
> >> >>     end
> >> >>     count
> >> >> end
> >> > 
> >> > `getindex` on a DataArray does not appear to be type stable. It returns
> >> > either `NAType` or the data type. I think this is probably the reason
> >> > for the allocation.
> >> > 
> >> >> countGT(x)
> >> >> @time countGT(x)
> >> >> 
> >> >> You get
> >> >> elapsed time: 0.23610454 seconds (16000096 bytes allocated)
> >> >> 
> >> >> The bytes allocated seem to scale with the size of the DataArray. So it
> >> >> seems that the mere act of accessing an element in a DataArray allocates
> >> >> memory.
> >> >> 
> >> >> I am wondering whether there could be a better way.
> >> > 
> >> > I'm not familiar with DataArrays and its API, but I would guess it can
> >> > use Nullable or something similar.
