Many thanks, Milan and Yichao, this was very informative. I am also
delighted that I helped, in a very small way, to expose what appears to be a
memory leak.
I love this community!

On Sunday, May 17, 2015 at 7:51:59 PM UTC+2, Yichao Yu wrote:
>
> On Sun, May 17, 2015 at 12:52 PM, Milan Bouchet-Valat wrote:
> > On Sunday, May 17, 2015 at 09:25 -0700, Mohammed El-Beltagy wrote:
> > 
> > You are quite right about the type assertions and that @inbounds would 
> > certainly speed things up. 
> > However, I am concerned here with how memory was being allocated. I wish
> > that somebody who is familiar with DataArray would explain this behavior.
> > 
> > That's a known design issue with DataArrays, and the reason why John Myles
> > White has started working on Nullable and NullableArrays to replace them. As
>
> Didn't know about this part of the story. 
>
> P.S. your example leads me to hit 
> https://github.com/JuliaLang/julia/issues/11313 . Thank you for 
> exposing it.... 
>
> > Yichao noted, []/getindex is type-unstable for DataArrays as it can return
> > NA, and this kills performance in Julia.
> > 
> > To improve performance, you can access the internals of the DataArray,
> > doing something like:
> > 
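> > function countGT(x::DataArray{Float64,1})
> >     count = 0.0
> >     for i = 1:length(x)
> >         # isna(x, i) checks the NA mask without going through getindex
> >         if !isna(x, i)
> >             # read the underlying Float64 storage directly
> >             count += (x.data[i] > 5.0) ? 1.0 : 0.0
> >         end
> >     end
> >     count
> > end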
> > 
> > Always write isna(x, i) instead of isna(x[i]), since the latter suffers
> > from type instability.
> > 
> > Regards 
> > 
> > 
> > 
> > On Sunday, May 17, 2015 at 6:12:11 PM UTC+2, Yichao Yu wrote: 
> > 
> > On Sun, May 17, 2015 at 11:28 AM, Mohammed El-Beltagy wrote:
> >> Today, while trying to optimize a piece of code, I came across a rather
> >> curious memory allocation behavior when accessing a DataArray.
> >> 
> >> x=rand(1:10,1000000); 
> >> function countGT(x::Array{Int,1}) 
> > 
> > Since the algorithm is the same for both types, I think you don't need 
> > the type assert here. Julia will automatically specialize on the type 
> > you pass in. 
> > 
> >>     count=0 
> >>     for i=1:length(x) 
> >>       count += (x[i] > 5) ? 1 : 0
> > 
> > Adding `@inbounds` here will improve the performance for `Array`. I'm not
> > sure whether it helps with `DataArray` yet, though.
> > 
> >>     end 
> >>     count 
> >> end 
> >> 
> >> Here is what you get after running @time (compilation excluded) 
> >> 
> >> @time countGT(x); 
> >> elapsed time: 0.00847156 seconds (96 bytes allocated) 
> >> 
> >> That is not too bad. @time itself allocates at least 80 bytes, and the
> >> extra 16 bytes are for creating the variable "count", so far so good.
> >> Now let's see what happens if we do the same with a floating-point array.
> >> x=rand(1000000); 
> >> function countGT(x::Array{Float64,1}) 
> >>     count=0.0 
> >>     for i=1:length(x) 
> >>       count += (x[i] > 5.0) ? 1.0 : 0.0
> >>     end 
> >>     count 
> >> end 
> >> 
> >> countGT(x) 
> >> @time countGT(x) 
> >> 
> >> You get 
> >> elapsed time: 0.00177126 seconds (96 bytes allocated) 
> >> Which is still pretty good. Now, the problem starts to show up when I have
> >> a DataArray
> >> x=@data rand(1000000); 
> >> function countGT(x::DataArray{Float64,1}) 
> >>     count=0.0 
> >>     for i=1:length(x) 
> >>       count += (x[i] > 5.0) ? 1.0 : 0.0
> >>     end 
> >>     count 
> >> end 
> > 
> > `getindex` on a DataArray does not appear to be type-stable. It returns
> > either `NAType` or the data type. I think this is probably the reason
> > for the allocation.
> > 
> >> 
> >> countGT(x) 
> >> @time countGT(x) 
> >> 
> >> We get
> >> elapsed time: 0.23610454 seconds (16000096 bytes allocated) 
> >> 
> >> The number of bytes allocated seems to scale with the size of the
> >> DataArray. So it seems that the mere act of accessing an element in a
> >> DataArray allocates memory.
> >> 
> >> I am wondering whether there could be a better way.
> >> 
> >> 
> > 
> > I'm not familiar with DataArrays and its API, but I would guess it can
> > use Nullable or something similar.
> > 
> >> 
> >> 
> >> 
> >> 
> >> 
> > 
> > 
>
