You are quite right about the type assertions and that @inbounds would 
certainly speed things up. 
However, I am concerned here with how memory was being allocated. I wish 
that somebody who is familiar with DataArray would explain this behavior. 

On Sunday, May 17, 2015 at 6:12:11 PM UTC+2, Yichao Yu wrote:
>
> On Sun, May 17, 2015 at 11:28 AM, Mohammed El-Beltagy 
> <[email protected]> wrote: 
> > Today, while trying to optimize a piece of code, I came across a rather curious 
> > memory allocation behavior when accessing a DataArray. 
> > 
> > x=rand(1:10,1000000); 
> > function countGT(x::Array{Int,1}) 
>
> Since the algorithm is the same for both types, I think you don't need 
> the type assert here. Julia will automatically specialize on the type 
> you pass in. 
>
> >     count=0 
> >     for i=1:length(x) 
> >       count+= (x[i]>5)? 1: 0 
>
> Adding `@inbounds` here will improve the performance for `Array`. I am not 
> sure whether it helps with `DataArray` yet, though. 
>
> >     end 
> >     count 
> > end 
> > 
> > Here is what you get after running @time (compilation excluded) 
> > 
> > @time countGT(x); 
> > elapsed time: 0.00847156 seconds (96 bytes allocated) 
> > 
> > That is not too bad. @time itself allocates at least 80 bytes, and the extra 16 
> > bytes are for creating the variable "count". So far so good. 
> > Now let's see what happens if we do the same with a floating-point array. 
> > x=rand(1000000); 
> > function countGT(x::Array{Float64,1}) 
> >     count=0.0 
> >     for i=1:length(x) 
> >       count+= (x[i]>5.0)? 1.0: 0.0 
> >     end 
> >     count 
> > end 
> > 
> > countGT(x) 
> > @time countGT(x) 
> > 
> > You get 
> > elapsed time: 0.00177126 seconds (96 bytes allocated) 
> > Which is still pretty good. Now, the problem starts to show up when I have a 
> > DataArray 
> > x=@data rand(1000000); 
> > function countGT(x::DataArray{Float64,1}) 
> >     count=0.0 
> >     for i=1:length(x) 
> >       count+= (x[i]>5.0)? 1.0: 0.0 
> >     end 
> >     count 
> > end 
>
> `getindex` on a DataArray appears not to be type stable. It returns 
> either an `NAType` or a value of the element type. I think this is probably the reason 
> for the allocation. 
>
> > 
> > countGT(x) 
> > @time countGT(x) 
> > 
> > We get 
> > elapsed time: 0.23610454 seconds (16000096 bytes allocated) 
> > 
> > The bytes allocated seem to scale with the size of the DataArray (about 16 
> > bytes per element here). So it seems that the mere act of accessing an 
> > element in a DataArray allocates memory. 
> > 
> > I am wondering whether there could be a better way. 
> > 
> > 
>
> I'm not familiar with DataArrays and its API, but I would guess it could 
> use Nullable or something similar. 
>
