You are quite right about the type assertions, and that `@inbounds` would certainly speed things up. However, what concerns me here is why memory is being allocated at all. I would appreciate it if somebody familiar with DataArrays could explain this behavior.
On Sunday, May 17, 2015 at 6:12:11 PM UTC+2, Yichao Yu wrote:
>
> On Sun, May 17, 2015 at 11:28 AM, Mohammed El-Beltagy wrote:
> > Today, while trying to optimize a piece of code, I came across a rather curious
> > behavior in memory allocation when accessing a DataArray.
> >
> >     x = rand(1:10, 1000000);
> >     function countGT(x::Array{Int,1})
>
> Since the algorithm is the same for both types, I think you don't need
> the type assert here. Julia will automatically specialize on the type
> you pass in.
>
> >         count = 0
> >         for i = 1:length(x)
> >             count += (x[i] > 5) ? 1 : 0
>
> Adding `@inbounds` here will improve the performance for `Array`. Not
> sure if it can help with `DataArray` yet, though.
>
> >         end
> >         count
> >     end
> >
> > Here is what you get after running @time (compilation excluded):
> >
> >     @time countGT(x);
> >     elapsed time: 0.00847156 seconds (96 bytes allocated)
> >
> > That is not too bad. @time allocates at least 80 bytes, and the extra 16
> > bytes are for creating the variable "count"; so far so good.
> > Now let's see what happens if we do the same with a floating point array:
> >
> >     x = rand(1000000);
> >     function countGT(x::Array{Float64,1})
> >         count = 0.0
> >         for i = 1:length(x)
> >             count += (x[i] > 5.0) ? 1.0 : 0.0
> >         end
> >         count
> >     end
> >
> >     countGT(x)
> >     @time countGT(x)
> >
> > You get
> >
> >     elapsed time: 0.00177126 seconds (96 bytes allocated)
> >
> > which is still pretty good. Now, the problem starts to show up when I have a
> > DataArray:
> >
> >     x = @data rand(1000000);
> >     function countGT(x::DataArray{Float64,1})
> >         count = 0.0
> >         for i = 1:length(x)
> >             count += (x[i] > 5.0) ? 1.0 : 0.0
> >         end
> >         count
> >     end
>
> `getindex` of DataArray appears not to be type stable. It returns
> either `NAType` or the data type. I think this is probably the reason
> for the allocation.
>
> >     countGT(x)
> >     @time countGT(x)
> >
> > You get
> >
> >     elapsed time: 0.23610454 seconds (16000096 bytes allocated)
> >
> > The bytes allocated seem to scale with the size of the DataArray. So it
> > seems that the mere act of accessing an element in a DataArray allocates
> > memory.
> >
> > I am wondering if there could be a better way.
>
> I'm not familiar with DataArrays and its API, but I would guess it could
> use Nullable or something similar.
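Folding Yichao's two suggestions into the original loop gives something like the following minimal sketch: a single generic method with no argument type annotation beyond `AbstractVector`, and `@inbounds` on the loop. The `threshold` argument is a hypothetical addition (the original hard-codes 5 and 5.0), and the counter is always an `Int`, which differs from the Float64 counter in the second version above. It is written in current Julia syntax (`eachindex`, `1_000_000`), not the 2015 release used in this thread.

```julia
# Sketch: one generic method instead of one per element type.
# `threshold` is a hypothetical parameter; the original code hard-codes 5 / 5.0.
function countGT(x::AbstractVector, threshold)
    count = 0                      # Int counter keeps the loop variable's type fixed
    @inbounds for i in eachindex(x)
        # Skip bounds checks; safe because eachindex only yields valid indices.
        count += x[i] > threshold ? 1 : 0
    end
    return count
end

countGT(rand(1:10, 1_000_000), 5)   # Vector{Int}
countGT(rand(1_000_000), 0.5)       # Vector{Float64}
```

Because Julia compiles a specialized method for each concrete argument type it sees, the same source serves `Vector{Int}` and `Vector{Float64}` without the per-type definitions.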
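The numbers above fit Yichao's explanation: (16000096 - 96) / 1000000 = 16 bytes per element, which is consistent with (though not proof of) each element read being boxed as an "NA or Float64" value. One way around that, assuming the array is backed by a plain value vector plus a Boolean NA mask (my understanding of how DataArray is laid out, not something verified here), is to loop over those two arrays directly so every read has a concrete type. The function and variable names below are hypothetical and do not depend on the DataArrays package.

```julia
# Sketch, assuming values and missingness are stored as two separate arrays.
# `countGT_masked`, `data`, and `na` are hypothetical names for illustration.
function countGT_masked(data::Vector{Float64}, na::BitVector, threshold::Float64)
    length(data) == length(na) ||
        throw(DimensionMismatch("data and mask lengths differ"))
    count = 0
    @inbounds for i in eachindex(data)
        # Both reads have concrete types (Bool and Float64), so no element
        # has to be wrapped in a value that may be either NA or Float64.
        if !na[i] && data[i] > threshold
            count += 1
        end
    end
    return count
end

data = rand(1_000_000)
na = falses(length(data))
na[1:10] .= true          # mark the first ten entries as missing
countGT_masked(data, na, 0.5)
```

Missing entries are simply skipped here; whether a count should ignore or propagate NA is a separate design decision.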
