Today, while trying to optimize a piece of code, I came across some rather 
curious memory-allocation behavior when accessing a DataArray. 

x = rand(1:10, 1000000);
function countGT(x::Array{Int,1})
    count = 0
    for i = 1:length(x)
        count += x[i] > 5 ? 1 : 0
    end
    count
end

Here is what you get after running @time (compilation excluded) 

@time countGT(x);
elapsed time: 0.00847156 seconds (96 bytes allocated)

That is not too bad. @time itself allocates at least 80 bytes, and the extra 
16 bytes are for creating the variable "count". So far so good.
Now let's see what happens if we do the same with a floating-point array:
x = rand(1000000);
function countGT(x::Array{Float64,1})
    count = 0.0
    for i = 1:length(x)
        count += x[i] > 5.0 ? 1.0 : 0.0
    end
    count
end

countGT(x)
@time countGT(x)

You get 
elapsed time: 0.00177126 seconds (96 bytes allocated)
which is still pretty good. Now, the problem starts to show up when I have a 
DataArray:
x = @data rand(1000000);
function countGT(x::DataArray{Float64,1})
    count = 0.0
    for i = 1:length(x)
        count += x[i] > 5.0 ? 1.0 : 0.0
    end
    count
end

countGT(x)
@time countGT(x)

You get
elapsed time: 0.23610454 seconds (16000096 bytes allocated)

The bytes allocated seem to scale with the size of the DataArray: about 16 
bytes per element here (16000000 extra bytes for 1000000 elements). So it 
seems that the mere act of accessing an element of a DataArray allocates 
memory. 

I am wondering whether there could be a better way. 
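
One thing I might try is to read the underlying storage directly instead of 
indexing the DataArray itself. The sketch below is only an illustration and 
rests on assumptions: MaskedVec is a hypothetical stand-in I made up (not the 
real DataArray type), its field names "data" and "na" are my guess at a 
values-plus-missingness-mask layout, and the threshold argument is added just 
for the example. It is written in current Julia syntax. I have not measured 
whether this actually avoids the allocation on a real DataArray.

```julia
# Hypothetical stand-in for a DataArray: raw values plus a missingness mask.
struct MaskedVec
    data::Vector{Float64}   # stored values
    na::BitVector           # true where the value is missing
end

# Count the non-missing elements greater than `threshold`, reading the two
# fields directly so every value seen in the loop is a plain Float64.
function countGT(x::MaskedVec, threshold::Float64)
    count = 0
    data, na = x.data, x.na
    for i = 1:length(data)
        if !na[i] && data[i] > threshold
            count += 1
        end
    end
    count
end

# Small example: the third element is marked as missing.
na = falses(4)
na[3] = true
x = MaskedVec([0.2, 0.9, 0.7, 0.4], na)
countGT(x, 0.5)
```

The idea is that the loop body then only ever sees Float64 and Bool values, 
so nothing in it should need to be allocated per element.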






