On Sunday, May 17, 2015 at 09:25 -0700, Mohammed El-Beltagy wrote:
> You are quite right about the type assertions and that @inbounds would
> certainly speed things up. 
> However, I am concerned here with how memory was being allocated. I
> wish that somebody who is familiar with DataArray would explain this
> behavior.

That's a known design issue with DataArrays, and the reason why John
Myles White has started working on Nullable and NullableArrays to
replace them. As Yichao noted, []/getindex is type-unstable for
DataArrays because it can return either NA or a value of the element
type, and this kills performance in Julia.

To improve performance, you can access the internals of the DataArray,
doing something like:

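using DataArrays

function countGT(x::DataArray{Float64,1})
    count = 0.0
    for i = 1:length(x)
        # isna(x, i) only checks the NA mask, so it always returns a Bool
        if !isna(x, i)
            # read the raw storage directly to avoid the type-unstable x[i]
            count += (x.data[i] > 5.0) ? 1.0 : 0.0
        end
    end
    count
end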

Always write isna(x, i) instead of isna(x[i]), since the latter suffers
from type instability.
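
To illustrate the same pattern outside of DataArrays, here is a minimal sketch, assuming current Julia syntax rather than the version used in this thread. The names MaskedVector, NAMarker, getvalue, ismissingat and count_gt are hypothetical and not part of DataArrays; the struct only imitates the idea of storing raw values next to a missingness mask.

```julia
# Hypothetical stand-in for a DataArray: raw values plus a missingness mask.
struct MaskedVector
    data::Vector{Float64}   # raw values; entries under a true mask are meaningless
    na::Vector{Bool}        # true where the value is missing
end

# Hypothetical marker type playing the role of NA.
struct NAMarker end

# Type-unstable accessor: returns either NAMarker or Float64,
# depending on a runtime value rather than on the argument types.
getvalue(x::MaskedVector, i::Int) = x.na[i] ? NAMarker() : x.data[i]

# Type-stable check, analogous to isna(x, i): always returns a Bool.
ismissingat(x::MaskedVector, i::Int) = x.na[i]

# Mask-first loop: check the mask, then read the raw storage,
# so every value read in the loop body is a Float64.
function count_gt(x::MaskedVector, threshold::Float64)
    count = 0.0
    for i in 1:length(x.data)
        if !ismissingat(x, i)
            count += (x.data[i] > threshold) ? 1.0 : 0.0
        end
    end
    return count
end

# The third entry is masked out, so only 7.5 and 6.0 are counted.
x = MaskedVector([1.0, 7.5, 9.0, 6.0], [false, false, true, false])
println(count_gt(x, 5.0))   # prints 2.0
```

The loop never calls getvalue, which is the point: the mask check and the raw read each have a single concrete return type, whereas getvalue does not.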

Regards



> On Sunday, May 17, 2015 at 6:12:11 PM UTC+2, Yichao Yu wrote:
> 
>         On Sun, May 17, 2015 at 11:28 AM, Mohammed El-Beltagy 
>         <[email protected]> wrote: 
>         > Today, while trying to optimize a piece of code, I came across a
>         > rather curious memory allocation behavior when accessing a
>         > DataArray.
>         > 
>         > x=rand(1:10,1000000); 
>         > function countGT(x::Array{Int,1}) 
>         
>         Since the algorithm is the same for both types, I think you don't
>         need the type assert here. Julia will automatically specialize on
>         the type you pass in.
>         
>         >     count=0 
>         >     for i=1:length(x) 
>         >       count += (x[i] > 5) ? 1 : 0
>         
>         Adding `@inbounds` here will improve the performance for `Array`.
>         Not sure if it can help with `DataArray` yet, though.
>         
>         >     end 
>         >     count 
>         > end 
>         > 
>         > Here is what you get after running @time (compilation excluded)
>         > 
>         > @time countGT(x); 
>         > elapsed time: 0.00847156 seconds (96 bytes allocated) 
>         > 
>         > That is not too bad. @time itself allocated at least 80 bytes, and
>         > the extra 16 bytes are for creating the variable "count", so far so
>         > good.
>         > Now let's see what happens if we do the same with a floating-point
>         > array.
>         > x=rand(1000000); 
>         > function countGT(x::Array{Float64,1}) 
>         >     count=0.0 
>         >     for i=1:length(x) 
>         >       count += (x[i] > 5.0) ? 1.0 : 0.0
>         >     end 
>         >     count 
>         > end 
>         > 
>         > countGT(x) 
>         > @time countGT(x) 
>         > 
>         > You get 
>         > elapsed time: 0.00177126 seconds (96 bytes allocated) 
>         > Which is still pretty good. Now, the problem starts to show up when
>         > I have a DataArray:
>         > x=@data rand(1000000); 
>         > function countGT(x::DataArray{Float64,1}) 
>         >     count=0.0 
>         >     for i=1:length(x) 
>         >       count += (x[i] > 5.0) ? 1.0 : 0.0
>         >     end 
>         >     count 
>         > end 
>         
>         `getindex` of DataArray appears to be not type stable. It returns
>         either `NAType` or the data type. I think this is probably the
>         reason for the allocation.
>         
>         > 
>         > countGT(x) 
>         > @time countGT(x) 
>         > 
>         > We get
>         > elapsed time: 0.23610454 seconds (16000096 bytes allocated) 
>         > 
>         > The bytes allocated seem to scale with the size of the DataArray.
>         > So it seems that the mere act of accessing an element in a
>         > DataArray allocates memory.
>         > 
>         > I am wondering if there could be a better way.
>         > 
>         > 
>         
>         I'm not familiar with DataArrays and its API, but I would guess
>         it can use Nullable or something similar.
>         
