On Sun, May 17, 2015 at 5:05 PM, Mohammed El-Beltagy
<[email protected]> wrote:
> Many thanks Milan and Yichao, this was very informative. I am also delighted
> that I helped in a very small way expose what appears to be a problem with
> memory leakage.

It was actually much worse than a memory leak: it was freeing memory
that was still in use. (AFAICT, given how a GC works, it usually won't
leak anything when it fires, but it can free something by mistake if
the code that uses it is badly written.)
See the explanation in the comment on this issue[1] for why GC roots (and
friends) are important.

[1] https://github.com/JuliaLang/julia/pull/11190#issuecomment-100066267

> I love this community!
>
> On Sunday, May 17, 2015 at 7:51:59 PM UTC+2, Yichao Yu wrote:
>>
>> On Sun, May 17, 2015 at 12:52 PM, Milan Bouchet-Valat <[email protected]>
>> wrote:
>> > On Sunday, May 17, 2015 at 09:25 -0700, Mohammed El-Beltagy wrote:
>> >
>> > You are quite right about the type assertions and that @inbounds would
>> > certainly speed things up.
>> > However, I am concerned here with how memory was being allocated. I wish
>> > that somebody who is familiar with DataArray would explain this
>> > behavior.
>> >
>> > That's a known design issue with DataArrays, and the reason why John Myles
>> > White has started working on Nullable and NullableArrays to replace them. As
>>
>> Didn't know about this part of the story.
>>
>> P.S. your example leads me to hit
>> https://github.com/JuliaLang/julia/issues/11313 . Thank you for
>> exposing it....
>>
>> > Yichao noted, []/getindex is type-unstable for DataArrays as it can return
>> > NA, and this kills performance in Julia.
>> >
>> > To improve performance, you can access the internals of the DataArray,
>> > doing something like:
>> >
>> > function countGT(x::DataArray{Float64,1})
>> >     count = 0.0
>> >     for i = 1:length(x)
>> >         # Use isna(x, i) rather than isna(x[i]) to avoid the type-unstable x[i]
>> >         if !isna(x, i)
>> >             count += (x.data[i] > 5.0) ? 1.0 : 0.0
>> >         end
>> >     end
>> >     count
>> > end
>> >
>> > Always write isna(x, i) instead of isna(x[i]), since the latter suffers
>> > from type instability.
>> >
>> > Regards
>> >
>> >
>> >
>> > On Sunday, May 17, 2015 at 6:12:11 PM UTC+2, Yichao Yu wrote:
>> >
>> > On Sun, May 17, 2015 at 11:28 AM, Mohammed El-Beltagy
>> > <[email protected]> wrote:
>> >> Today while trying to optimize a piece of code I came across a rather
>> >> curious behavior in how memory is allocated when accessing a DataArray.
>> >>
>> >> x=rand(1:10,1000000);
>> >> function countGT(x::Array{Int,1})
>> >
>> > Since the algorithm is the same for both types, I think you don't need
>> > the type assert here. Julia will automatically specialize on the type
>> > you pass in.
>> >
>> >>     count=0
>> >>     for i=1:length(x)
>> >>       count += (x[i] > 5) ? 1 : 0
>> >
>> > Adding `@inbounds` here will improve the performance for `Array`. Not
>> > sure whether it helps with `DataArray` yet, though.
>> >
>> >>     end
>> >>     count
>> >> end
>> >>
>> >> Here is what you get after running @time (compilation excluded)
>> >>
>> >> @time countGT(x);
>> >> elapsed time: 0.00847156 seconds (96 bytes allocated)
>> >>
>> >> That is not too bad. @time alone allocates at least 80 bytes and the extra
>> >> 16 bytes are for creating the variable "count"; so far so good.
>> >> Now let's see what happens if we do the same with a floating-point array.
>> >> x=rand(1000000);
>> >> function countGT(x::Array{Float64,1})
>> >>     count=0.0
>> >>     for i=1:length(x)
>> >>       count += (x[i] > 5.0) ? 1.0 : 0.0
>> >>     end
>> >>     count
>> >> end
>> >>
>> >> countGT(x)
>> >> @time countGT(x)
>> >>
>> >> You get
>> >> elapsed time: 0.00177126 seconds (96 bytes allocated)
>> >> Which is still pretty good. Now, the problems start to show up when I have
>> >> a DataArray
>> >> x=@data rand(1000000);
>> >> function countGT(x::DataArray{Float64,1})
>> >>     count=0.0
>> >>     for i=1:length(x)
>> >>       count += (x[i] > 5.0) ? 1.0 : 0.0
>> >>     end
>> >>     count
>> >> end
>> >
>> > `getindex` of DataArray appears not to be type-stable. It returns
>> > either `NAType` or the data type. I think this is probably the reason
>> > for the allocation.
>> >
>> >>
>> >> countGT(x)
>> >> @time countGT(x)
>> >>
>> >> You get
>> >> elapsed time: 0.23610454 seconds (16000096 bytes allocated)
>> >>
>> >> The bytes allocated seem to scale with the size of the DataArray. So it
>> >> seems that the mere act of accessing an element in a DataArray allocates
>> >> memory.
>> >>
>> >> I am wondering whether there could be a better way.
>> >>
>> >>
>> >
>> > I'm not familiar with DataArrays and its API, but I would guess it can
>> > use Nullable or something similar.
>> >
