On Sun, May 17, 2015 at 6:06 PM, Tim Holy wrote:
> To clarify, there were actually two issues. One thing that may not be clear
> is that
>> elapsed time: 0.23610454 seconds (16000096 bytes allocated)
>
> tells you how many bytes were allocated, but it omits mentioning that most/all
> of those were (or will be) freed. In other words, this was _not_ symptomatic
> of a leak---it was just a message whose meaning could have been clearer. That
> behavior just changed in https://github.com/JuliaLang/julia/pull/11186.
> Hopefully it will be clearer going forward.
>
> The issue that Yichao mentioned was more subtle and a much bigger problem, but
> I don't think this is what you were noticing, Mohammed. That more serious
> issue seems to be fixed by https://github.com/JuliaLang/julia/pull/11314

Thanks for the clarification. That's exactly what I meant.

Sorry for the confusion.... =(

>
> --Tim
>
> On Sunday, May 17, 2015 05:53:14 PM Yichao Yu wrote:
>> On Sun, May 17, 2015 at 5:05 PM, Mohammed El-Beltagy wrote:
>> > Many thanks Milan and Yichao, this was very informative. I am also
>> > delighted that I helped in a very small way to expose what appears to be
>> > a memory leak.
>>
>> It was actually much worse than a memory leak: it was freeing memory that
>> is still in use. (AFAICT, given how a GC works, it usually won't leak
>> anything when it fires, but it can free something by mistake if the code
>> that uses it is badly written.)
>> See the explanation in this comment[1] for why GC roots (and friends) are
>> important.
>>
>> [1] https://github.com/JuliaLang/julia/pull/11190#issuecomment-100066267
>>
>> > I love this community!
>> >
>> > On Sunday, May 17, 2015 at 7:51:59 PM UTC+2, Yichao Yu wrote:
>> >> On Sun, May 17, 2015 at 12:52 PM, Milan Bouchet-Valat wrote:
>> >> > On Sunday, May 17, 2015 at 09:25 -0700, Mohammed El-Beltagy wrote:
>> >> >
>> >> > You are quite right about the type assertions and that @inbounds would
>> >> > certainly speed things up.
>> >> > However, I am concerned here with how memory was being allocated. I wish
>> >> > that somebody who is familiar with DataArray would explain this behavior.
>> >> >
>> >> > That's a known design issue with DataArrays, and the reason why John Myles
>> >> > White has started working on Nullable and NullableArrays to replace them. As
>> >>
>> >> Didn't know about this part of the story.
>> >>
>> >> P.S. your example leads me to hit
>> >> https://github.com/JuliaLang/julia/issues/11313 . Thank you for
>> >> exposing it....
>> >>
>> >> > Yichao noted, []/getindex is type-unstable for DataArrays as it can return
>> >> > NA, and this kills performance in Julia.
>> >> >
>> >> > To improve performance, you can access the internals of the DataArray,
>> >> > doing something like:
>> >> >
>> >> > function countGT(x::DataArray{Float64,1})
>> >> >     count = 0.0
>> >> >     for i = 1:length(x)
>> >> >         # isna(x, i) checks for NA without the type-unstable x[i]
>> >> >         if !isna(x, i)
>> >> >             count += x.data[i] > 5.0 ? 1.0 : 0.0
>> >> >         end
>> >> >     end
>> >> >     count
>> >> > end
>> >> >
>> >> > Always write isna(x, i) instead of isna(x[i]), since the latter suffers from
>> >> > type instability.
>> >> >
>> >> > Regards
>> >> >
>> >> >
>> >> >
>> >> > On Sunday, May 17, 2015 at 6:12:11 PM UTC+2, Yichao Yu wrote:
>> >> >
>> >> > On Sun, May 17, 2015 at 11:28 AM, Mohammed El-Beltagy wrote:
>> >> >> Today, while trying to optimize a piece of code, I came across some
>> >> >> rather curious memory allocation behavior when accessing a DataArray.
>> >> >>
>> >> >> x=rand(1:10,1000000);
>> >> >> function countGT(x::Array{Int,1})
>> >> >
>> >> > Since the algorithm is the same for both types, I think you don't need
>> >> > the type assert here. Julia will automatically specialize on the type
>> >> > you pass in.
>> >> >
>> >> >>     count=0
>> >> >>     for i=1:length(x)
>> >> >>
>> >> >>       count += x[i] > 5 ? 1 : 0
>> >> >
>> >> > Adding `@inbounds` here will improve the performance for `Array`. Not
>> >> > sure if it can help with `DataArray` yet though.
>> >> >
>> >> >>     end
>> >> >>     count
>> >> >>
>> >> >> end
>> >> >>
>> >> >> Here is what you get after running @time (compilation excluded)
>> >> >>
>> >> >> @time countGT(x);
>> >> >> elapsed time: 0.00847156 seconds (96 bytes allocated)
>> >> >>
>> >> >> That is not too bad. @time itself allocates at least 80 bytes and the
>> >> >> extra 16 bytes are for creating the variable "count", so far so good.
>> >> >> Now let's see what happens if we do the same with a floating-point array.
>> >> >> x=rand(1000000);
>> >> >> function countGT(x::Array{Float64,1})
>> >> >>
>> >> >>     count=0.0
>> >> >>     for i=1:length(x)
>> >> >>
>> >> >>       count += x[i] > 5.0 ? 1.0 : 0.0
>> >> >>
>> >> >>     end
>> >> >>     count
>> >> >>
>> >> >> end
>> >> >>
>> >> >> countGT(x)
>> >> >> @time countGT(x)
>> >> >>
>> >> >> You get
>> >> >> elapsed time: 0.00177126 seconds (96 bytes allocated)
>> >> >> Which is still pretty good. Now, the problem starts to show up when I
>> >> >> have a DataArray:
>> >> >> x=@data rand(1000000);
>> >> >> function countGT(x::DataArray{Float64,1})
>> >> >>
>> >> >>     count=0.0
>> >> >>     for i=1:length(x)
>> >> >>
>> >> >>       count += x[i] > 5.0 ? 1.0 : 0.0
>> >> >>
>> >> >>     end
>> >> >>     count
>> >> >>
>> >> >> end
>> >> >
>> >> > `getindex` on a DataArray appears not to be type-stable. It returns
>> >> > either an `NAType` or the element type. I think this is probably the
>> >> > reason for the allocation.
>> >> >
>> >> >> countGT(x)
>> >> >> @time countGT(x)
>> >> >>
>> >> >> Now we get
>> >> >> elapsed time: 0.23610454 seconds (16000096 bytes allocated)
>> >> >>
>> >> >> The bytes allocated seem to scale with the size of the DataArray. So it
>> >> >> seems that the mere act of accessing an element in a DataArray allocates
>> >> >> memory.
>> >> >>
>> >> >> I am wondering whether there could be a better way.
>> >> >
>> >> > I'm not familiar with DataArrays and its API, but I would guess it can
>> >> > use Nullable or something similar.
>
