On Sun, May 17, 2015 at 6:06 PM, Tim Holy <[email protected]> wrote:
> To clarify, there were actually two issues: one thing that may not be clear is
> that
>> elapsed time: 0.23610454 seconds (16000096 bytes allocated)
>
> tells you how many bytes were allocated, but it omits mentioning that most/all
> of those were (or will be) freed. In other words, this was _not_ symptomatic
> of a leak---it was just a message whose meaning could have been clearer. That
> behavior just changed in https://github.com/JuliaLang/julia/pull/11186.
> Hopefully it will be clearer going forward.
>
> The issue that Yichao mentioned was more subtle and a much bigger problem, but
> I don't think this is what you were noticing, Mohammed. That more serious
> issue seems to be fixed by https://github.com/JuliaLang/julia/pull/11314
Thanks for the clarification. That's exactly what I meant. Sorry for the confusion.... =(
>
> --Tim
>
> On Sunday, May 17, 2015 05:53:14 PM Yichao Yu wrote:
>> On Sun, May 17, 2015 at 5:05 PM, Mohammed El-Beltagy
>> <[email protected]> wrote:
>> > Many thanks Milan and Yichao, this was very informative. I am also
>> > delighted that I helped in a very small way expose what appears to be a
>> > problem with memory leakage.
>>
>> It was actually much worse than a memory leak. It was actually
>> freeing memory that is in use. (AFAICT, given how a GC works, it
>> usually won't leak anything when it fires, but it can free something
>> by mistake if the code that uses it is badly written.)
>> See the explanation in the comment on this issue [1] for why GC roots
>> (and friends) are important.
>>
>> [1] https://github.com/JuliaLang/julia/pull/11190#issuecomment-100066267
>>
>> > I love this community!
>> >
>> > On Sunday, May 17, 2015 at 7:51:59 PM UTC+2, Yichao Yu wrote:
>> >> On Sun, May 17, 2015 at 12:52 PM, Milan Bouchet-Valat <[email protected]>
>> >> wrote:
>> >> > On Sunday, May 17, 2015 at 09:25 -0700, Mohammed El-Beltagy wrote:
>> >> >
>> >> > You are quite right about the type assertions and that @inbounds would
>> >> > certainly speed things up.
>> >> > However, I am concerned here with how memory was being allocated. I
>> >> > wish that somebody who is familiar with DataArray would explain this
>> >> > behavior.
>> >> >
>> >> > That's a known design issue with DataArrays, and the reason why John
>> >> > Myles White has started working on Nullable and NullableArrays to
>> >> > replace them. As
>> >>
>> >> Didn't know about this part of the story.
>> >>
>> >> P.S. your example leads me to hit
>> >> https://github.com/JuliaLang/julia/issues/11313 . Thank you for
>> >> exposing it....
>> >>
>> >> > Yichao noted, []/getindex is type-unstable for DataArrays as it can
>> >> > return NA, and this kills performance in Julia.
>> >> >
>> >> > To improve performance, you can access the internals of the DataArray,
>> >> > doing something like:
>> >> >
>> >> > function countGT(x::DataArray{Float64,1})
>> >> >     count = 0.0
>> >> >     for i = 1:length(x)
>> >> >         if !isna(x, i)
>> >> >             count += (x.data[i] > 5.0) ? 1.0 : 0.0
>> >> >         end
>> >> >     end
>> >> >     count
>> >> > end
>> >> >
>> >> > Always write isna(x, i) instead of isna(x[i]), since the latter suffers
>> >> > from type instability.
>> >> >
>> >> > Regards
>> >> >
>> >> > On Sunday, May 17, 2015 at 6:12:11 PM UTC+2, Yichao Yu wrote:
>> >> >
>> >> > On Sun, May 17, 2015 at 11:28 AM, Mohammed El-Beltagy
>> >> > <[email protected]> wrote:
>> >> >> Today, while trying to optimize a piece of code, I came across a rather
>> >> >> curious behavior in how memory is allocated when accessing a DataArray.
>> >> >>
>> >> >> x = rand(1:10, 1000000);
>> >> >> function countGT(x::Array{Int,1})
>> >> >
>> >> > Since the algorithm is the same for both types, I think you don't need
>> >> > the type assert here. Julia will automatically specialize on the type
>> >> > you pass in.
>> >> >
>> >> >>     count = 0
>> >> >>     for i = 1:length(x)
>> >> >>         count += (x[i] > 5) ? 1 : 0
>> >> >
>> >> > Adding `@inbounds` here will improve the performance for `Array`. Not
>> >> > sure if it can help with `DataArray` yet though.
>> >> >
>> >> >>     end
>> >> >>     count
>> >> >> end
>> >> >>
>> >> >> Here is what you get after running @time (compilation excluded)
>> >> >>
>> >> >> @time countGT(x);
>> >> >> elapsed time: 0.00847156 seconds (96 bytes allocated)
>> >> >>
>> >> >> That is not too bad. @time by itself accounts for at least 80 bytes,
>> >> >> and the extra 16 bytes are for creating the variable "count"; so far
>> >> >> so good.
>> >> >> Now let's see what happens if we do the same with a floating-point array.
>> >> >> x = rand(1000000);
>> >> >> function countGT(x::Array{Float64,1})
>> >> >>     count = 0.0
>> >> >>     for i = 1:length(x)
>> >> >>         count += (x[i] > 5.0) ? 1.0 : 0.0
>> >> >>     end
>> >> >>     count
>> >> >> end
>> >> >>
>> >> >> countGT(x)
>> >> >> @time countGT(x)
>> >> >>
>> >> >> You get
>> >> >> elapsed time: 0.00177126 seconds (96 bytes allocated)
>> >> >> Which is still pretty good. Now, the problem starts to show up when I
>> >> >> have a DataArray
>> >> >> x = @data rand(1000000);
>> >> >> function countGT(x::DataArray{Float64,1})
>> >> >>     count = 0.0
>> >> >>     for i = 1:length(x)
>> >> >>         count += (x[i] > 5.0) ? 1.0 : 0.0
>> >> >>     end
>> >> >>     count
>> >> >> end
>> >> >
>> >> > `getindex` of DataArray appears to be not type stable. It returns
>> >> > either `NAType` or the data type. I think this is probably the reason
>> >> > for the allocation.
>> >> >
>> >> >> countGT(x)
>> >> >> @time countGT(x)
>> >> >>
>> >> >> We get
>> >> >> elapsed time: 0.23610454 seconds (16000096 bytes allocated)
>> >> >>
>> >> >> The bytes allocated seem to scale with the size of the DataArray. So
>> >> >> it seems that the mere act of accessing an element in a DataArray
>> >> >> allocates memory.
>> >> >>
>> >> >> I am wondering whether there could be a better way.
>> >> >
>> >> > I'm not familiar with DataArrays and its API but I would guess it can
>> >> > use Nullable or something similar.
>
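
For anyone reading this thread later, here is a minimal sketch of the mask-based access pattern Milan suggested (check `isna(x, i)`, then read `x.data[i]`). It is written in current Julia syntax and does not use the DataArrays package. `MaskedVector`, its fields `data` and `na`, and `count_gt` are hypothetical names made up for illustration; they only mimic the idea and are not the DataArrays API.

```julia
# Hypothetical stand-in for a DataArray: raw values plus a parallel mask.
# None of these names come from the DataArrays package.
struct MaskedVector
    data::Vector{Float64}   # raw values; entries flagged in `na` are ignored
    na::Vector{Bool}        # true where the value is missing
end

# Count the non-missing entries strictly greater than `threshold`.
# The loop only touches a Bool and a Float64 per element, so it never has to
# produce a "value or NA" result the way a type-unstable getindex would.
function count_gt(x::MaskedVector, threshold::Float64)
    length(x.data) == length(x.na) ||
        throw(DimensionMismatch("data and na must have the same length"))
    n = 0
    @inbounds for i in eachindex(x.data)
        if !x.na[i] && x.data[i] > threshold
            n += 1
        end
    end
    return n
end

x = MaskedVector([1.0, 7.5, 9.0, 6.0], [false, false, true, false])
println(count_gt(x, 5.0))  # prints 2 (7.5 and 6.0; 9.0 is masked out)
```

The length check up front is what makes the `@inbounds` annotation safe here; without it, a mask shorter than the data would be read out of bounds.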
