The variance you're seeing is most likely due to the garbage collector kicking in, with that much memory being allocated and then abandoned.
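
If you want to take the collector out of the picture as far as possible, something along these lines might help. It is only a sketch under a few assumptions: it uses the current spelling `Vector{Int}(undef, n)` in place of `Array(Int, N)`, it picks a much smaller `N` than the 10^9 used in this thread so that it runs on modest machines, and `best_time` is a made-up helper, not anything in Base.

```julia
# Sketch: compare "allocate uninitialized, then fill!" with
# "allocate zeroed, then fill!", forcing a collection before each timed run
# so garbage left over from the previous run is less likely to be
# collected inside the timed region.

const N = 10^7  # assumption: far smaller than the 10^9 used in the thread

function alloc_uninit_then_fill(n)
    x = Vector{Int}(undef, n)   # modern equivalent of Array(Int, n)
    fill!(x, 1)
    return x
end

function alloc_zeros_then_fill(n)
    x = zeros(Int, n)
    fill!(x, 1)
    return x
end

# Hypothetical helper: minimum elapsed time (seconds) over `runs` repetitions.
function best_time(f, n; runs = 5)
    best = Inf
    for _ in 1:runs
        GC.gc()              # collect abandoned arrays before timing
        t = @elapsed f(n)
        best = min(best, t)
    end
    return best
end

# Warm up on tiny inputs so compilation is not part of the measurement.
alloc_uninit_then_fill(10)
alloc_zeros_then_fill(10)

t_uninit = best_time(alloc_uninit_then_fill, N)
t_zeros  = best_time(alloc_zeros_then_fill, N)

println("uninitialized + fill!: ", t_uninit, " s")
println("zeros + fill!:         ", t_zeros, " s")
```

Reporting the minimum rather than the mean is a deliberate choice here, on the usual reasoning that collector pauses and page-fault noise only add time, so the fastest run should be the least disturbed one. The numbers will of course differ by machine and Julia version.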

On Mon, Nov 24, 2014 at 7:53 PM, David Smith <[email protected]> wrote:
> This is what I was thinking. I just assumed that the fill() time would be
> constant for both and factored that out, not knowing that malloc() was lazy.
>
> I get similar results for Stefan's bench, although the variance is large.
>
> On Monday, November 24, 2014 6:20:21 PM UTC-6, Stefan Karpinski wrote:
>>
>> Should the comparison actually be more like this:
>>
>> julia> @time begin
>> x = Array(Int,N)
>> fill!(x,1)
>> end;
>> elapsed time: 6.782572096 seconds (8000000128 bytes allocated)
>>
>> julia> @time begin
>> x = zeros(Int,N)
>> fill!(x,1)
>> end;
>> elapsed time: 14.166256835 seconds (8000000176 bytes allocated)
>>
>>
>> At least that's the comparison that makes sense for code that allocates
>> and then initializes an array. I consistently see a 2x slowdown or more.
>>
>> On Mon, Nov 24, 2014 at 7:09 PM, Jameson Nash <[email protected]> wrote:
>>
>>> > But you initialized it in both cases.
>>>
>>> Yes.
>>>
>>> > Is there a compiler optimization going on here that combines the
>>> zeros() and fill()?
>>>
>>> No.
>>>
>>> But there is a kernel optimization going on that complicates this
>>> measurement. Approximately, the memory requested by `malloc` (& friends) is
>>> not actually allocated until you try to read or write to it. So there are
>>> in fact 3 effects here (roughly speaking, they are malloc, A[1:4096:end],
>>> and fill()), where that second operation is unavoidable, and orders of
>>> magnitude slower than the other two. You measured the speed of 1 vs. 1+2+3.
>>> Whereas I measured the speed of 1+2+3 vs 1+2+3+3.
>>>
>>> On Mon Nov 24 2014 at 6:59:50 PM David Smith <[email protected]> wrote:
>>>
>>>> But you initialized it in both cases. Is there a compiler optimization
>>>> going on here that combines the zeros() and fill()?
>>>>
>>>>
>>>> On Monday, November 24, 2014 5:12:56 PM UTC-6, Jameson wrote:
>>>>
>>>>> yes. the point is to compare the cost of implicitly calling `zero`
>>>>> (resulting in the equivalent of calling zero twice) to the cost of not
>>>>> initializing the memory before writing to it. I could alternatively have
>>>>> done: `@time x=zeros(); @time fill(x, 0)` to measure the same information.
>>>>>
>>>>> On Mon Nov 24 2014 at 5:57:29 PM David Smith <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Did you mean to call zeros() in both cases?
>>>>>>
>>>>>>
>>>>>> On Monday, November 24, 2014 3:09:38 PM UTC-6, Jameson wrote:
>>>>>>
>>>>>>> It appears the fill operation accounts for about 0.15 seconds of the
>>>>>>> 6.15 seconds that my OS X laptop takes to create this array:
>>>>>>>
>>>>>>> $ ./julia -q
>>>>>>>
>>>>>>> julia> N=10^9
>>>>>>>
>>>>>>> 1000000000
>>>>>>>
>>>>>>>
>>>>>>> julia> @time begin x=zeros(Int64,N); fill(x,0) end
>>>>>>>
>>>>>>> elapsed time: 6.325660691 seconds (8000136616 bytes allocated, 1.71%
>>>>>>> gc time)
>>>>>>>
>>>>>>> 0-element Array{Array{Int64,1},1}
>>>>>>>
>>>>>>>
>>>>>>> $ ./julia -q
>>>>>>>
>>>>>>> julia> N=10^9
>>>>>>>
>>>>>>> 1000000000
>>>>>>>
>>>>>>>
>>>>>>> julia> @time x=zeros(Int64,N)
>>>>>>>
>>>>>>> elapsed time: 6.160623835 seconds (8000014320 bytes allocated, 0.22%
>>>>>>> gc time)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon Nov 24 2014 at 3:18:39 PM Erik Schnetter <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Mon, Nov 24, 2014 at 3:01 PM, David Smith <[email protected]>
>>>>>>>> wrote:
>>>>>>>> > To add some data to this conversation, I just timed allocating a
>>>>>>>> billion
>>>>>>>> > Int64s on my macbook, and I got this (I ran these multiple times
>>>>>>>> before this
>>>>>>>> > and got similar timings):
>>>>>>>> >
>>>>>>>> > julia> N=1_000_000_000
>>>>>>>> > 1000000000
>>>>>>>> >
>>>>>>>> > julia> @time x = Array(Int64,N);
>>>>>>>> > elapsed time: 0.022577671 seconds (8000000128 bytes allocated)
>>>>>>>> >
>>>>>>>> > julia> @time x = zeros(Int64,N);
>>>>>>>> > elapsed time: 3.95432248 seconds (8000000152 bytes allocated)
>>>>>>>> >
>>>>>>>> > So we are talking adding possibly seconds to a program per large
>>>>>>>> array
>>>>>>>> > allocation.
>>>>>>>>
>>>>>>>> This is not quite right -- the first does not actually map the pages
>>>>>>>> into memory; this is only done lazily when they are accessed the
>>>>>>>> first
>>>>>>>> time. You need to compare "alloc uninitialized; then initialize
>>>>>>>> once"
>>>>>>>> with "alloc zero-initialized; then initialize again".
>>>>>>>>
>>>>>>>> Current high-end system architectures have memory write speeds of
>>>>>>>> ten
>>>>>>>> or twenty GByte per second; this is what you should see for very
>>>>>>>> large
>>>>>>>> arrays -- this would be about 0.4 seconds for your case. For smaller
>>>>>>>> arrays, the data would reside in the cache, so that the allocation
>>>>>>>> overhead should be significantly smaller even.
>>>>>>>>
>>>>>>>> -erik
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>> Erik Schnetter <[email protected]>
>>>>>>>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>>>>>>>
>>>>>>>
>>
