Should the comparison actually be more like this:

julia> @time begin
           x = Array(Int,N)
           fill!(x,1)
       end;
elapsed time: 6.782572096 seconds (8000000128 bytes allocated)

julia> @time begin
           x = zeros(Int,N)
           fill!(x,1)
       end;
elapsed time: 14.166256835 seconds (8000000176 bytes allocated)


At least that's the comparison that makes sense for code that allocates and
then initializes an array. I consistently see a 2x slowdown or more.
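Separating Jameson's three effects (quoted below) makes the two write passes visible. A rough sketch in the 0.3-era syntax the rest of the thread uses (today this would be Vector{Int}(undef, N) and x[1:512:end] .= 0); the stride of 512 is my choice, writing one Int64 per 4096-byte page so every page gets faulted in without touching every element:

    julia> N = 10^9;

    julia> @time x = Array(Int, N);    # 1: malloc only; pages are not yet mapped

    julia> @time x[1:512:end] = 0;     # 2: touch one element per page, forcing the kernel to map (and zero) it

    julia> @time fill!(x, 1);          # 3: one full userspace write pass

Step 1 should be nearly free, while steps 2 and 3 each cost seconds, which is why Array + fill! ends up doing roughly one full pass over the memory where zeros + fill! does two (on top of the kernel zeroing the faulted pages in both cases).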

On Mon, Nov 24, 2014 at 7:09 PM, Jameson Nash <[email protected]> wrote:

> > But you initialized it in both cases.
>
> Yes.
>
> > Is there a compiler optimization going on here that combines the zeros()
> and fill()?
>
> No.
>
> But there is a kernel optimization going on that complicates this
> measurement. Approximately, the memory requested by `malloc` (& friends) is
> not actually allocated until you try to read or write to it. So there are
> in fact 3 effects here (roughly speaking, they are malloc, A[1:4096:end],
> and fill()), where that second operation is unavoidable, and orders of
> magnitude slower than the other two. You measured the speed of 1 vs. 1+2+3.
> Whereas I measured the speed of 1+2+3 vs 1+2+3+3.
>
> On Mon Nov 24 2014 at 6:59:50 PM David Smith <[email protected]>
> wrote:
>
>> But you initialized it in both cases.  Is there a compiler optimization
>> going on here that combines the zeros() and fill()?
>>
>>
>> On Monday, November 24, 2014 5:12:56 PM UTC-6, Jameson wrote:
>>
>>> yes. the point is to compare the cost of implicitly calling `zero`
>>> (resulting in the equivalent of calling zero twice) to the cost of not
>>> initializing the memory before writing to it. I could alternatively have
>>> done: `@time x=zeros(); @time fill(x, 0)` to measure the same information.
>>>
>>> On Mon Nov 24 2014 at 5:57:29 PM David Smith <[email protected]> wrote:
>>>
>>>> Did you mean to call zeros() in both cases?
>>>>
>>>>
>>>> On Monday, November 24, 2014 3:09:38 PM UTC-6, Jameson wrote:
>>>>
>>>>> It appears the fill operation accounts for about 0.15 seconds of the
>>>>> 6.15 seconds that my OS X laptop takes to create this array:
>>>>>
>>>>> $ ./julia -q
>>>>>
>>>>> julia> N=10^9
>>>>> 1000000000
>>>>>
>>>>> julia> @time begin x=zeros(Int64,N); fill(x,0) end
>>>>> elapsed time: 6.325660691 seconds (8000136616 bytes allocated, 1.71% gc time)
>>>>> 0-element Array{Array{Int64,1},1}
>>>>>
>>>>> $ ./julia -q
>>>>>
>>>>> julia> N=10^9
>>>>> 1000000000
>>>>>
>>>>> julia> @time x=zeros(Int64,N)
>>>>> elapsed time: 6.160623835 seconds (8000014320 bytes allocated, 0.22% gc time)
>>>>>
>>>>>
>>>>> On Mon Nov 24 2014 at 3:18:39 PM Erik Schnetter <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> On Mon, Nov 24, 2014 at 3:01 PM, David Smith <[email protected]>
>>>>>> wrote:
>>>>>> > To add some data to this conversation, I just timed allocating a
>>>>>> billion
>>>>>> > Int64s on my macbook, and I got this (I ran these multiple times
>>>>>> before this
>>>>>> > and got similar timings):
>>>>>> >
>>>>>> > julia> N=1_000_000_000
>>>>>> > 1000000000
>>>>>> >
>>>>>> > julia> @time x = Array(Int64,N);
>>>>>> > elapsed time: 0.022577671 seconds (8000000128 bytes allocated)
>>>>>> >
>>>>>> > julia> @time x = zeros(Int64,N);
>>>>>> > elapsed time: 3.95432248 seconds (8000000152 bytes allocated)
>>>>>> >
>>>>>> > So we are talking adding possibly seconds to a program per large
>>>>>> array
>>>>>> > allocation.
>>>>>>
>>>>>> This is not quite right -- the first does not actually map the pages
>>>>>> into memory; this is only done lazily when they are accessed the first
>>>>>> time. You need to compare "alloc uninitialized; then initialize once"
>>>>>> with "alloc zero-initialized; then initialize again".
>>>>>>
>>>>>> Current high-end system architectures have memory write speeds of ten
>>>>>> or twenty GByte per second; this is what you should see for very large
>>>>>> arrays -- this would be about 0.4 seconds for your case. For smaller
>>>>>> arrays, the data would reside in the cache, so that the allocation
>>>>>> overhead should be significantly smaller even.
>>>>>>
>>>>>> -erik
>>>>>>
>>>>>> --
>>>>>> Erik Schnetter <[email protected]>
>>>>>> http://www.perimeterinstitute.ca/personal/eschnetter/
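Erik's 0.4 second figure is just the array size divided by an assumed write bandwidth: 10^9 Int64s is 8 GB, so at 20 GB/s one pass over the array takes

    julia> (10^9 * sizeof(Int64)) / 20e9   # seconds, assuming 20 GB/s sustained write bandwidth
    0.4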
