You are still assuming that this function could be significantly faster.
I've pulled out just the interesting bit for readability, but the code is
the same in the larger function.
The assembly is already very efficient, and it would be shorter still if I
put the `@inbounds` back (the bounds check and its error path would go away):
julia> function immut_test_core(a::Array{xyimmut, 1},i)
           a[i] = xyimmut(a[i].x, a[i].y, a[i].z, true)
       end
immut_test_core (generic function with 2 methods)
julia> code_native(immut_test_core,(Array{xyimmut,1},Int))
.section __TEXT,__text,regular,pure_instructions
Filename: none
Source line: 2
push RBP
mov RBP, RSP
Source line: 2
dec RSI
cmp RSI, QWORD PTR [RDI + 16]
jae 49
shl RSI, 5
mov RCX, QWORD PTR [RDI + 8]
mov RAX, QWORD PTR [RCX + RSI]
vmovsd XMM0, QWORD PTR [RCX + RSI + 8]
vmovsd XMM1, QWORD PTR [RCX + RSI + 16]
vmovsd QWORD PTR [RCX + RSI + 16], XMM1
vmovsd QWORD PTR [RCX + RSI + 8], XMM0
mov QWORD PTR [RCX + RSI], RAX
mov BYTE PTR [RCX + RSI + 24], 1
mov DL, 1
pop RBP
ret
movabs RAX, 4421569152
mov RDI, QWORD PTR [RAX]
movabs RAX, 4405884352
mov ESI, 2
call RAX
What is not clear to me is why the LLVM pass has left in those 4 vmovsd
instructions, since it should have been able to eliminate them. Perhaps Jeff
or Keno know of some LLVM pass that is getting missed or scheduled wrong?
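
For anyone who wants to try the same rebuild-and-store pattern on a current
Julia release, here is a minimal sketch. It assumes modern syntax (`struct`
in place of `immutable`), and the names `XYPoint`, `mark_summed` and
`mark_all!` are made up for illustration; they are not from the code in this
thread.

```julia
# Assumption: modern Julia, where an immutable composite type is declared
# with `struct`. XYPoint mirrors the fields of xyimmut from this thread.
struct XYPoint
    x::Int
    y::Float64
    z::Float64
    summed::Bool
end

# Hypothetical helper: build a copy of `p` with only `summed` set to true.
mark_summed(p::XYPoint) = XYPoint(p.x, p.y, p.z, true)

# Hypothetical helper: overwrite every element in place with its updated copy.
function mark_all!(a::Vector{XYPoint})
    @inbounds for i in eachindex(a)
        a[i] = mark_summed(a[i])
    end
    return a
end

a = [XYPoint(div(i, 2), 0.0, 0.0, false) for i in 1:10]
mark_all!(a)
```

At the REPL, `@code_native mark_all!(a)` should show what the compiler
generates for the loop, which makes it easy to check whether the redundant
loads and stores are still there.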
On Tue, Aug 5, 2014 at 6:32 PM, <[email protected]> wrote:
> Jameson,
>
> I disagree that my numbers contradict my claim. Here are my numbers
> presented more succinctly:
>
> elapsed time (s)   'type'   'immutable'
> no update          0.14     0.04
> fast update        0.62     not available
> slow update        4.51     0.35
>
> The difference I am complaining about is between 'fast update' and 'slow
> update' for 'immutable'. Since fast update is not available for
> 'immutable', we can only guess how much improvement is possible. However,
> judging from the improvement between slow and fast update for 'type', it
> seems likely to me that there would also be a noticeable difference for
> 'immutable.'
>
> As for llvm, I don't know what its properties and limitations are.
> However, it seems to me that a more intuitive way to deal with the issue I
> raise (compared to current Julia) is to deprecate the current usage of
> 'immutable' and instead split the two attributes into two keywords:
>
> type A : refcounted # like current 'type'
> end
> type A : immutable # like current 'immutable'
> end
> type A : refcounted, immutable # not currently available
> end
> type A # i.e., neither refcounted nor immutable; not currently available.
> end
>
>
>
>
> On Tuesday, August 5, 2014 5:38:17 PM UTC-4, [email protected] wrote:
>>
>> Dear Julia users,
>>
>> It seems to me that Julia's distinction between a 'type' and an
>> 'immutable' conflates two independent properties; the consequence of this
>> conflation is a needless loss of performance. In more detail, the
>> differences between a 'type' struct and 'immutable' struct in Julia are:
>>
>> 1. Assignment of 'type' struct copies a pointer; assignment of an
>> 'immutable' struct copies the data.
>>
>> 2. An array of type structs is an array of pointers, while an array of
>> immutables is an array of data.
>>
>> 3. Type structs are refcounted, whereas immutables are not. (This is not
>> documented; it is my conjecture.)
>>
>> 4. Fields in type structs can be modified, but fields in immutables
>> cannot.
>>
>> Clearly #1-#3 are related concepts. As far as I can see, #4 is
>> completely independent from #1-#3, and there is no obvious reason why it is
>> forbidden to modify fields in immutables. There is no analogous
>> restriction in C/C++.
>>
>> This conflation causes a performance hit. Consider:
>>
>> type floatbool
>>     a::Float64
>>     b::Bool
>> end
>>
>> If t is of type Array{floatbool,1} and I want to update the flag b in
>> t[10] to 'true', I say 't[10].b=true' (call this 'fast' update). But if
>> instead of 'type floatbool' I had said 'immutable floatbool', then to set
>> flag b in t[10] I need the more complex code t[10] =
>> floatbool(t[10].a,true) (call this 'slow' update).
>>
>> To document the performance hit, I wrote five functions below. The first
>> three use 'type' and either no update, fast update, or slow update; the
>> last two use 'immutable' and either no update or slow update. You can see
>> a HUGE hit on performance between slow and fast update for 'type'; for
>> immutable there would presumably also be a difference, although apparently
>> smaller. (Obviously, I can't test fast update for immutable; this is the
>> point of my message!)
>>
>> So why does Julia impose this apparently needless restriction on
>> immutable?
>>
>> -- Steve Vavasis
>>
>>
>> julia> @time testimmut.type_upd_none()
>> elapsed time: 0.141462422 seconds (48445152 bytes allocated)
>>
>> julia> @time testimmut.type_upd_fast()
>> elapsed time: 0.618769232 seconds (48247072 bytes allocated)
>>
>> julia> @time testimmut.type_upd_slow()
>> elapsed time: 4.511306586 seconds (4048268640 bytes allocated)
>>
>> julia> @time testimmut.immut_upd_none()
>> elapsed time: 0.04480173 seconds (32229468 bytes allocated)
>>
>> julia> @time testimmut.immut_upd_slow()
>> elapsed time: 0.351634871 seconds (32000096 bytes allocated)
>>
>> module testimmut
>>
>> type xytype
>>     x::Int
>>     y::Float64
>>     z::Float64
>>     summed::Bool
>> end
>>
>> immutable xyimmut
>>     x::Int
>>     y::Float64
>>     z::Float64
>>     summed::Bool
>> end
>>
>> myfun(x) = x * (x + 1) * (x + 2)
>>
>> function type_upd_none()
>>     n = 1000000
>>     a = Array(xytype, n)
>>     for i = 1 : n
>>         a[i] = xytype(div(i,2), 0.0, 0.0, false)
>>     end
>>     numtri = 100
>>     for tri = 1 : numtri
>>         sum = 0
>>         for i = 1 : n
>>             @inbounds x = a[i].x
>>             sum += myfun(x)
>>         end
>>     end
>> end
>>
>>
>> function type_upd_fast()
>>     n = 1000000
>>     a = Array(xytype, n)
>>     for i = 1 : n
>>         a[i] = xytype(div(i,2), 0.0, 0.0, false)
>>     end
>>     numtri = 100
>>     for tri = 1 : numtri
>>         sum = 0
>>         for i = 1 : n
>>             @inbounds x = a[i].x
>>             sum += myfun(x)
>>             @inbounds a[i].summed = true
>>         end
>>     end
>> end
>>
>> function type_upd_slow()
>>     n = 1000000
>>     a = Array(xytype, n)
>>     for i = 1 : n
>>         a[i] = xytype(div(i,2), 0.0, 0.0, false)
>>     end
>>     numtri = 100
>>     for tri = 1 : numtri
>>         sum = 0
>>         for i = 1 : n
>>             @inbounds x = a[i].x
>>             sum += myfun(x)
>>             @inbounds a[i] = xytype(a[i].x, a[i].y, a[i].z, true)
>>         end
>>     end
>> end
>>
>>
>> function immut_upd_none()
>>     n = 1000000
>>     a = Array(xyimmut, n)
>>     for i = 1 : n
>>         a[i] = xyimmut(div(i,2), 0.0, 0.0, false)
>>     end
>>     numtri = 100
>>     for tri = 1 : numtri
>>         sum = 0
>>         for i = 1 : n
>>             @inbounds x = a[i].x
>>             sum += myfun(x)
>>         end
>>     end
>> end
>>
>> function immut_upd_slow()
>>     n = 1000000
>>     a = Array(xyimmut, n)
>>     for i = 1 : n
>>         a[i] = xyimmut(div(i,2), 0.0, 0.0, false)
>>     end
>>     numtri = 100
>>     for tri = 1 : numtri
>>         sum = 0
>>         for i = 1 : n
>>             @inbounds x = a[i].x
>>             sum += myfun(x)
>>             @inbounds a[i] = xyimmut(a[i].x, a[i].y, a[i].z, true)
>>         end
>>     end
>> end
>>
>> end
>>
>>
>>
>>
>