Hey guys,
attached you'll find text files with the @code_native output for the
expressions
- r * x[1,:]
- cis(dotprods)
- sum(imexp) * sum(conj(imexp))

for julia 0.5. 

The hardware I run on is a Haswell i5 machine, a Haswell i7 machine, and an
Ivy Bridge i5 machine. It turned out that the code also runs fast on the
Haswell i5 machine; only the Haswell i7 machine is slow. This really drove
me nuts. First I thought it was the OS, then the architecture, and now it's
just i5 vs. i7... Anyway, I don't know anything about x86 assembly,
but the Julia 0.4.5 code is the same on all machines. For the dot
product, however, the 0.5 code already has 2 different instructions on the
i5 vs. the i7 (lines 44 & 47), and so does the cis call (line 149...). The
Ivy Bridge i5 code is similar to the Haswell i5 code. I also included
versioninfo() at the top of each file, so you could just look at a vimdiff
of the Julia 0.5 files... Can anyone make sense of this?
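
As a rough sketch of how such a comparison could be narrowed down: the absolute addresses and large offsets in the dumps change from session to session, so masking them first leaves only the instructions that really differ. This is written for a current Julia release rather than the 0.5-dev build used here; `mask_numbers` and `only_in` are made-up helper names, and the two short vectors are excerpts from the Haswell i5 and i7 dumps of `r * x[1,:]`.

```julia
# Mask run-specific numbers so that only real instruction differences remain.
# Helper names are hypothetical; syntax is for current Julia (1.x).
function mask_numbers(line::AbstractString)
    line = replace(line, r"#.*$" => "")        # drop the "# imm = ..." comments
    line = replace(line, r"-?\d{5,}" => "N")   # mask long constants (addresses, offsets)
    return String(strip(line))
end

# Instructions that occur in dump `a` but not in dump `b`, after masking.
only_in(a, b) = setdiff(map(mask_numbers, a), map(mask_numbers, b))

# Excerpts from the attached dumps (Haswell i5 vs. Haswell i7)
i5 = [raw"movabsq $140507545133280, %rdi  # imm = 0x7FCA7650E0E0",
      raw"movq    %rdi, -8(%rax)"]
i7 = [raw"movabsq $140656226482736, %rdi  # imm = 0x7FED146A3A30",
      raw"leaq    661568(%rdi), %rcx",
      raw"movq    %rcx, -8(%rax)"]

println(only_in(i7, i5))   # instructions only present on the i7
println(only_in(i5, i7))   # instructions only present on the i5
```

For whole files, the vectors could come from `readlines` on each dump. Note that `setdiff` ignores order and repetition, so this only lists which instructions are unique to one side; a real diff tool is still needed to see where they occur.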

I will still test the binary tarballs. If I remove the cis() call, the
difference is hard to tell: the loop is ~10 times faster and takes more or
less 5 ms on all machines. For the whole loop with the cis() call, the
difference is ~50 ms on the i5 vs. ~90 ms on the i7.
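
A minimal sketch of how the two timings could be taken, assuming a loop built from the three expressions listed at the top. The actual loop is not part of this message, so the function names, array sizes and random data below are made up, and the syntax is for a current Julia release (`cis.(...)` instead of the vectorized `cis(...)`).

```julia
# Hypothetical stand-in for the loop; sizes and data are made up.
function loop_with_cis(r, x)
    acc = 0.0
    for i in 1:size(x, 1)
        dotprods = r * x[i, :]        # matrix-vector product
        imexp = cis.(dotprods)        # exp(im * dotprods), element-wise
        acc += real(sum(imexp) * sum(conj(imexp)))
    end
    return acc
end

# Same loop without the cis() call, to isolate its cost.
function loop_without_cis(r, x)
    acc = 0.0
    for i in 1:size(x, 1)
        dotprods = r * x[i, :]
        acc += sum(dotprods)          # use the result so the work is not skipped
    end
    return acc
end

r = rand(2000, 3)
x = rand(200, 3)

# Run once before timing so that compilation is not measured.
loop_with_cis(r, x)
loop_without_cis(r, x)

t_with = @elapsed loop_with_cis(r, x)
t_without = @elapsed loop_without_cis(r, x)
println("with cis: ", 1e3 * t_with, " ms, without cis: ", 1e3 * t_without, " ms")
```

Running the same script on the i5 and the i7 and comparing the two numbers separately would show whether the gap comes from the cis() part or from the rest of the loop.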

Shall I also post the julia 0.4 code?

cheers, Johannes



On Thursday, March 31, 2016 at 10:27:11 AM UTC+2, Milan Bouchet-Valat wrote:
>
> On Wednesday, March 30, 2016 at 15:16 -0700, Johannes Wagner wrote:
> > 
> > 
> > > On Wednesday, March 30, 2016 at 04:43 -0700, Johannes Wagner wrote:
> > > > Sorry for not having expressed myself clearly, I meant the latest  
> > > > version of fedora to work fine (24 development). I always used the  
> > > > latest julia nightly available on the copr nalimilan repo. Right now
> > > > that is: 0.5.0-dev+3292, Commit 9d527c5*, all use
> > > > LLVM: libLLVM-3.7.1 (ORCJIT, haswell)  
> > > >  
> > > > peakflops on all machines (hardware identical) is ~1.2..1.5e11.    
> > > >  
> > > > Fedora 22&23 with julia 0.5 is ~50% slower than 0.4; only on Fedora
> > > > 24 is julia 0.5 faster than julia 0.4.
> > > Could you try to find a simple code to reproduce the problem? In  
> > > particular, it would be useful to check whether this comes from  
> > > OpenBLAS differences or whether it also happens with pure Julia code  
> > > (typical operations which depend on BLAS are matrix multiplication, as
> > > well as most of linear algebra). Normally, 0.4 and 0.5 should use the
> > > same BLAS, but who knows...
> > Well, that's what I did, and the 3 simple calls inside the loop are
> > more or less the same speed. Only the whole loop seems slower. See my
> > code sample from my answer on March 8th (the code gets faster in the same
> > proportions when exp(im .* dotprods) is replaced by cis(dotprods)).
> > So I don't know what I can do then...
> Sorry, somehow I had missed that message. This indeed looks like a code 
> generation issue in Julia/LLVM. 
>
> > > Can you also confirm that all versioninfo() fields are the same for all
> > > three machines, both for 0.4 and 0.5? We must envision the possibility
> > > that the differences actually come from 0.4.
> > Oh, right! I just noticed that my Fedora 24 machine is an Ivy Bridge,
> > which runs fast:
> > 
> > Julia Version 0.5.0-dev+3292 
> > Commit 9d527c5* (2016-03-28 06:55 UTC) 
> > Platform Info: 
> >   System: Linux (x86_64-redhat-linux) 
> >   CPU: Intel(R) Core(TM) i5-3550 CPU @ 3.30GHz 
> >   WORD_SIZE: 64 
> >   BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Sandybridge) 
> >   LAPACK: libopenblasp.so.0 
> >   LIBM: libopenlibm 
> >   LLVM: libLLVM-3.7.1 (ORCJIT, ivybridge) 
> > 
> > and the other ones with Fedora 22/23 are Haswell, which run slow:
> > 
> > Julia Version 0.5.0-dev+3292 
> > Commit 9d527c5* (2016-03-28 06:55 UTC) 
> > Platform Info: 
> >   System: Linux (x86_64-redhat-linux) 
> >   CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz 
> >   WORD_SIZE: 64 
> >   BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell) 
> >   LAPACK: libopenblasp.so.0 
> >   LIBM: libopenlibm 
> >   LLVM: libLLVM-3.7.1 (ORCJIT, haswell) 
> > 
> > I just booted Fedora 23 on the Ivy Bridge machine and it's also fast.
> >
> > Now if I use Julia 0.4.5 on both architectures:
> > 
> > Julia Version 0.4.5 
> > Commit 2ac304d* (2016-03-18 00:58 UTC) 
> > Platform Info: 
> >   System: Linux (x86_64-redhat-linux) 
> >   CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz 
> >   WORD_SIZE: 64 
> >   BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell) 
> >   LAPACK: libopenblasp.so.0 
> >   LIBM: libopenlibm 
> >   LLVM: libLLVM-3.3 
> > 
> > and: 
> > 
> > Julia Version 0.4.5 
> > Commit 2ac304d* (2016-03-18 00:58 UTC) 
> > Platform Info: 
> >   System: Linux (x86_64-redhat-linux) 
> >   CPU: Intel(R) Core(TM) i5-3550 CPU @ 3.30GHz 
> >   WORD_SIZE: 64 
> >   BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Sandybridge) 
> >   LAPACK: libopenblasp.so.0 
> >   LIBM: libopenlibm 
> >   LLVM: libLLVM-3.3 
> > 
> > there is no speed difference apart from the ~10% or so from the
> > faster Haswell machine. So could this perhaps be specific to the Haswell
> > hardware target, combined with the change from LLVM 3.3 to 3.7.1? Is
> > there anything else I could provide?
> This is certainly an interesting finding. Could you paste somewhere the 
> output of @code_native for your function on Sandybridge vs. Haswell, 
> for both 0.4 and 0.5? 
>
> It would also be useful to check whether the same difference appears if 
> you use the generic binary tarballs from http://julialang.org/downloads 
> . 
>
> Finally, do you get the same result if you remove the call to exp() 
> from the loop? (This is the only external function, so it shouldn't be 
> affected by changes in Julia.) 
>
>
> Regards 
>
>
> > Best, Johannes 
> > 
> > >  Regards  
> > 
> > 
> > > > On Wednesday, March 16, 2016 at 09:25 -0700, Johannes Wagner wrote:
> > > > > Just a little update. I tested some other Fedoras: Fedora 22 with LLVM
> > > > > 3.8 is also slow with julia 0.5, whereas a Fedora 24 branch with LLVM
> > > > > 3.7 is faster on julia 0.5 compared to julia 0.4, as it should be
> > > > > (the speedup of the inner loop parts translates into a speedup of the
> > > > > whole function).
> > > > >
> > > > > I don't know if anyone cares about that... At least the latest version
> > > > > seems to work fine; I hope it stays like this in the final Fedora 24.
> > > > What's the "latest version"? git built from source or RPM nightlies?
> > > > With which LLVM version for each?
> > > >
> > > > If from the RPMs, I've switched them to LLVM 3.8 for a few days, and
> > > > went back to 3.7 because of a build failure. So that might explain the
> > > > difference. You can install the last version which built with LLVM 3.8
> > > > manually from here:
> > > > https://copr-be.cloud.fedoraproject.org/results/nalimilan/julia-nightlies/fedora-23-x86_64/00167549-julia/
> > > >  
> > > > It would be interesting to compare it with the latest nightly with 3.7.
> > > >  
> > > >  
> > > > Regards   
> > > >  
> > > >  
> > > >  
> > > > > > hey guys,
> > > > > > I just experienced something weird. I have some code that runs fine
> > > > > > on 0.4.3; then I updated to 0.5-dev to test the new Arrays, ran the
> > > > > > same code and noticed it got about 50% slower. Then I downgraded back
> > > > > > to 0.4.3 and ran the old code, but the speed remained slow. I noticed
> > > > > > while reinstalling 0.4.3 that openblas-threads didn't get installed
> > > > > > along with it. So I manually installed it, but no change.
> > > > > > Does anyone have an idea what could be going on? LLVM on Fedora 23 is
> > > > > > 3.7
> > > > > >   
> > > > > > Cheers, Johannes   
> > > > > >   
>
julia> versioninfo()
Julia Version 0.5.0-dev+3372
Commit 7f177aa* (2016-04-02 12:18 UTC)
Platform Info:
  System: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblasp.so.0
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)



julia> @code_native r * x[1,:]
        .text
Filename: matmul.jl
Source line: 0
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r12
        pushq   %rbx
        subq    $64, %rsp
        movq    %rsi, %r14
        movq    %rdi, %rbx
        movq    $0, -72(%rbp)
        movq    $0, -64(%rbp)
        movq    $0, -56(%rbp)
        movq    $0, -48(%rbp)
        movq    $0, -40(%rbp)
        movq    $10, -88(%rbp)
        movabsq $jl_tls_states, %r15
        movq    (%r15), %rax
        movq    %rax, -80(%rbp)
        leaq    -88(%rbp), %rax
        movq    %rax, (%r15)
        movq    24(%rbx), %r12
Source line: 196
        movabsq $jl_gc_alloc_1w, %rax
        callq   *%rax
        movabsq $140507545133280, %rdi  # imm = 0x7FCA7650E0E0
        movq    %rdi, -8(%rax)
        movq    %r12, (%rax)
        movq    %rax, -72(%rbp)
        addq    $2353776, %rdi          # imm = 0x23EA70
        movabsq $jl_new_array, %rcx
        movq    %rax, %rsi
        callq   *%rcx
        movq    %rax, -64(%rbp)
Source line: 88
        movq    %rbx, -56(%rbp)
        movq    %r14, -48(%rbp)
        movabsq $"gemv!", %r8
        movl    $78, %esi
        movq    %rax, %rdi
        movq    %rbx, %rdx
        movq    %r14, %rcx
        callq   *%r8
        movq    %rax, -40(%rbp)
        movq    -80(%rbp), %rcx
        movq    %rcx, (%r15)
        addq    $64, %rsp
        popq    %rbx
        popq    %r12
        popq    %r14
        popq    %r15
        popq    %rbp
        retq
        nopw    (%rax,%rax)



julia> @code_native cis(dotprods)
        .text
Filename: operators.jl
Source line: 0
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r13
        pushq   %r12
        pushq   %rbx
        subq    $120, %rsp
        movq    %rdi, %r15
        xorl    %r14d, %r14d
        movq    $0, -104(%rbp)
        movq    $0, -96(%rbp)
        movq    $0, -88(%rbp)
        movq    $0, -80(%rbp)
        movq    $0, -72(%rbp)
        movq    $0, -64(%rbp)
        movq    $0, -56(%rbp)
        movq    $0, -48(%rbp)
        movq    $16, -120(%rbp)
        movabsq $jl_tls_states, %rcx
        movq    (%rcx), %rax
        movq    %rax, -112(%rbp)
        leaq    -120(%rbp), %rax
        movq    %rax, (%rcx)
Source line: 476
        movq    8(%r15), %rax
Source line: 83
        cmpq    $0, %rax
        cmovgq  %rax, %r14
        decq    %r14
        jo      L610
        incq    %r14
        jo      L635
        leaq    -80(%rbp), %r12
        leaq    -56(%rbp), %r13
        movabsq $140454161630160, %rbx  # imm = 0x7FBE086943D0
Source line: 303
        movq    %rbx, -56(%rbp)
        movabsq $jl_box_int64, %rax
        movq    %r14, %rdi
        callq   *%rax
        movq    %rax, -48(%rbp)
        leaq    32248(%rbx), %rdi
        movabsq $140462779530752, %rax  # imm = 0x7FC00A13FE00
        movl    $2, %edx
        movq    %r13, %rsi
        callq   *%rax
        movq    %rax, -104(%rbp)
        leaq    31823512(%rbx), %rcx
        movq    %rcx, -80(%rbp)
        movq    %rbx, -72(%rbp)
        movq    %rax, -64(%rbp)
        movabsq $jl_apply_generic, %rax
        movl    $3, %esi
        movq    %r12, %rdi
        movq    %rbx, %r12
        callq   *%rax
        movabsq $jl_alloc_array_1d, %rcx
        movq    %rax, -96(%rbp)
        movq    (%rax), %rsi
        leaq    17535392(%r12), %rdi
        callq   *%rcx
        movq    %rax, -152(%rbp)
        movq    %rax, -88(%rbp)
        cmpq    $0, %r14
        je      L484
        xorl    %r13d, %r13d
        xorl    %ebx, %ebx
        nopl    (%rax)
L320:
        cmpq    8(%r15), %rbx
        jae     L523
        movq    (%r15), %rax
        movsd   (%rax,%rbx,8), %xmm0    # xmm0 = mem[0],zero
Source line: 320
        movsd   %xmm0, -128(%rbp)
        leaq    -272840624(%r12), %rax
        callq   *%rax
        movsd   -128(%rbp), %xmm1       # xmm1 = mem[0],zero
        movsd   %xmm0, -136(%rbp)
        ucomisd %xmm1, %xmm1
        setp    %al
        ucomisd %xmm0, %xmm0
        setnp   %cl
        orb     %al, %cl
        testb   $1, %cl
        je      L560
        ucomisd %xmm1, %xmm1
        setp    -137(%rbp)
        leaq    -272820608(%r12), %rax
        movapd  %xmm1, %xmm0
        callq   *%rax
        ucomisd %xmm0, %xmm0
        setnp   %al
        orb     -137(%rbp), %al
        testb   $1, %al
        je      L585
Source line: 303
        incq    %rbx
Source line: 4
        movq    -152(%rbp), %rax
        movq    (%rax), %rax
        movsd   %xmm0, 8(%rax,%r13)
        movsd   -136(%rbp), %xmm0       # xmm0 = mem[0],zero
        movsd   %xmm0, (%rax,%r13)
Source line: 303
        addq    $16, %r13
        cmpq    %rbx, %r14
        jne     L320
Source line: 4
L484:
        movq    -112(%rbp), %rax
        movabsq $jl_tls_states, %rcx
        movq    %rax, (%rcx)
        movq    -152(%rbp), %rax
        leaq    -40(%rbp), %rsp
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %r15
        popq    %rbp
        retq
Source line: 303
L523:
        movq    %rsp, %rsi
        addq    $-16, %rsi
        movq    %rsi, %rsp
        addq    $1, %rbx
        movq    %rbx, (%rsi)
        movabsq $jl_bounds_error_ints, %rax
        movl    $1, %edx
        movq    %r15, %rdi
        callq   *%rax
Source line: 320
L560:
        movabsq $jl_domain_exception, %rax
        movq    (%rax), %rdi
        movabsq $jl_throw, %rax
        callq   *%rax
L585:
        movabsq $jl_domain_exception, %rax
        movq    (%rax), %rdi
        movabsq $jl_throw, %rax
        callq   *%rax
Source line: 83
L610:
        movabsq $jl_overflow_exception, %rax
        movq    (%rax), %rdi
        movabsq $jl_throw, %rax
        callq   *%rax
L635:
        movabsq $jl_overflow_exception, %rax
        movq    (%rax), %rdi
        movabsq $jl_throw, %rax
        callq   *%rax
        nopw    %cs:(%rax,%rax)



julia> @code_native sum(imexp) * sum(conj(imexp))
        .text
Filename: complex.jl
Source line: 0
        pushq   %rbp
        movq    %rsp, %rbp
Source line: 124
        movsd   (%rsi), %xmm0           # xmm0 = mem[0],zero
        movsd   8(%rsi), %xmm1          # xmm1 = mem[0],zero
        movsd   (%rdx), %xmm2           # xmm2 = mem[0],zero
        movsd   8(%rdx), %xmm3          # xmm3 = mem[0],zero
        movapd  %xmm0, %xmm4
        mulsd   %xmm2, %xmm4
        movapd  %xmm1, %xmm5
        mulsd   %xmm3, %xmm5
        subsd   %xmm5, %xmm4
        mulsd   %xmm3, %xmm0
        mulsd   %xmm2, %xmm1
        addsd   %xmm0, %xmm1
        movsd   %xmm1, 8(%rdi)
        movsd   %xmm4, (%rdi)
        movq    %rdi, %rax
        popq    %rbp
        retq
        nopw    %cs:(%rax,%rax)



julia> versioninfo()
Julia Version 0.5.0-dev+3372
Commit 7f177aa* (2016-04-02 12:18 UTC)
Platform Info:
  System: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblasp.so.0
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)



julia> @code_native r * x[1,:]
        .text
Filename: matmul.jl
Source line: 0
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r12
        pushq   %rbx
        subq    $64, %rsp
        movq    %rsi, %r14
        movq    %rdi, %rbx
        movq    $0, -72(%rbp)
        movq    $0, -64(%rbp)
        movq    $0, -56(%rbp)
        movq    $0, -48(%rbp)
        movq    $0, -40(%rbp)
        movq    $10, -88(%rbp)
        movabsq $jl_tls_states, %r15
        movq    (%r15), %rax
        movq    %rax, -80(%rbp)
        leaq    -88(%rbp), %rax
        movq    %rax, (%r15)
        movq    24(%rbx), %r12
Source line: 196
        movabsq $jl_gc_alloc_1w, %rax
        callq   *%rax
        movabsq $140656226482736, %rdi  # imm = 0x7FED146A3A30
        leaq    661568(%rdi), %rcx
        movq    %rcx, -8(%rax)
        movq    %r12, (%rax)
        movq    %rax, -72(%rbp)
        movabsq $jl_new_array, %rcx
        movq    %rax, %rsi
        callq   *%rcx
        movq    %rax, -64(%rbp)
Source line: 88
        movq    %rbx, -56(%rbp)
        movq    %r14, -48(%rbp)
        movabsq $"gemv!", %r8
        movl    $78, %esi
        movq    %rax, %rdi
        movq    %rbx, %rdx
        movq    %r14, %rcx
        callq   *%r8
        movq    %rax, -40(%rbp)
        movq    -80(%rbp), %rcx
        movq    %rcx, (%r15)
        addq    $64, %rsp
        popq    %rbx
        popq    %r12
        popq    %r14
        popq    %r15
        popq    %rbp
        retq
        nopw    (%rax,%rax)



julia> @code_native cis(dotprods)
        .text
Filename: operators.jl
Source line: 0
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r13
        pushq   %r12
        pushq   %rbx
        subq    $136, %rsp
        xorl    %r14d, %r14d
        movq    $0, -112(%rbp)
        movq    $0, -104(%rbp)
        movq    $0, -96(%rbp)
        movq    $0, -88(%rbp)
        movq    $0, -80(%rbp)
        movq    $0, -72(%rbp)
        movq    $0, -64(%rbp)
        movq    $0, -56(%rbp)
        movq    $0, -48(%rbp)
        movq    $20, -136(%rbp)
        movabsq $jl_tls_states, %rcx
        movq    (%rcx), %rax
        movq    %rax, -128(%rbp)
        leaq    -136(%rbp), %rax
        movq    %rax, (%rcx)
Source line: 476
        movq    %rdi, -120(%rbp)
        movq    8(%rdi), %rax
Source line: 83
        cmpq    $0, %rax
        cmovgq  %rax, %r14
        decq    %r14
        jo      L650
        movq    %rdi, -168(%rbp)
        incq    %r14
        jo      L675
        leaq    -80(%rbp), %r15
        leaq    -56(%rbp), %r12
        movabsq $140073459172304, %rbx  # imm = 0x7F6564C6C3D0
Source line: 303
        movq    %rbx, -56(%rbp)
        movabsq $jl_box_int64, %rax
        movq    %r14, %rdi
        callq   *%rax
        movq    %rax, -48(%rbp)
        leaq    32552(%rbx), %rdi
        movabsq $140082077063248, %rax  # imm = 0x7F6766715850
        movl    $2, %edx
        movq    %r12, %rsi
        callq   *%rax
        movq    %rax, -112(%rbp)
        leaq    33473648(%rbx), %rcx
        movq    %rcx, -80(%rbp)
        movq    %rbx, -72(%rbp)
        movq    %rax, -64(%rbp)
        movabsq $jl_apply_generic, %rax
        movl    $3, %esi
        movq    %r15, %rdi
        movq    %rbx, %r12
        callq   *%rax
        movabsq $jl_alloc_array_1d, %rcx
        movq    %rax, -104(%rbp)
        movq    (%rax), %rsi
        leaq    10193440(%r12), %rdi
        callq   *%rcx
        movq    %rax, -160(%rbp)
        movq    %rax, -96(%rbp)
        cmpq    $0, %r14
        je      L527
        xorl    %r13d, %r13d
        xorl    %ebx, %ebx
        nopw    %cs:(%rax,%rax)
L352:
        movq    -168(%rbp), %rdi
        movq    %rdi, -88(%rbp)
        cmpq    8(%rdi), %rbx
        jae     L566
        movq    (%rdi), %rax
        movsd   (%rax,%rbx,8), %xmm0    # xmm0 = mem[0],zero
Source line: 320
        movsd   %xmm0, -144(%rbp)
        leaq    -60767328(%r12), %rax
        callq   *%rax
        movsd   -144(%rbp), %xmm1       # xmm1 = mem[0],zero
        movsd   %xmm0, -152(%rbp)
        ucomisd %xmm1, %xmm1
        setp    %al
        ucomisd %xmm0, %xmm0
        setnp   %cl
        orb     %al, %cl
        testb   $1, %cl
        je      L600
        ucomisd %xmm1, %xmm1
        setp    %r15b
        leaq    -60747440(%r12), %rax
        movapd  %xmm1, %xmm0
        callq   *%rax
        ucomisd %xmm0, %xmm0
        setnp   %al
        orb     %r15b, %al
        testb   $1, %al
        je      L625
Source line: 303
        incq    %rbx
Source line: 4
        movq    -160(%rbp), %rax
        movq    (%rax), %rax
        movsd   %xmm0, 8(%rax,%r13)
        movsd   -152(%rbp), %xmm0       # xmm0 = mem[0],zero
        movsd   %xmm0, (%rax,%r13)
Source line: 303
        addq    $16, %r13
        cmpq    %rbx, %r14
        jne     L352
Source line: 4
L527:
        movq    -128(%rbp), %rax
        movabsq $jl_tls_states, %rcx
        movq    %rax, (%rcx)
        movq    -160(%rbp), %rax
        leaq    -40(%rbp), %rsp
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %r15
        popq    %rbp
        retq
Source line: 303
L566:
        movq    %rsp, %rsi
        addq    $-16, %rsi
        movq    %rsi, %rsp
        addq    $1, %rbx
        movq    %rbx, (%rsi)
        movabsq $jl_bounds_error_ints, %rax
        movl    $1, %edx
        callq   *%rax
Source line: 320
L600:
        movabsq $jl_domain_exception, %rax
        movq    (%rax), %rdi
        movabsq $jl_throw, %rax
        callq   *%rax
L625:
        movabsq $jl_domain_exception, %rax
        movq    (%rax), %rdi
        movabsq $jl_throw, %rax
        callq   *%rax
Source line: 83
L650:
        movabsq $jl_overflow_exception, %rax
        movq    (%rax), %rdi
        movabsq $jl_throw, %rax
        callq   *%rax
L675:
        movabsq $jl_overflow_exception, %rax
        movq    (%rax), %rdi
        movabsq $jl_throw, %rax
        callq   *%rax
        nopl    (%rax)



julia> @code_native sum(imexp) * sum(conj(imexp))
        .text
Filename: complex.jl
Source line: 0
        pushq   %rbp
        movq    %rsp, %rbp
Source line: 124
        movsd   (%rsi), %xmm0           # xmm0 = mem[0],zero
        movsd   8(%rsi), %xmm1          # xmm1 = mem[0],zero
        movsd   (%rdx), %xmm2           # xmm2 = mem[0],zero
        movsd   8(%rdx), %xmm3          # xmm3 = mem[0],zero
        movapd  %xmm0, %xmm4
        mulsd   %xmm2, %xmm4
        movapd  %xmm1, %xmm5
        mulsd   %xmm3, %xmm5
        subsd   %xmm5, %xmm4
        mulsd   %xmm3, %xmm0
        mulsd   %xmm2, %xmm1
        addsd   %xmm0, %xmm1
        movsd   %xmm1, 8(%rdi)
        movsd   %xmm4, (%rdi)
        movq    %rdi, %rax
        popq    %rbp
        retq
        nopw    %cs:(%rax,%rax)



julia> versioninfo()
Julia Version 0.5.0-dev+3390
Commit a9e7e86* (2016-04-04 12:47 UTC)
Platform Info:
  System: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i5-3550 CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblasp.so.0
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, ivybridge)



julia> @code_native r * x[1,:]
        .text
Filename: matmul.jl
Source line: 0
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r12
        pushq   %rbx
        subq    $48, %rsp
        movq    %rsi, %r14
        movq    %rdi, %r12
        movq    $0, -56(%rbp)
        movq    $0, -48(%rbp)
        movq    $0, -40(%rbp)
        movq    $6, -72(%rbp)
        movabsq $jl_tls_states, %r15
        movq    (%r15), %rax
        movq    %rax, -64(%rbp)
        leaq    -72(%rbp), %rax
        movq    %rax, (%r15)
        movq    24(%r12), %rbx
Source line: 196
        movabsq $jl_gc_alloc_1w, %rax
        callq   *%rax
        movabsq $140467867165008, %rdi  # imm = 0x7FC139532150
        movq    %rdi, -8(%rax)
        movq    %rbx, (%rax)
        movq    %rax, -56(%rbp)
        addq    $2290240, %rdi          # imm = 0x22F240
        movabsq $jl_new_array, %rcx
        movq    %rax, %rsi
        callq   *%rcx
        movq    %rax, -48(%rbp)
Source line: 88
        movabsq $"gemv!", %rbx
        movl    $78, %esi
        movq    %rax, %rdi
        movq    %r12, %rdx
        movq    %r14, %rcx
        callq   *%rbx
        movq    %rax, -40(%rbp)
        movq    -64(%rbp), %rcx
        movq    %rcx, (%r15)
        addq    $48, %rsp
        popq    %rbx
        popq    %r12
        popq    %r14
        popq    %r15
        popq    %rbp
        retq
        nop



julia> @code_native cis(dotprods)
        .text
Filename: operators.jl
Source line: 0
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r13
        pushq   %r12
        pushq   %rbx
        subq    $120, %rsp
        movq    %rdi, %r15
        xorl    %r14d, %r14d
        movq    $0, -104(%rbp)
        movq    $0, -96(%rbp)
        movq    $0, -88(%rbp)
        movq    $0, -80(%rbp)
        movq    $0, -72(%rbp)
        movq    $0, -64(%rbp)
        movq    $0, -56(%rbp)
        movq    $0, -48(%rbp)
        movq    $16, -120(%rbp)
        movabsq $jl_tls_states, %rcx
        movq    (%rcx), %rax
        movq    %rax, -112(%rbp)
        leaq    -120(%rbp), %rax
        movq    %rax, (%rcx)
Source line: 476
        movq    8(%r15), %rax
Source line: 83
        cmpq    $0, %rax
        cmovgq  %rax, %r14
        decq    %r14
        jo      L610
        incq    %r14
        jo      L635
        leaq    -80(%rbp), %r12
        leaq    -56(%rbp), %r13
        movabsq $140543751914448, %rbx  # imm = 0x7FD2E46883D0
Source line: 303
        movq    %rbx, -56(%rbp)
        movabsq $jl_box_int64, %rax
        movq    %r14, %rdi
        callq   *%rax
        movq    %rax, -48(%rbp)
        leaq    32528(%rbx), %rdi
        movabsq $140552369804176, %rax  # imm = 0x7FD4E6131390
        movl    $2, %edx
        movq    %r13, %rsi
        callq   *%rax
        movq    %rax, -104(%rbp)
        leaq    26527528(%rbx), %rcx
        movq    %rcx, -80(%rbp)
        movq    %rbx, -72(%rbp)
        movq    %rax, -64(%rbp)
        movabsq $jl_apply_generic, %rax
        movl    $3, %esi
        movq    %r12, %rdi
        movq    %rbx, %r12
        callq   *%rax
        movabsq $jl_alloc_array_1d, %rcx
        movq    %rax, -96(%rbp)
        movq    (%rax), %rsi
        leaq    2314368(%r12), %rdi
        callq   *%rcx
        movq    %rax, -152(%rbp)
        movq    %rax, -88(%rbp)
        cmpq    $0, %r14
        je      L484
        xorl    %r13d, %r13d
        xorl    %ebx, %ebx
        nopl    (%rax)
L320:
        cmpq    8(%r15), %rbx
        jae     L523
        movq    (%r15), %rax
        movsd   (%rax,%rbx,8), %xmm0    # xmm0 = mem[0],zero
Source line: 320
        movsd   %xmm0, -128(%rbp)
        leaq    -335968048(%r12), %rax
        callq   *%rax
        movsd   -128(%rbp), %xmm1       # xmm1 = mem[0],zero
        movsd   %xmm0, -136(%rbp)
        ucomisd %xmm1, %xmm1
        setp    %al
        ucomisd %xmm0, %xmm0
        setnp   %cl
        orb     %al, %cl
        testb   $1, %cl
        je      L560
        ucomisd %xmm1, %xmm1
        setp    -137(%rbp)
        leaq    -335948544(%r12), %rax
        movapd  %xmm1, %xmm0
        callq   *%rax
        ucomisd %xmm0, %xmm0
        setnp   %al
        orb     -137(%rbp), %al
        testb   $1, %al
        je      L585
Source line: 303
        incq    %rbx
Source line: 4
        movq    -152(%rbp), %rax
        movq    (%rax), %rax
        movsd   %xmm0, 8(%rax,%r13)
        movsd   -136(%rbp), %xmm0       # xmm0 = mem[0],zero
        movsd   %xmm0, (%rax,%r13)
Source line: 303
        addq    $16, %r13
        cmpq    %rbx, %r14
        jne     L320
Source line: 4
L484:
        movq    -112(%rbp), %rax
        movabsq $jl_tls_states, %rcx
        movq    %rax, (%rcx)
        movq    -152(%rbp), %rax
        leaq    -40(%rbp), %rsp
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %r15
        popq    %rbp
        retq
Source line: 303
L523:
        movq    %rsp, %rsi
        addq    $-16, %rsi
        movq    %rsi, %rsp
        addq    $1, %rbx
        movq    %rbx, (%rsi)
        movabsq $jl_bounds_error_ints, %rax
        movl    $1, %edx
        movq    %r15, %rdi
        callq   *%rax
Source line: 320
L560:
        movabsq $jl_domain_exception, %rax
        movq    (%rax), %rdi
        movabsq $jl_throw, %rax
        callq   *%rax
L585:
        movabsq $jl_domain_exception, %rax
        movq    (%rax), %rdi
        movabsq $jl_throw, %rax
        callq   *%rax
Source line: 83
L610:
        movabsq $jl_overflow_exception, %rax
        movq    (%rax), %rdi
        movabsq $jl_throw, %rax
        callq   *%rax
L635:
        movabsq $jl_overflow_exception, %rax
        movq    (%rax), %rdi
        movabsq $jl_throw, %rax
        callq   *%rax
        nopw    %cs:(%rax,%rax)



julia> @code_native sum(imexp) * sum(conj(imexp))
        .text
Filename: complex.jl
Source line: 0
        pushq   %rbp
        movq    %rsp, %rbp
Source line: 124
        movsd   (%rsi), %xmm0           # xmm0 = mem[0],zero
        movsd   8(%rsi), %xmm1          # xmm1 = mem[0],zero
        movsd   (%rdx), %xmm2           # xmm2 = mem[0],zero
        movsd   8(%rdx), %xmm3          # xmm3 = mem[0],zero
        movapd  %xmm0, %xmm4
        mulsd   %xmm2, %xmm4
        movapd  %xmm1, %xmm5
        mulsd   %xmm3, %xmm5
        subsd   %xmm5, %xmm4
        mulsd   %xmm3, %xmm0
        mulsd   %xmm2, %xmm1
        addsd   %xmm0, %xmm1
        movsd   %xmm1, 8(%rdi)
        movsd   %xmm4, (%rdi)
        movq    %rdi, %rax
        popq    %rbp
        retq
        nopw    %cs:(%rax,%rax)
