I confirm that I can't get Julia to synthesize a `vfmadd` instruction either... Sorry for sending you on a wild goose chase.

-erik
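A minimal sketch of the check in question (the 2.1 coefficient is illustrative, following the "less round number" suggestion below; the function names are made up):

f2(x) = 2.1x + 3.0            # plain mul + add; strict IEEE semantics forbid fusing
k2(x) = @fastmath 2.1x + 3.0  # fast-math would permit fusing, if a pass did it

@code_native f2(4.0)   # expect a separate mulsd/addsd pair
@code_native k2(4.0)   # search the output for a vfmadd* instruction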
On Wed, Sep 21, 2016 at 9:33 PM, Yichao Yu <yyc1...@gmail.com> wrote:
> On Wed, Sep 21, 2016 at 9:29 PM, Erik Schnetter <schnet...@gmail.com> wrote:
> > On Wed, Sep 21, 2016 at 9:22 PM, Chris Rackauckas <rackd...@gmail.com> wrote:
> >>
> >> I'm not seeing `@fastmath` apply fma/muladd. I rebuilt the sysimg, and now
> >> I get results where g and h apply muladd/fma in the native code, but a new
> >> function k, which is `@fastmath` inside of f, does not apply muladd/fma.
> >>
> >> https://gist.github.com/ChrisRackauckas/b239e33b4b52bcc28f3922c673a25910
> >>
> >> Should I open an issue?
> >
> > In your case, LLVM apparently thinks that `x + x + 3` is faster to
> > calculate than `2x + 3`. If you use a less round number than `2`
> > multiplying `x`, you might see a different behaviour.
>
> I've personally never seen LLVM create an fma from a mul and an add. We
> might not have the LLVM passes enabled, if LLVM is capable of doing this
> at all.
>
> > -erik
> >
> >> Note that this is on v0.6 Windows. On Linux the sysimg isn't rebuilding
> >> for some reason, so I may need to just build from source.
> >>
> >> On Wednesday, September 21, 2016 at 6:22:06 AM UTC-7, Erik Schnetter wrote:
> >>>
> >>> On Wed, Sep 21, 2016 at 1:56 AM, Chris Rackauckas <rack...@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>> First of all, does LLVM automatically turn expressions like
> >>>> `a1*x1 + a2*x2 + a3*x3 + a4*x4` into fma or muladd operations? Or is
> >>>> one required to explicitly use `muladd` and `fma` on these kinds of
> >>>> expressions (is there a macro for making this easier)?
> >>>
> >>> Yes, LLVM will use fma machine instructions -- but only if they lead to
> >>> the same round-off error as using separate multiply and add
> >>> instructions. If you do not care about the details of conforming to the
> >>> IEEE standard, then you can use the `@fastmath` macro, which enables
> >>> several optimizations, including this one. This is described in the
> >>> manual
> >>> <http://docs.julialang.org/en/release-0.5/manual/performance-tips/#performance-annotations>.
> >>>
> >>>> Secondly, I am wondering if my setup is not applying these operations
> >>>> correctly. Here's my test code:
> >>>>
> >>>> f(x) = 2.0x + 3.0
> >>>> g(x) = muladd(x, 2.0, 3.0)
> >>>> h(x) = fma(x, 2.0, 3.0)
> >>>>
> >>>> @code_llvm f(4.0)
> >>>> @code_llvm g(4.0)
> >>>> @code_llvm h(4.0)
> >>>>
> >>>> @code_native f(4.0)
> >>>> @code_native g(4.0)
> >>>> @code_native h(4.0)
> >>>>
> >>>> Computer 1
> >>>>
> >>>> Julia Version 0.5.0-rc4+0
> >>>> Commit 9c76c3e* (2016-09-09 01:43 UTC)
> >>>> Platform Info:
> >>>>   System: Linux (x86_64-redhat-linux)
> >>>>   CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> >>>>   WORD_SIZE: 64
> >>>>   BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
> >>>>   LAPACK: libopenblasp.so.0
> >>>>   LIBM: libopenlibm
> >>>>   LLVM: libLLVM-3.7.1 (ORCJIT, broadwell)
> >>>
> >>> This looks good; the "broadwell" architecture that LLVM targets should
> >>> imply the respective optimizations. Try with `@fastmath`.
> >>>
> >>> -erik
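A sketch of that experiment (the name `k` follows the gist linked above; its exact definition there may differ):

k(x) = @fastmath 2.0x + 3.0

@code_llvm k(4.0)     # the fmul/fadd should now carry fast-math flags
@code_native k(4.0)   # note: with the round 2.0 coefficient, LLVM may still
                      # rewrite 2x as x + x, so no fma can appear regardless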
> >>>> (the COPR nightly on CentOS 7) with
> >>>>
> >>>> [crackauc@crackauc2 ~]$ lscpu
> >>>> Architecture:          x86_64
> >>>> CPU op-mode(s):        32-bit, 64-bit
> >>>> Byte Order:            Little Endian
> >>>> CPU(s):                16
> >>>> On-line CPU(s) list:   0-15
> >>>> Thread(s) per core:    1
> >>>> Core(s) per socket:    8
> >>>> Socket(s):             2
> >>>> NUMA node(s):          2
> >>>> Vendor ID:             GenuineIntel
> >>>> CPU family:            6
> >>>> Model:                 79
> >>>> Model name:            Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> >>>> Stepping:              1
> >>>> CPU MHz:               1200.000
> >>>> BogoMIPS:              6392.58
> >>>> Virtualization:        VT-x
> >>>> L1d cache:             32K
> >>>> L1i cache:             32K
> >>>> L2 cache:              256K
> >>>> L3 cache:              25600K
> >>>> NUMA node0 CPU(s):     0-7
> >>>> NUMA node1 CPU(s):     8-15
> >>>>
> >>>> I get the output
> >>>>
> >>>> define double @julia_f_72025(double) #0 {
> >>>> top:
> >>>>   %1 = fmul double %0, 2.000000e+00
> >>>>   %2 = fadd double %1, 3.000000e+00
> >>>>   ret double %2
> >>>> }
> >>>>
> >>>> define double @julia_g_72027(double) #0 {
> >>>> top:
> >>>>   %1 = call double @llvm.fmuladd.f64(double %0, double 2.000000e+00, double 3.000000e+00)
> >>>>   ret double %1
> >>>> }
> >>>>
> >>>> define double @julia_h_72029(double) #0 {
> >>>> top:
> >>>>   %1 = call double @llvm.fma.f64(double %0, double 2.000000e+00, double 3.000000e+00)
> >>>>   ret double %1
> >>>> }
> >>>>
> >>>>     .text
> >>>> Filename: fmatest.jl
> >>>>     pushq   %rbp
> >>>>     movq    %rsp, %rbp
> >>>> Source line: 1
> >>>>     addsd   %xmm0, %xmm0
> >>>>     movabsq $139916162906520, %rax  # imm = 0x7F40C5303998
> >>>>     addsd   (%rax), %xmm0
> >>>>     popq    %rbp
> >>>>     retq
> >>>>     nopl    (%rax,%rax)
> >>>>     .text
> >>>> Filename: fmatest.jl
> >>>>     pushq   %rbp
> >>>>     movq    %rsp, %rbp
> >>>> Source line: 2
> >>>>     addsd   %xmm0, %xmm0
> >>>>     movabsq $139916162906648, %rax  # imm = 0x7F40C5303A18
> >>>>     addsd   (%rax), %xmm0
> >>>>     popq    %rbp
> >>>>     retq
> >>>>     nopl    (%rax,%rax)
> >>>>     .text
> >>>> Filename: fmatest.jl
> >>>>     pushq   %rbp
> >>>>     movq    %rsp, %rbp
> >>>>     movabsq $139916162906776, %rax  # imm = 0x7F40C5303A98
> >>>> Source line: 3
> >>>>     movsd   (%rax), %xmm1           # xmm1 = mem[0],zero
> >>>>     movabsq $139916162906784, %rax  # imm = 0x7F40C5303AA0
> >>>>     movsd   (%rax), %xmm2           # xmm2 = mem[0],zero
> >>>>     movabsq $139925776008800, %rax  # imm = 0x7F43022C8660
> >>>>     popq    %rbp
> >>>>     jmpq    *%rax
> >>>>     nopl    (%rax)
> >>>>
> >>>> With or without the explicit muladd, f and g end up as the same native
> >>>> code, but is that native code actually doing an fma? The native code
> >>>> for fma is different, but from a discussion on Gitter it seems that
> >>>> might be a software FMA? This computer is set up with a BIOS setting
> >>>> like "LAPACK optimized" or something like that, so is that messing
> >>>> with something?
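One way to probe that question from the running session (illustrative; the exact output varies by machine and build):

versioninfo()   # the JIT target appears on the LLVM line, e.g. "(ORCJIT, broadwell)"
# On Linux, hardware FMA support can be checked in the shell with
#   grep -m1 -o ' fma ' /proc/cpuinfo
# In the @code_native output above, h never inlines an fma: the final
# `jmpq *%rax` is a tail call into the C library's fma, which is a software
# FMA whenever that library does not itself use the hardware instruction.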
> >>>>
> >>>> Computer 2
> >>>>
> >>>> Julia Version 0.6.0-dev.557
> >>>> Commit c7a4897 (2016-09-08 17:50 UTC)
> >>>> Platform Info:
> >>>>   System: NT (x86_64-w64-mingw32)
> >>>>   CPU: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
> >>>>   WORD_SIZE: 64
> >>>>   BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
> >>>>   LAPACK: libopenblas64_
> >>>>   LIBM: libopenlibm
> >>>>   LLVM: libLLVM-3.7.1 (ORCJIT, haswell)
> >>>>
> >>>> on a 4770K i7, Windows 10, I get the output
> >>>>
> >>>> ; Function Attrs: uwtable
> >>>> define double @julia_f_66153(double) #0 {
> >>>> top:
> >>>>   %1 = fmul double %0, 2.000000e+00
> >>>>   %2 = fadd double %1, 3.000000e+00
> >>>>   ret double %2
> >>>> }
> >>>>
> >>>> ; Function Attrs: uwtable
> >>>> define double @julia_g_66157(double) #0 {
> >>>> top:
> >>>>   %1 = call double @llvm.fmuladd.f64(double %0, double 2.000000e+00, double 3.000000e+00)
> >>>>   ret double %1
> >>>> }
> >>>>
> >>>> ; Function Attrs: uwtable
> >>>> define double @julia_h_66158(double) #0 {
> >>>> top:
> >>>>   %1 = call double @llvm.fma.f64(double %0, double 2.000000e+00, double 3.000000e+00)
> >>>>   ret double %1
> >>>> }
> >>>>
> >>>>     .text
> >>>> Filename: console
> >>>>     pushq   %rbp
> >>>>     movq    %rsp, %rbp
> >>>> Source line: 1
> >>>>     addsd   %xmm0, %xmm0
> >>>>     movabsq $534749456, %rax        # imm = 0x1FDFA110
> >>>>     addsd   (%rax), %xmm0
> >>>>     popq    %rbp
> >>>>     retq
> >>>>     nopl    (%rax,%rax)
> >>>>     .text
> >>>> Filename: console
> >>>>     pushq   %rbp
> >>>>     movq    %rsp, %rbp
> >>>> Source line: 2
> >>>>     addsd   %xmm0, %xmm0
> >>>>     movabsq $534749584, %rax        # imm = 0x1FDFA190
> >>>>     addsd   (%rax), %xmm0
> >>>>     popq    %rbp
> >>>>     retq
> >>>>     nopl    (%rax,%rax)
> >>>>     .text
> >>>> Filename: console
> >>>>     pushq   %rbp
> >>>>     movq    %rsp, %rbp
> >>>>     movabsq $534749712, %rax        # imm = 0x1FDFA210
> >>>> Source line: 3
> >>>>     movsd   dcabs164_(%rax), %xmm1  # xmm1 = mem[0],zero
> >>>>     movabsq $534749720, %rax        # imm = 0x1FDFA218
> >>>>     movsd   (%rax), %xmm2           # xmm2 = mem[0],zero
> >>>>     movabsq $fma, %rax
> >>>>     popq    %rbp
> >>>>     jmpq    *%rax
> >>>>     nop
> >>>>
> >>>> This seems to be similar to the first result.
> >>>
> >>> --
> >>> Erik Schnetter <schn...@gmail.com>
> >>> http://www.perimeterinstitute.ca/personal/eschnetter/
> >
> > --
> > Erik Schnetter <schnet...@gmail.com>
> > http://www.perimeterinstitute.ca/personal/eschnetter/

--
Erik Schnetter <schnet...@gmail.com>
http://www.perimeterinstitute.ca/personal/eschnetter/
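For completeness, a runtime rounding probe that distinguishes a true fused multiply-add from a separate multiply and add, without reading assembly (this is the standard double-precision argument, not something from the thread):

x = 1.0 + 2.0^-27
fma(x, x, -1.0)   # fused: returns the exact x^2 - 1 = 2^-26 + 2^-54
x*x - 1.0         # unfused: x*x rounds to 1 + 2^-26, so the 2^-54 term is lost

Applying the same probe to `muladd(x, x, -1.0)` shows whether `muladd` actually compiled to a fused operation on a given machine: it matches the `fma` value when fused and the `x*x - 1.0` value when not. Note that a correct software fma (such as the C library fallback above) still returns the fused result, just slowly, so this probe tests semantics rather than hardware support.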