It should. Yes, please open an issue. -erik
On Thu, Sep 22, 2016 at 7:46 PM, Chris Rackauckas <rackd...@gmail.com> wrote:
> So, in the end, is `@fastmath` supposed to be adding FMA? Should I open an issue?
>
> On Wednesday, September 21, 2016 at 7:11:14 PM UTC-7, Yichao Yu wrote:
>> On Wed, Sep 21, 2016 at 9:49 PM, Erik Schnetter <schn...@gmail.com> wrote:
>>> I confirm that I can't get Julia to synthesize a `vfmadd` instruction either... Sorry for sending you on a wild goose chase.
>>
>> -march=haswell does the trick for C (both clang and gcc).
>> The necessary bits for the machine-IR optimization (this is not an LLVM IR optimization pass) to do this are the llc option -mcpu=haswell and the function attribute unsafe-fp-math=true.
>>
>>> -erik
>>>
>>> On Wed, Sep 21, 2016 at 9:33 PM, Yichao Yu <yyc...@gmail.com> wrote:
>>>> On Wed, Sep 21, 2016 at 9:29 PM, Erik Schnetter <schn...@gmail.com> wrote:
>>>>> On Wed, Sep 21, 2016 at 9:22 PM, Chris Rackauckas <rack...@gmail.com> wrote:
>>>>>> I'm not seeing `@fastmath` apply fma/muladd. I rebuilt the sysimg and now I get results where g and h apply muladd/fma in the native code, but a new function k, which is `@fastmath` applied inside of f, does not apply muladd/fma.
>>>>>>
>>>>>> https://gist.github.com/ChrisRackauckas/b239e33b4b52bcc28f3922c673a25910
>>>>>>
>>>>>> Should I open an issue?
>>>>>
>>>>> In your case, LLVM apparently thinks that `x + x + 3` is faster to calculate than `2x + 3`. If you use a less round number than `2` multiplying `x`, you might see a different behaviour.
>>>>
>>>> I've personally never seen LLVM create fma from mul and add. We might not have the LLVM passes enabled, if LLVM is capable of doing this at all.
>>>>
>>>>> -erik
>>>>>
>>>>>> Note that this is on v0.6 Windows.
>>>>>> On Linux the sysimg isn't rebuilding for some reason, so I may need to just build from source.
>>>>>>
>>>>>> On Wednesday, September 21, 2016 at 6:22:06 AM UTC-7, Erik Schnetter wrote:
>>>>>>> On Wed, Sep 21, 2016 at 1:56 AM, Chris Rackauckas <rack...@gmail.com> wrote:
>>>>>>>> Hi,
>>>>>>>> First of all, does LLVM essentially fma or muladd expressions like `a1*x1 + a2*x2 + a3*x3 + a4*x4`? Or is it required that one explicitly use `muladd` and `fma` on these types of expressions (is there a macro for making this easier)?
>>>>>>>
>>>>>>> Yes, LLVM will use fma machine instructions -- but only if they lead to the same round-off error as using separate multiply and add instructions. If you do not care about the details of conforming to the IEEE standard, then you can use the `@fastmath` macro, which enables several optimizations, including this one. This is described in the manual
>>>>>>> <http://docs.julialang.org/en/release-0.5/manual/performance-tips/#performance-annotations>.
>>>>>>>
>>>>>>>> Secondly, I am wondering if my setup is not applying these operations correctly.
>>>>>>>> Here's my test code:
>>>>>>>>
>>>>>>>> f(x) = 2.0x + 3.0
>>>>>>>> g(x) = muladd(x, 2.0, 3.0)
>>>>>>>> h(x) = fma(x, 2.0, 3.0)
>>>>>>>>
>>>>>>>> @code_llvm f(4.0)
>>>>>>>> @code_llvm g(4.0)
>>>>>>>> @code_llvm h(4.0)
>>>>>>>>
>>>>>>>> @code_native f(4.0)
>>>>>>>> @code_native g(4.0)
>>>>>>>> @code_native h(4.0)
>>>>>>>>
>>>>>>>> Computer 1
>>>>>>>>
>>>>>>>> Julia Version 0.5.0-rc4+0
>>>>>>>> Commit 9c76c3e* (2016-09-09 01:43 UTC)
>>>>>>>> Platform Info:
>>>>>>>>   System: Linux (x86_64-redhat-linux)
>>>>>>>>   CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
>>>>>>>>   WORD_SIZE: 64
>>>>>>>>   BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
>>>>>>>>   LAPACK: libopenblasp.so.0
>>>>>>>>   LIBM: libopenlibm
>>>>>>>>   LLVM: libLLVM-3.7.1 (ORCJIT, broadwell)
>>>>>>>
>>>>>>> This looks good; the "broadwell" architecture that LLVM uses should imply the respective optimizations. Try with `@fastmath`.
>>>>>>>
>>>>>>> -erik
>>>>>>>
>>>>>>>> (the COPR nightly on CentOS 7) with
>>>>>>>>
>>>>>>>> [crackauc@crackauc2 ~]$ lscpu
>>>>>>>> Architecture:          x86_64
>>>>>>>> CPU op-mode(s):        32-bit, 64-bit
>>>>>>>> Byte Order:            Little Endian
>>>>>>>> CPU(s):                16
>>>>>>>> On-line CPU(s) list:   0-15
>>>>>>>> Thread(s) per core:    1
>>>>>>>> Core(s) per socket:    8
>>>>>>>> Socket(s):             2
>>>>>>>> NUMA node(s):          2
>>>>>>>> Vendor ID:             GenuineIntel
>>>>>>>> CPU family:            6
>>>>>>>> Model:                 79
>>>>>>>> Model name:            Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
>>>>>>>> Stepping:              1
>>>>>>>> CPU MHz:               1200.000
>>>>>>>> BogoMIPS:              6392.58
>>>>>>>> Virtualization:        VT-x
>>>>>>>> L1d cache:             32K
>>>>>>>> L1i cache:             32K
>>>>>>>> L2 cache:              256K
>>>>>>>> L3 cache:              25600K
>>>>>>>> NUMA node0 CPU(s):     0-7
>>>>>>>> NUMA node1 CPU(s):     8-15
>>>>>>>>
>>>>>>>> I get the output
>>>>>>>>
>>>>>>>> define double @julia_f_72025(double) #0
>>>>>>>> {
>>>>>>>> top:
>>>>>>>>   %1 = fmul double %0, 2.000000e+00
>>>>>>>>   %2 = fadd double %1, 3.000000e+00
>>>>>>>>   ret double %2
>>>>>>>> }
>>>>>>>>
>>>>>>>> define double @julia_g_72027(double) #0 {
>>>>>>>> top:
>>>>>>>>   %1 = call double @llvm.fmuladd.f64(double %0, double 2.000000e+00, double 3.000000e+00)
>>>>>>>>   ret double %1
>>>>>>>> }
>>>>>>>>
>>>>>>>> define double @julia_h_72029(double) #0 {
>>>>>>>> top:
>>>>>>>>   %1 = call double @llvm.fma.f64(double %0, double 2.000000e+00, double 3.000000e+00)
>>>>>>>>   ret double %1
>>>>>>>> }
>>>>>>>>
>>>>>>>>     .text
>>>>>>>> Filename: fmatest.jl
>>>>>>>>     pushq %rbp
>>>>>>>>     movq %rsp, %rbp
>>>>>>>> Source line: 1
>>>>>>>>     addsd %xmm0, %xmm0
>>>>>>>>     movabsq $139916162906520, %rax # imm = 0x7F40C5303998
>>>>>>>>     addsd (%rax), %xmm0
>>>>>>>>     popq %rbp
>>>>>>>>     retq
>>>>>>>>     nopl (%rax,%rax)
>>>>>>>>     .text
>>>>>>>> Filename: fmatest.jl
>>>>>>>>     pushq %rbp
>>>>>>>>     movq %rsp, %rbp
>>>>>>>> Source line: 2
>>>>>>>>     addsd %xmm0, %xmm0
>>>>>>>>     movabsq $139916162906648, %rax # imm = 0x7F40C5303A18
>>>>>>>>     addsd (%rax), %xmm0
>>>>>>>>     popq %rbp
>>>>>>>>     retq
>>>>>>>>     nopl (%rax,%rax)
>>>>>>>>     .text
>>>>>>>> Filename: fmatest.jl
>>>>>>>>     pushq %rbp
>>>>>>>>     movq %rsp, %rbp
>>>>>>>>     movabsq $139916162906776, %rax # imm = 0x7F40C5303A98
>>>>>>>> Source line: 3
>>>>>>>>     movsd (%rax), %xmm1 # xmm1 = mem[0],zero
>>>>>>>>     movabsq $139916162906784, %rax # imm = 0x7F40C5303AA0
>>>>>>>>     movsd (%rax), %xmm2 # xmm2 = mem[0],zero
>>>>>>>>     movabsq $139925776008800, %rax # imm = 0x7F43022C8660
>>>>>>>>     popq %rbp
>>>>>>>>     jmpq *%rax
>>>>>>>>     nopl (%rax)
>>>>>>>>
>>>>>>>> It looks like explicit muladd or not ends up at the same native code, but is that native code actually doing an fma?
>>>>>>>> The fma native code is different, but from a discussion on the Gitter it seems that might be a software FMA? This computer is set up with the BIOS setting as LAPACK-optimized or something like that, so is that messing with something?
>>>>>>>>
>>>>>>>> Computer 2
>>>>>>>>
>>>>>>>> Julia Version 0.6.0-dev.557
>>>>>>>> Commit c7a4897 (2016-09-08 17:50 UTC)
>>>>>>>> Platform Info:
>>>>>>>>   System: NT (x86_64-w64-mingw32)
>>>>>>>>   CPU: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
>>>>>>>>   WORD_SIZE: 64
>>>>>>>>   BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
>>>>>>>>   LAPACK: libopenblas64_
>>>>>>>>   LIBM: libopenlibm
>>>>>>>>   LLVM: libLLVM-3.7.1 (ORCJIT, haswell)
>>>>>>>>
>>>>>>>> on a 4770K i7, Windows 10, I get the output
>>>>>>>>
>>>>>>>> ; Function Attrs: uwtable
>>>>>>>> define double @julia_f_66153(double) #0 {
>>>>>>>> top:
>>>>>>>>   %1 = fmul double %0, 2.000000e+00
>>>>>>>>   %2 = fadd double %1, 3.000000e+00
>>>>>>>>   ret double %2
>>>>>>>> }
>>>>>>>>
>>>>>>>> ; Function Attrs: uwtable
>>>>>>>> define double @julia_g_66157(double) #0 {
>>>>>>>> top:
>>>>>>>>   %1 = call double @llvm.fmuladd.f64(double %0, double 2.000000e+00, double 3.000000e+00)
>>>>>>>>   ret double %1
>>>>>>>> }
>>>>>>>>
>>>>>>>> ; Function Attrs: uwtable
>>>>>>>> define double @julia_h_66158(double) #0 {
>>>>>>>> top:
>>>>>>>>   %1 = call double @llvm.fma.f64(double %0, double 2.000000e+00, double 3.000000e+00)
>>>>>>>>   ret double %1
>>>>>>>> }
>>>>>>>>
>>>>>>>>     .text
>>>>>>>> Filename: console
>>>>>>>>     pushq %rbp
>>>>>>>>     movq %rsp, %rbp
>>>>>>>> Source line: 1
>>>>>>>>     addsd %xmm0, %xmm0
>>>>>>>>     movabsq $534749456, %rax # imm = 0x1FDFA110
>>>>>>>>     addsd (%rax), %xmm0
>>>>>>>>     popq %rbp
>>>>>>>>     retq
>>>>>>>>     nopl (%rax,%rax)
>>>>>>>>     .text
>>>>>>>> Filename: console
>>>>>>>>     pushq %rbp
>>>>>>>>     movq %rsp, %rbp
>>>>>>>> Source line: 2
>>>>>>>>     addsd %xmm0, %xmm0
>>>>>>>>     movabsq $534749584, %rax # imm = 0x1FDFA190
>>>>>>>>     addsd (%rax), %xmm0
>>>>>>>>     popq %rbp
>>>>>>>>     retq
>>>>>>>>     nopl (%rax,%rax)
>>>>>>>>     .text
>>>>>>>> Filename: console
>>>>>>>>     pushq %rbp
>>>>>>>>     movq %rsp, %rbp
>>>>>>>>     movabsq $534749712, %rax # imm = 0x1FDFA210
>>>>>>>> Source line: 3
>>>>>>>>     movsd dcabs164_(%rax), %xmm1 # xmm1 = mem[0],zero
>>>>>>>>     movabsq $534749720, %rax # imm = 0x1FDFA218
>>>>>>>>     movsd (%rax), %xmm2 # xmm2 = mem[0],zero
>>>>>>>>     movabsq $fma, %rax
>>>>>>>>     popq %rbp
>>>>>>>>     jmpq *%rax
>>>>>>>>     nop
>>>>>>>>
>>>>>>>> This seems to be similar to the first result.
>>>>>>>
>>>>>>> --
>>>>>>> Erik Schnetter <schn...@gmail.com>
>>>>>>> http://www.perimeterinstitute.ca/personal/eschnetter/

--
Erik Schnetter <schnet...@gmail.com>
http://www.perimeterinstitute.ca/personal/eschnetter/