Hi,
  First of all, does LLVM automatically fma or muladd expressions like `a1*x1 
+ a2*x2 + a3*x3 + a4*x4`? Or is one required to explicitly use 
`muladd` and `fma` in these kinds of expressions (and is there a macro for 
making this easier)?
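If explicit calls do turn out to be required, my guess (an assumption on my part; `dot4` is a made-up name) is that one would either nest `muladd` by hand or wrap the expression in `@fastmath`, which I believe gives LLVM license to contract mul/add pairs:

```julia
# Hand-nested muladds for a1*x1 + a2*x2 + a3*x3 + a4*x4:
dot4(a1, x1, a2, x2, a3, x3, a4, x4) =
    muladd(a1, x1, muladd(a2, x2, muladd(a3, x3, a4 * x4)))

# Alternatively, relax the FP semantics and let the compiler decide;
# @fastmath should permit contraction into fma on capable hardware:
dot4_fast(a1, x1, a2, x2, a3, x3, a4, x4) =
    @fastmath a1 * x1 + a2 * x2 + a3 * x3 + a4 * x4
```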

  Secondly, I am wondering if my setup is not applying these operations 
correctly. Here's my test code:

f(x) = 2.0x + 3.0
g(x) = muladd(x, 2.0, 3.0)
h(x) = fma(x, 2.0, 3.0)

@code_llvm f(4.0)
@code_llvm g(4.0)
@code_llvm h(4.0)

@code_native f(4.0)
@code_native g(4.0)
@code_native h(4.0)

*Computer 1*

Julia Version 0.5.0-rc4+0
Commit 9c76c3e* (2016-09-09 01:43 UTC)
Platform Info:
  System: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblasp.so.0
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, broadwell)

(the COPR nightly on CentOS 7), where `lscpu` reports:

[crackauc@crackauc2 ~]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Stepping:              1
CPU MHz:               1200.000
BogoMIPS:              6392.58
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
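(That lscpu dump omits the Flags line; on Linux, one can read /proc/cpuinfo directly to confirm the chip actually advertises hardware FMA. A quick sketch, assuming the usual /proc layout; matching "fma" as a substring of the flags line is rough but usually sufficient:)

```julia
# Look for the "fma" flag in /proc/cpuinfo (Linux only).
hasfma = any(l -> startswith(l, "flags") && contains(l, "fma"),
             readlines("/proc/cpuinfo"))
println(hasfma ? "CPU advertises FMA" : "no fma flag found")
```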



I get the output:

define double @julia_f_72025(double) #0 {
top:
  %1 = fmul double %0, 2.000000e+00
  %2 = fadd double %1, 3.000000e+00
  ret double %2
}

define double @julia_g_72027(double) #0 {
top:
  %1 = call double @llvm.fmuladd.f64(double %0, double 2.000000e+00, double 3.000000e+00)
  ret double %1
}

define double @julia_h_72029(double) #0 {
top:
  %1 = call double @llvm.fma.f64(double %0, double 2.000000e+00, double 3.000000e+00)
  ret double %1
}
.text
Filename: fmatest.jl
pushq %rbp
movq %rsp, %rbp
Source line: 1
addsd %xmm0, %xmm0
movabsq $139916162906520, %rax  # imm = 0x7F40C5303998
addsd (%rax), %xmm0
popq %rbp
retq
nopl (%rax,%rax)
.text
Filename: fmatest.jl
pushq %rbp
movq %rsp, %rbp
Source line: 2
addsd %xmm0, %xmm0
movabsq $139916162906648, %rax  # imm = 0x7F40C5303A18
addsd (%rax), %xmm0
popq %rbp
retq
nopl (%rax,%rax)
.text
Filename: fmatest.jl
pushq %rbp
movq %rsp, %rbp
movabsq $139916162906776, %rax  # imm = 0x7F40C5303A98
Source line: 3
movsd (%rax), %xmm1           # xmm1 = mem[0],zero
movabsq $139916162906784, %rax  # imm = 0x7F40C5303AA0
movsd (%rax), %xmm2           # xmm2 = mem[0],zero
movabsq $139925776008800, %rax  # imm = 0x7F43022C8660
popq %rbp
jmpq *%rax
nopl (%rax)

It looks like f (no explicit muladd) and g (explicit muladd) end up as the 
same native code, but is that native code actually doing an fma? All I see 
is two addsd instructions, no multiply or vfmadd. The native code for h is 
different, but from a discussion on Gitter it seems that might be a 
software FMA. This computer is set up with a BIOS setting described as 
LAPACK-optimized or something like that, so could that be interfering?
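One sanity check that doesn't require reading assembly: a fused multiply-add rounds only once, so it can be distinguished from mul-then-add numerically. A small probe (my own test, not from the docs):

```julia
# x*x = 1 + 2^-26 + 2^-54 exactly; the separate multiply rounds that to
# 1 + 2^-26, while a true fma (hardware or software) keeps the low bit.
x = 1.0 + 2.0^-27
unfused = x * x - 1.0        # product rounded before the subtraction
fused   = fma(x, x, -1.0)    # single rounding of the exact result
println(fused == unfused ? "fma not fused?!" : "fma really fused")
# The same probe with muladd shows whether LLVM chose to fuse it here:
println(muladd(x, x, -1.0) == fused ? "muladd fused" : "muladd not fused")
```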

*Computer 2*

Julia Version 0.6.0-dev.557
Commit c7a4897 (2016-09-08 17:50 UTC)
Platform Info:
  System: NT (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)


on a 4770k i7, Windows 10, I get the output

; Function Attrs: uwtable
define double @julia_f_66153(double) #0 {
top:
  %1 = fmul double %0, 2.000000e+00
  %2 = fadd double %1, 3.000000e+00
  ret double %2
}

; Function Attrs: uwtable
define double @julia_g_66157(double) #0 {
top:
  %1 = call double @llvm.fmuladd.f64(double %0, double 2.000000e+00, double 3.000000e+00)
  ret double %1
}

; Function Attrs: uwtable
define double @julia_h_66158(double) #0 {
top:
  %1 = call double @llvm.fma.f64(double %0, double 2.000000e+00, double 3.000000e+00)
  ret double %1
}
.text
Filename: console
pushq %rbp
movq %rsp, %rbp
Source line: 1
addsd %xmm0, %xmm0
movabsq $534749456, %rax        # imm = 0x1FDFA110
addsd (%rax), %xmm0
popq %rbp
retq
nopl (%rax,%rax)
.text
Filename: console
pushq %rbp
movq %rsp, %rbp
Source line: 2
addsd %xmm0, %xmm0
movabsq $534749584, %rax        # imm = 0x1FDFA190
addsd (%rax), %xmm0
popq %rbp
retq
nopl (%rax,%rax)
.text
Filename: console
pushq %rbp
movq %rsp, %rbp
movabsq $534749712, %rax        # imm = 0x1FDFA210
Source line: 3
movsd (%rax), %xmm1           # xmm1 = mem[0],zero
movabsq $534749720, %rax        # imm = 0x1FDFA218
movsd (%rax), %xmm2           # xmm2 = mem[0],zero
movabsq $fma, %rax
popq %rbp
jmpq *%rax
nop

This seems essentially the same as the first result: f and g compile to 
identical native code with no fused multiply-add in sight, and h 
tail-jumps to an fma routine.
