I see. So what I am getting is that:

1. In my code, I will need to add @fastmath anywhere I want these optimizations to show up. That should be easy, since I can just add it at the beginning of the loops where I already have @inbounds, which covers every major inner loop I have. Easy find/replace fix.
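The find/replace described above might look like the following minimal sketch (the function and loop are hypothetical, not from the original thread): @fastmath is placed next to @inbounds on the inner loop, which permits LLVM to contract `a * x[i] + y[i]` into a fused multiply-add where the CPU supports it.

```julia
# Hypothetical inner loop: @fastmath next to @inbounds allows
# contraction of a*x + y into an fma/muladd instruction.
function axpy_sum(a, x, y)
    s = 0.0
    @fastmath @inbounds for i in eachindex(x, y)
        s += a * x[i] + y[i]  # eligible for fused multiply-add under @fastmath
    end
    return s
end

axpy_sum(2.0, [1.0, 2.0], [3.0, 4.0])  # 2*1+3 + 2*2+4 = 13.0
```

Note that @fastmath also relaxes other floating-point rules (e.g. reassociation), so results can differ slightly from the strict IEEE evaluation order.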
2. For my own setup, I am going to need to build from source to get all the optimizations? I would've thought the point of using the Linux repositories instead of the generic binaries is that they would be set up to build for your system. That's just a non-expert's misconception, I guess? I think that should be highlighted somewhere.

On Wednesday, September 21, 2016 at 12:11:34 PM UTC-7, Milan Bouchet-Valat wrote:
> Le mercredi 21 septembre 2016 à 11:36 -0700, Chris Rackauckas a écrit :
> > The Windows one is using the pre-built binary. The Linux one uses the
> > COPR nightly (I assume that should build with all the goodies?)
> The Copr RPMs are subject to the same constraint as official binaries:
> we need them to work on most machines. So they don't enable FMA (nor
> e.g. AVX) either.
>
> It would be nice to find a way to ship with several pre-built sysimg
> files and use the highest instruction set supported by your CPU.
>
> Regards
>
> > > > Hi,
> > > >
> > > > First of all, does LLVM essentially fma or muladd expressions
> > > > like `a1*x1 + a2*x2 + a3*x3 + a4*x4`? Or is it required that one
> > > > explicitly use `muladd` and `fma` on these types of instructions
> > > > (is there a macro for making this easier)?
> > > >
> > > You will generally need to use muladd, unless you use @fastmath.
> > >
> > > > Secondly, I am wondering if my setup is not applying these
> > > > operations correctly. Here's my test code:
> > > >
> > > If you're using the prebuilt downloads (as opposed to building from
> > > source), you will need to rebuild the sysimg (look in
> > > contrib/build_sysimg.jl) as we build for the lowest-common
> > > architecture.
> > >
> > > -Simon
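For the explicit-muladd route Simon mentions, the expression `a1*x1 + a2*x2 + a3*x3 + a4*x4` can be written as nested muladd calls; here is a sketch (the `dot4` name is just for illustration). Unlike `fma`, `muladd` merely permits fusion, so it stays fast even on hardware without an FMA instruction.

```julia
# Explicit muladd form of a1*x1 + a2*x2 + a3*x3 + a4*x4.
# muladd(a, b, c) computes a*b + c and may fuse into a single
# instruction; fma(a, b, c) guarantees fused rounding, falling back
# to a slow software emulation on CPUs without FMA support.
dot4(a1, a2, a3, a4, x1, x2, x3, x4) =
    muladd(a1, x1, muladd(a2, x2, muladd(a3, x3, a4 * x4)))

dot4(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)  # 5 + 12 + 21 + 32 = 70.0
```

With @fastmath, the plain `+`/`*` version can be contracted automatically, which is why the explicit rewrite is only needed when you want fusion without enabling the rest of the fast-math relaxations.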