Hello,

Kenneth, sorry I could not make it.

About vectorization: it's interesting to know that Intel has
backtracked a little on AVX512 vectorization for Skylake in their
newest compilers, 2017.5 (not <=2017.4) and 2018.*:
https://software.intel.com/en-us/articles/tuning-simd-vectorization-when-targeting-intel-xeon-processor-scalable-family

There are a lot of applications out there with variable-length loops
with low trip counts, for which using the 512-bit zmm registers
introduces enough overhead to slow the application down.
We ran benchmarks within Compute Canada with Intel 2017.1 and saw that
-xCORE-AVX512 sometimes produced slower code than -xCORE-AVX2.
With the new default setting, -qopt-zmm-usage=low, that is no longer the
case, although certain applications can still benefit from "high" zmm
usage (unfortunately I don't know exactly which ones).
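
To make the loop shape concrete, here is a minimal sketch of a
variable-length loop with a low trip count. The function name
axpy_short and the file name axpy_short.c are made up for illustration;
this is not one of the codes we benchmarked. The compile lines in the
comment only use the flags mentioned above, and which variant is faster
has to be measured per application.

```c
#include <stddef.h>

/*
 * Hypothetical example: y[i] += a * x[i] over a short range whose length
 * n is only known at run time (assumed to be small at most call sites).
 *
 * Possible builds to compare (file name is an assumption):
 *   icc -O2 -xCORE-AVX2 -c axpy_short.c
 *   icc -O2 -xCORE-AVX512 -c axpy_short.c
 *       (zmm usage defaults to "low" in 2017.5 and 2018.*)
 *   icc -O2 -xCORE-AVX512 -qopt-zmm-usage=high -c axpy_short.c
 */
void axpy_short(size_t n, double a,
                const double *restrict x, double *restrict y)
{
    /* Trip count n is variable; the compiler cannot assume it is large. */
    for (size_t i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

Timing a call site like this with each of the three builds is one way to
see whether a given code falls into the "low" or "high" category.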

HPL certainly benefits greatly from AVX512 (mostly via MKL or other
linear algebra libraries), as do others such as GROMACS and FFTW, which
use their own custom code.

Bart

-- 
Dr. Bart E. Oldeman | bart.olde...@mcgill.ca | bart.olde...@calculquebec.ca
Scientific Computing Analyst / Analyste en calcul scientifique
McGill HPC Centre / Centre de Calcul Haute Performance de McGill |
http://www.hpc.mcgill.ca
Calcul Québec | http://www.calculquebec.ca
Compute/Calcul Canada | http://www.computecanada.ca
Tel/Tél: 514-396-8926 | Fax/Télécopieur: 514-396-8934
