Hello, Kenneth, sorry I could not make it.
About vectorization: it's interesting to know that Intel has backtracked a little on AVX512 vectorization for Skylake in their newest compilers, 2017.5 (not <=2017.4) and 2018.*:

https://software.intel.com/en-us/articles/tuning-simd-vectorization-when-targeting-intel-xeon-processor-scalable-family

There are a lot of applications out there with variable-length loops with low trip counts, for which using the 512-bit zmm registers introduces enough overhead that it actually slows the applications down. We ran benchmarks within Compute Canada with Intel 2017.1 and saw that -xCORE-AVX512 sometimes produced slower code than -xCORE-AVX2.

With the new default setting, -qopt-zmm-usage=low, that is no longer the case, although certain applications can benefit from "high" zmm usage (unfortunately I don't know exactly which ones). HPL certainly benefits greatly from AVX512 (mostly via MKL or other linear algebra libraries), as do others with custom code, such as Gromacs/FFTW.

Bart

--
Dr. Bart E. Oldeman | bart.olde...@mcgill.ca | bart.olde...@calculquebec.ca
Scientific Computing Analyst / Analyste en calcul scientifique
McGill HPC Centre / Centre de Calcul Haute Performance de McGill | http://www.hpc.mcgill.ca
Calcul Québec | http://www.calculquebec.ca
Compute/Calcul Canada | http://www.computecanada.ca
Tel/Tél: 514-396-8926 | Fax/Télécopieur: 514-396-8934