Hi,

Suppose we run a parallel MPI code with 64 processes on a cluster of, say, 16 nodes. Each node has a multicore CPU with, say, 4 cores, so every one of the 64 cores on the cluster runs one process. The program is SPMD, meaning all processes have the same workload.

Now suppose we enable auto-vectorization when compiling the code (for example, with the Intel compilers). Will auto-vectorization bring any benefit in efficiency or scalability? Or will we get the same performance as without it in this example? And how can we actually get a performance improvement from auto-vectorization?

Thank you.
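
To make the question concrete, here is a minimal sketch in C of the kind of per-process loop I have in mind. It is only an illustration under my own assumptions: the names local_axpy and local_count are hypothetical and not taken from any real code, the MPI calls are left out, and each of the 64 ranks is assumed to call the kernel on its own block of the data. As far as I understand, a compiler would vectorize such a loop inside each process, independently of how many MPI processes are running.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-rank kernel: y[i] = a * x[i] + y[i].
 * The restrict qualifiers promise that x and y do not overlap,
 * which makes it easier for a compiler to auto-vectorize the loop. */
void local_axpy(size_t n, double a,
                const double *restrict x, double *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical block decomposition: number of elements owned by `rank`
 * when n_global elements are split as evenly as possible over n_ranks.
 * The first (n_global % n_ranks) ranks get one extra element. */
size_t local_count(size_t n_global, int rank, int n_ranks)
{
    size_t base = n_global / (size_t)n_ranks;
    size_t rem  = n_global % (size_t)n_ranks;
    return base + ((size_t)rank < rem ? 1 : 0);
}
```

In the setup described above, rank r of 64 would call local_axpy on local_count(n_global, r, 64) elements, so the question is whether speeding up this inner loop with vector instructions changes the overall run time or scaling.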
