Some more thoughts in this thread that I've not seen expressed yet
(perhaps I missed them):
+ Some argue that this change in the middle of a stable series may, to
some users, appear to be a performance regression when they update.
However, I would argue that if the alternative is to delay this feature
until the next stable release, it will STILL appear to those same users
to be a performance regression when they upgrade. If the choice is
between sooner and later, I would vote for sooner.
+ I wonder if one can do any "introspection" with the dynamic linker to
detect hybrid OpenMP (no "I") apps and avoid pinning them by default
(examining OMP_NUM_THREADS in the environment is no good, since that
variable may have a site default value other than 1 or empty). To me
this is the most obvious class of application that will suffer from
imposing pinning by default.
+ The question of round-robin-by-core vs round-robin-by-socket is not
fundamentally any different from the question of how to map one's tasks
to flat-SMP nodes (cyclic, block or block-cyclic; XYZT vs TXYZ, etc.).
There is NO universal right answer, and for better or worse the end-user
that wants to maximize performance is going to need to either understand
how their comms interact with task layout, or they are going to try
different options until they are happy.
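
To make the last point concrete, here is a toy sketch of the two
layouts. It is a simplified model, not how any launcher actually places
processes: the function names (by_core, by_socket, and the two helpers)
are made up for illustration, and the node is assumed to be a plain
grid of sockets x cores with nearest-neighbor communication between
ranks r and r+1.

```python
# Toy model of rank placement on one node (hypothetical helper names).
# A placement is a (socket, core) pair for a given rank.

def by_core(rank, sockets, cores_per_socket):
    """Consecutive ranks fill one socket's cores before moving on (block)."""
    slot = rank % (sockets * cores_per_socket)
    return divmod(slot, cores_per_socket)  # (socket, core)

def by_socket(rank, sockets, cores_per_socket):
    """Consecutive ranks alternate across sockets (cyclic)."""
    slot = rank % (sockets * cores_per_socket)
    core, socket = divmod(slot, sockets)
    return socket, core

def cross_socket_neighbors(mapping, nranks, sockets, cores_per_socket):
    """Count rank pairs (r, r+1) that land on different sockets."""
    placed = [mapping(r, sockets, cores_per_socket)[0] for r in range(nranks)]
    return sum(1 for a, b in zip(placed, placed[1:]) if a != b)

def ranks_per_socket(mapping, nranks, sockets, cores_per_socket):
    """Count how many ranks each socket hosts."""
    counts = [0] * sockets
    for r in range(nranks):
        counts[mapping(r, sockets, cores_per_socket)[0]] += 1
    return counts

if __name__ == "__main__":
    # Assumed node shape: 2 sockets x 4 cores, half populated with 4 ranks.
    for name, mapping in (("by_core", by_core), ("by_socket", by_socket)):
        print(name,
              "cross-socket neighbor pairs:",
              cross_socket_neighbors(mapping, 4, 2, 4),
              "ranks per socket:",
              ranks_per_socket(mapping, 4, 2, 4))
```

In this model, by-core keeps neighboring ranks on the same socket but
can leave a socket (and its memory bandwidth) idle when the node is
under-populated, while by-socket spreads the ranks evenly at the cost
of making every neighbor exchange cross sockets. Which one wins depends
on the application, which is the point.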
-Paul
--
Paul H. Hargrove phhargr...@lbl.gov
Future Technologies Group Tel: +1-510-495-2352
HPC Research Department Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory