Some more thoughts in this thread that I've not seen expressed yet (perhaps I missed them):

+ Some argue that making this change in the middle of a stable series may, to some users, appear to be a performance regression when they update. However, I would argue that if the alternative is to delay this feature until the next stable release, it will STILL appear to those same users to be a performance regression when they upgrade. If the choice is between sooner and later, I would vote for sooner.

+ I wonder if one can do any "introspection" with the dynamic linker to detect hybrid OpenMP (no "I") apps and avoid pinning them by default (examining OMP_NUM_THREADS in the environment is no good, since that variable may have a site default value other than 1 or empty). To me this is the most obvious class of application that will suffer from imposing pinning by default.

+ The question of round-robin-by-core vs round-robin-by-socket is not fundamentally any different from the question of how to map one's tasks to flat-SMP nodes (cyclic, block, or block-cyclic; XYZT vs TXYZ, etc.). There is NO universal right answer, and for better or worse the end user who wants to maximize performance is going to need either to understand how their comms interact with task layout, or to try different options until they are happy.
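
To make the last point concrete, here is a toy sketch of the two placements. It is only an illustration, not how any MPI implementation actually maps ranks: the function names and the (socket, core) tuples are made up for this example, and the "cost" is just a count of rank pairs (r, r+1) that end up on different sockets.

```python
# Toy model only: names and the (socket, core) representation are
# hypothetical, chosen for illustration, not taken from any MPI runtime.

def map_by_core(n_ranks, n_sockets, cores_per_socket):
    """Block-style layout: fill every core of a socket before moving on."""
    total = n_sockets * cores_per_socket
    placement = []
    for rank in range(n_ranks):
        slot = rank % total  # wrap around if oversubscribed
        placement.append((slot // cores_per_socket, slot % cores_per_socket))
    return placement


def map_by_socket(n_ranks, n_sockets, cores_per_socket):
    """Cyclic-style layout: deal consecutive ranks across sockets in turn."""
    total = n_sockets * cores_per_socket
    placement = []
    for rank in range(n_ranks):
        slot = rank % total  # wrap around if oversubscribed
        placement.append((slot % n_sockets, slot // n_sockets))
    return placement


def cross_socket_neighbors(placement):
    """Count adjacent rank pairs (r, r+1) that sit on different sockets."""
    return sum(1 for a, b in zip(placement, placement[1:]) if a[0] != b[0])


if __name__ == "__main__":
    for name, mapper in (("by-core", map_by_core), ("by-socket", map_by_socket)):
        layout = mapper(8, 2, 4)
        print(name, layout, "cross-socket pairs:", cross_socket_neighbors(layout))
```

For 8 ranks on 2 sockets of 4 cores, the by-core layout has 1 cross-socket neighbor pair and the by-socket layout has 7, so a nearest-neighbor code would prefer by-core in this toy model. Run the same functions with 4 ranks and the picture changes: by-core packs all 4 onto one socket, while by-socket puts 2 on each, which may suit a memory-bandwidth-bound code better. Which one wins depends on the application's comms.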

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory
