Hi,

Some clarifications are needed, and since they might help you and others understand what's going on, I'll take the time to explain things.
Affinity setting is a low-level, operating-system-level operation that "locks" (= "pins") threads to physical cores of the CPU. This prevents the OS from migrating them between cores; such migration can cause a performance drop, especially when using OpenMP multithreading on multi-socket and NUMA machines.

Now, mdrun will by default *try* to set affinity if you use all cores detected (i.e. if mdrun can be sure that it is the only application running on the machine), but will by default *not* set thread affinities if the number of threads/processes per compute node is less than the number of cores detected. Hence, when you decrease -ntmpi to 7, you implicitly end up turning off thread pinning, which is why the warnings don't show up.

The fact that affinity setting fails on your machine suggests that either the system libraries don't support it or the mdrun code is not fully compatible with your OS; AFAIK the type of CPU doesn't matter at all. What OS are you using? Is it an old installation?

If you are not using OpenMP (which, btw, you probably should with the Verlet scheme if you are running on a single node or at high parallelization), the performance will not be affected very much by the lack of thread pinning. While the warnings themselves can often be safely ignored, if only some of the threads/processes can't set affinities, this might indicate a problem. In your case, if you were really seeing only 5 cores being used with 3 warnings, this might suggest that while the affinity setting failed, three threads are using already "busy" cores overlapping with others, which will cause a severe performance drop.

What you can do to avoid the performance drop is to turn off pinning by passing "-pin off" to mdrun. Without OpenMP this will typically not cause a large performance drop compared to having correct pinning, and it will avoid the bad case of overlapping threads/processes.

I suspect that your machines might be running an old OS, which could be causing the failed affinity setting.
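To see whether the OS lets a process set its CPU affinity at all, independently of mdrun, a quick check along the following lines may help. This is only a minimal sketch and not what mdrun does internally: it assumes Linux and Python 3, relies on the standard-library calls os.sched_getaffinity and os.sched_setaffinity, and the function name check_affinity is just an illustrative choice.

```python
import os
import platform


def check_affinity():
    """Try to pin this process to each CPU it is allowed to use, one at a time.

    Returns a dict mapping CPU index -> True (pinning worked) or False (it failed).
    Linux-only: os.sched_*affinity is not available on every platform.
    """
    original = os.sched_getaffinity(0)  # CPUs this process may currently run on
    results = {}
    try:
        for cpu in sorted(original):
            try:
                os.sched_setaffinity(0, {cpu})
                # Confirm that the kernel actually applied the requested mask.
                results[cpu] = os.sched_getaffinity(0) == {cpu}
            except OSError:
                results[cpu] = False
    finally:
        # Restore the original mask so the process is not left pinned.
        os.sched_setaffinity(0, original)
    return results


if __name__ == "__main__":
    print("OS / kernel:", platform.platform())
    print("C library:  ", " ".join(platform.libc_ver()))
    for cpu, ok in check_affinity().items():
        print(f"core {cpu}: {'ok' if ok else 'affinity setting FAILED'}")
```

If pinning fails in this check for some cores as well, the problem is more likely in the OS or system libraries than in mdrun, and an outdated installation would be a plausible cause.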
If that is the case, you should talk to your sysadmins and have them figure out the issue. If you have a moderately new OS, you should not be seeing such issues, so I suggest that you file a bug report with details like: OS + version + kernel version, pthread library version, standard C library version.

Cheers,
--
Szilárd

On Mon, Mar 4, 2013 at 1:45 PM, Mark Abraham <[email protected]> wrote:
> On Mon, Mar 4, 2013 at 6:02 AM, Reid Van Lehn <[email protected]> wrote:
>
> > Hello users,
> >
> > I ran into a bug I do not understand today upon upgrading from v. 4.5.5 to
> > v 4.6. I'm using older 8 core Intel Xeon E5430 machines, and when I
> > submitted a job for 8 cores to one of the nodes I received the following
> > error:
> >
> > NOTE: In thread-MPI thread #3: Affinity setting failed.
> > This can cause performance degradation!
> >
> > NOTE: In thread-MPI thread #2: Affinity setting failed.
> > This can cause performance degradation!
> >
> > NOTE: In thread-MPI thread #1: Affinity setting failed.
> > This can cause performance degradation!
> >
> > I ran mdrun simply with the flags:
> >
> > mdrun -v -ntmpi 8 -deffnm em
> >
> > Using the top command, I confirmed that no other programs were running and
> > that mdrun was in fact only using 5 cores. Reducing -ntmpi to 7, however,
> > resulted in no error (only a warning about not using all of the logical
> > cores) and mdrun used 7 cores correctly. Since it warned about thread
> > affinity settings, I tried setting -pin on -pinoffset 0 even though I was
> > using all the cores on the machine. This resulted in the same error.
> > However, turning pinning off explicitly with -pin off (rather than -pin
> > auto) did correctly give me the all 8 cores again.
> >
> > While I figured out a solution in this particular instance, my question is
> > whether I should be have known from my hardware/settings that pinning
> > should be turned off (for future reference), or if this is a bug?
> >
>
> I'm not sure - those are 2007-era processors, so there may be some
> limitations in what they could do (or how well the kernel and system
> libraries support it). So investing time into working out the real problem
> is not really worthwhile. Thanks for reporting your work-around, however,
> others might benefit from it. If you plan on doing lengthy simulations, you
> might like to verify that you get linear scaling with increasing -ntmpi,
> and/or compare performance with the MPI version on the same hardware.
>
> Mark
> --
> gmx-users mailing list [email protected]
> http://lists.gromacs.org/mailman/listinfo/gmx-users
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> * Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to [email protected].
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

