Hi Reid,

I just saw your bug report and realized that you have an ancient kernel,
which could be causing the issue. Let's move the discussion to the bug page
(http://redmine.gromacs.org/issues/1184); hopefully we can narrow the issue
down and then post the conclusions to the list later.

Cheers,
--
Szilárd

On Thu, Mar 7, 2013 at 7:06 AM, Roland Schulz <[email protected]> wrote:

> Hi Reid,
>
> I just tested Gromacs 4.6.1 compiled with ICC 13 and GCC 4.1.2 on CentOS
> 5.6 and I don't have any problems with pinning. So it might be useful to
> open a bug and provide more details, because it should work for CentOS 5.x.
>
> Yes, for pure water the group kernels are faster than Verlet.
>
> Roland
>
>
> On Wed, Mar 6, 2013 at 10:17 PM, Reid Van Lehn <[email protected]> wrote:
>
> > Hi Szilárd,
> >
> > Thank you very much for the detailed write-up. To answer your question,
> > yes, I am using an old Linux distro, specifically CentOS 5.4, though
> > upgrading to 5.9 still had the same problem. I have another few machines
> > with different hardware running CentOS 6.3 which do not have this issue,
> > so it is likely an operating system issue based on your description. As
> > I'm (unfortunately...) also the sysadmin on this cluster, I'm unlikely to
> > find the time to upgrade all the nodes, so I'll probably stick with the
> > "-pin off" workaround for now. Hopefully this thread might help out other
> > users!
> >
> > As an aside, I found that the OpenMP + Verlet combination was slower for
> > this particular system, but I suspect that it's because it's almost
> > entirely water and hence probably benefits from the Group scheme
> > optimizations for water described on the Gromacs website.
> >
> > Thanks again for the explanation,
> > Reid
> >
> > On Mon, Mar 4, 2013 at 3:45 PM, Szilárd Páll <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > Some clarifications are needed, and as this might help you and others
> > > understand what's going on, I'll take the time to explain things.
> > >
> > > Affinity setting is a low-level, operating-system-level operation that
> > > "locks" (= "pins") threads to physical cores of the CPU. This prevents
> > > the OS from moving them; such migration can cause a performance drop,
> > > especially when using OpenMP multithreading on multi-socket and NUMA
> > > machines.
> > >
> > > Now, mdrun will by default *try* to set affinity if you use all cores
> > > detected (i.e. if mdrun can be sure that it is the only application
> > > running on the machine), but will by default *not* set thread affinities
> > > if the number of threads/processes per compute node is less than the
> > > number of cores detected. Hence, when you decrease -ntmpi to 7, you
> > > implicitly end up turning off thread pinning; that's why the warnings
> > > don't show up.
> > >
> > > The fact that affinity setting fails on your machine suggests that either
> > > the system libraries don't support this or the mdrun code is not fully
> > > compatible with your OS; the type of CPU, AFAIK, doesn't matter at all.
> > > What OS are you using? Is it an old installation?
> > >
> > > If you are not using OpenMP (which, btw, you probably should with the
> > > Verlet scheme if you are running on a single node or at high
> > > parallelization), the performance will not be affected very much by the
> > > lack of thread pinning. While the warnings themselves can often be safely
> > > ignored, if only some of the threads/processes can't set affinities, this
> > > might indicate a problem. In your case, if you were really seeing only 5
> > > cores being used with 3 warnings, this might suggest that, while the
> > > affinity setting failed, three threads are using already "busy" cores
> > > overlapping with others, which will cause a severe performance drop.
> > >
> > > What you can do to avoid the performance drop is to turn off pinning by
> > > passing "-pin off" to mdrun.
> > > Without OpenMP this will typically not cause a large performance drop
> > > compared to having correct pinning, and it will avoid the bad
> > > overlapping threads/processes case.
> > >
> > > I suspect that your machines might be running an old OS, which could be
> > > causing the failed affinity setting. If that is the case, you should talk
> > > to your sysadmins and have them figure out the issue. If you have a
> > > moderately new OS, you should not be seeing such issues, so I suggest
> > > that you file a bug report with details like: OS + version + kernel
> > > version, pthread library version, standard C library version.
> > >
> > > Cheers,
> > >
> > > --
> > > Szilárd
> > >
> > >
> > > On Mon, Mar 4, 2013 at 1:45 PM, Mark Abraham <[email protected]> wrote:
> > >
> > > > On Mon, Mar 4, 2013 at 6:02 AM, Reid Van Lehn <[email protected]> wrote:
> > > >
> > > > > Hello users,
> > > > >
> > > > > I ran into a bug I do not understand today upon upgrading from
> > > > > v. 4.5.5 to v. 4.6. I'm using older 8-core Intel Xeon E5430 machines,
> > > > > and when I submitted a job for 8 cores to one of the nodes I received
> > > > > the following error:
> > > > >
> > > > > NOTE: In thread-MPI thread #3: Affinity setting failed.
> > > > >       This can cause performance degradation!
> > > > >
> > > > > NOTE: In thread-MPI thread #2: Affinity setting failed.
> > > > >       This can cause performance degradation!
> > > > >
> > > > > NOTE: In thread-MPI thread #1: Affinity setting failed.
> > > > >       This can cause performance degradation!
> > > > >
> > > > > I ran mdrun simply with the flags:
> > > > >
> > > > > mdrun -v -ntmpi 8 -deffnm em
> > > > >
> > > > > Using the top command, I confirmed that no other programs were
> > > > > running and that mdrun was in fact only using 5 cores.
> > > > > Reducing -ntmpi to 7, however, resulted in no error (only a warning
> > > > > about not using all of the logical cores) and mdrun used 7 cores
> > > > > correctly. Since it warned about thread affinity settings, I tried
> > > > > setting -pin on -pinoffset 0 even though I was using all the cores on
> > > > > the machine. This resulted in the same error. However, turning
> > > > > pinning off explicitly with -pin off (rather than -pin auto) did
> > > > > correctly give me all 8 cores again.
> > > > >
> > > > > While I figured out a solution in this particular instance, my
> > > > > question is whether I should have known from my hardware/settings
> > > > > that pinning should be turned off (for future reference), or if this
> > > > > is a bug?
> > > > >
> > > >
> > > > I'm not sure - those are 2007-era processors, so there may be some
> > > > limitations in what they could do (or how well the kernel and system
> > > > libraries support it). So investing time into working out the real
> > > > problem is not really worthwhile. Thanks for reporting your
> > > > workaround, however; others might benefit from it. If you plan on
> > > > doing lengthy simulations, you might like to verify that you get
> > > > linear scaling with increasing -ntmpi, and/or compare performance with
> > > > the MPI version on the same hardware.
> > > >
> > > > Mark
> > > > --
> > > > gmx-users mailing list [email protected]
> > > > http://lists.gromacs.org/mailman/listinfo/gmx-users
> > > > * Please search the archive at
> > > >   http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> > > > * Please don't post (un)subscribe requests to the list. Use the
> > > >   www interface or send it to [email protected].
> > > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > > >
> >
> > --
> > Reid Van Lehn
> > NSF/MIT Presidential Fellow
> > Alfredo Alexander-Katz Research Group
> > Ph.D Candidate - Materials Science
>
> --
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309
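
The default pinning behaviour described in the thread can be summarised in a few lines. The following is only a sketch of the rule as stated above (pin when all detected cores are used, do not pin when fewer threads are started, and let "-pin on" / "-pin off" override); it is not GROMACS code. The function name `should_pin` and its parameters are hypothetical, and treating the oversubscribed case (more threads than cores) as "do not pin" is an assumption, since the thread does not cover it.

```python
def should_pin(n_threads: int, n_cores: int, pin: str = "auto") -> bool:
    """Return True if this sketch would attempt to set thread affinities.

    Hypothetical helper mirroring the rule described in the thread;
    it does not reproduce mdrun's actual implementation.
    """
    if pin == "on":
        # Explicit request: always attempt pinning.
        return True
    if pin == "off":
        # Explicit opt-out: the workaround used in the thread.
        return False
    if pin != "auto":
        raise ValueError(f"unknown pin mode: {pin!r}")
    # "auto": attempt pinning only when every detected core is used.
    # Any other thread count (including oversubscription, which is an
    # assumption here) leaves the threads unpinned.
    return n_threads == n_cores


if __name__ == "__main__":
    for threads, mode in [(8, "auto"), (7, "auto"), (8, "off"), (7, "on")]:
        decision = should_pin(threads, n_cores=8, pin=mode)
        print(f"threads={threads} pin={mode}: attempt pinning = {decision}")
```

Under this rule, the 8-thread run on the 8-core node attempts pinning (and so can hit the affinity failure), while the 7-thread run does not attempt it at all, which matches the observation that the warnings disappeared with -ntmpi 7.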


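To see what "setting affinity" means at the operating-system level, independently of GROMACS, one can ask the OS to pin the current process to each allowed CPU in turn. The sketch below uses Python's standard `os.sched_setaffinity`, which is only available on some platforms (notably Linux). It does not reproduce how mdrun sets affinities and is not a diagnosis of the failure discussed in this thread; the function name `probe_affinity` is hypothetical.

```python
import os


def probe_affinity() -> dict:
    """Try to pin this process to each allowed CPU; map CPU id -> success.

    Returns an empty dict on platforms without os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return {}

    original = os.sched_getaffinity(0)  # CPUs this process may run on
    results = {}
    try:
        for cpu in sorted(original):
            try:
                os.sched_setaffinity(0, {cpu})  # pin to a single CPU
                results[cpu] = True
            except OSError:
                # The OS rejected the pinning request for this CPU.
                results[cpu] = False
    finally:
        # Restore the original affinity mask so the probe has no side effects.
        os.sched_setaffinity(0, original)
    return results


if __name__ == "__main__":
    outcome = probe_affinity()
    if not outcome:
        print("Affinity control is not available through Python here.")
    for cpu, ok in outcome.items():
        print(f"CPU {cpu}: {'pinning accepted' if ok else 'pinning failed'}")
```

If every request is accepted here, basic affinity support is present for this process; any remaining problem would then have to be investigated in the application or its threading libraries.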