Hi Reid,

Just saw your bug report and realized that you have an ancient kernel, which
could be causing the issue. Let's move the discussion to the bug page
(http://redmine.gromacs.org/issues/1184). Hopefully we can narrow the issue
down and then post the conclusions to the list later.

Cheers,

--
Szilárd


On Thu, Mar 7, 2013 at 7:06 AM, Roland Schulz <[email protected]> wrote:

> Hi Reid,
>
> I just tested Gromacs 4.6.1 compiled with ICC 13 and GCC 4.1.2 on CentOS
> 5.6 and I don't have any problems with pinning. So it might be useful to
> open a bug and provide more details, because it should work for CentOS 5.x.
>
> Yes, for pure water the group kernels are faster than Verlet.
>
> Roland
>
>
> On Wed, Mar 6, 2013 at 10:17 PM, Reid Van Lehn <[email protected]> wrote:
>
> > Hi Szilárd,
> >
> > Thank you very much for the detailed write-up. To answer your question,
> > yes, I am using an old Linux distro, specifically CentOS 5.4, though
> > upgrading to 5.9 still had the same problem. I have a few other machines
> > with different hardware running CentOS 6.3 that do not have this issue,
> > so it is likely an operating system issue, based on your description. As
> > I'm (unfortunately...) also the sysadmin on this cluster, I'm unlikely to
> > find the time to upgrade all the nodes, so I'll probably stick with the
> > "-pin off" workaround for now. Hopefully this thread might help out other
> > users!
> >
> > As an aside, I found that the OpenMP + Verlet combination was slower for
> > this particular system, but I suspect that it's because it's almost
> > entirely water and hence probably benefits from the Group scheme
> > optimizations for water described on the Gromacs website.
> >
> > Thanks again for the explanation,
> > Reid
> >
> > On Mon, Mar 4, 2013 at 3:45 PM, Szilárd Páll <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > There are some clarifications needed, and as this might help you and
> > > others understand what's going on, I'll take the time to explain things.
> > >
> > > Affinity setting is a low-level, operating-system-level operation that
> > > "locks" (= "pins") threads to physical cores of the CPU. This prevents
> > > the OS from moving them between cores; such migration can cause a
> > > performance drop, especially when using OpenMP multithreading on
> > > multi-socket and NUMA machines.
> > >
> > > Now, mdrun will by default *try* to set affinity if you use all cores
> > > detected (i.e. if mdrun can be sure that it is the only application
> > > running on the machine), but will by default *not* set thread affinities
> > > if the number of threads/processes per compute node is less than the
> > > number of cores detected. Hence, when you decrease -ntmpi to 7, you
> > > implicitly end up turning off thread pinning; that's why the warnings
> > > don't show up.
> > >
> > > The fact that affinity setting fails on your machine suggests that either
> > > the system libraries don't support this or the mdrun code is not fully
> > > compatible with your OS; the type of CPU AFAIK doesn't matter at all.
> > > What OS are you using? Is it an old installation?
> > >
> > > If you are not using OpenMP (which, by the way, you probably should with
> > > the Verlet scheme if you are running on a single node or at high
> > > parallelization), the performance will not be affected very much by the
> > > lack of thread pinning. While the warnings themselves can often be safely
> > > ignored, if only some of the threads/processes can't set affinities, this
> > > might indicate a problem. In your case, if you were really seeing only 5
> > > cores being used with 3 warnings, this might suggest that while the
> > > affinity setting failed, three threads are using already "busy" cores
> > > overlapping with others, which will cause a severe performance drop.
> > >
> > > What you can do to avoid the performance drop is to turn off pinning by
> > > passing "-pin off" to mdrun. Without OpenMP this will typically not cause
> > > a large performance drop compared to having correct pinning, and it will
> > > avoid the bad case of overlapping threads/processes.
> > >
> > > I suspect that your machines might be running an old OS which could be
> > > causing the failed affinity setting. If that is the case, you should talk
> > > to your sysadmins and have them figure out the issue. If you have a
> > > moderately new OS, you should not be seeing such issues, so I suggest that
> > > you file a bug report with details like: OS + version + kernel version,
> > > pthread library version, standard C library version.
> > >
> > > Cheers,
> > >
> > > --
> > > Szilárd
> > >
> > >
> > > On Mon, Mar 4, 2013 at 1:45 PM, Mark Abraham <[email protected]> wrote:
> > >
> > > > On Mon, Mar 4, 2013 at 6:02 AM, Reid Van Lehn <[email protected]> wrote:
> > > >
> > > > > Hello users,
> > > > >
> > > > > I ran into a bug I do not understand today upon upgrading from v.
> > > > > 4.5.5 to v. 4.6. I'm using older 8-core Intel Xeon E5430 machines, and
> > > > > when I submitted a job for 8 cores to one of the nodes I received the
> > > > > following error:
> > > > >
> > > > > NOTE: In thread-MPI thread #3: Affinity setting failed.
> > > > >       This can cause performance degradation!
> > > > >
> > > > > NOTE: In thread-MPI thread #2: Affinity setting failed.
> > > > >       This can cause performance degradation!
> > > > >
> > > > > NOTE: In thread-MPI thread #1: Affinity setting failed.
> > > > >       This can cause performance degradation!
> > > > >
> > > > > I ran mdrun simply with the flags:
> > > > >
> > > > > mdrun -v -ntmpi 8 -deffnm em
> > > > >
> > > > > Using the top command, I confirmed that no other programs were running
> > > > > and that mdrun was in fact only using 5 cores. Reducing -ntmpi to 7,
> > > > > however, resulted in no error (only a warning about not using all of
> > > > > the logical cores) and mdrun used 7 cores correctly. Since it warned
> > > > > about thread affinity settings, I tried setting -pin on -pinoffset 0
> > > > > even though I was using all the cores on the machine. This resulted in
> > > > > the same error. However, turning pinning off explicitly with -pin off
> > > > > (rather than -pin auto) did correctly give me all 8 cores again.
> > > > >
> > > > > While I figured out a solution in this particular instance, my question
> > > > > is whether I should have known from my hardware/settings that pinning
> > > > > should be turned off (for future reference), or if this is a bug?
> > > > >
> > > >
> > > > I'm not sure - those are 2007-era processors, so there may be some
> > > > limitations in what they could do (or how well the kernel and system
> > > > libraries support it). So investing time into working out the real
> > > > problem is not really worthwhile. Thanks for reporting your work-around,
> > > > however; others might benefit from it. If you plan on doing lengthy
> > > > simulations, you might like to verify that you get linear scaling with
> > > > increasing -ntmpi, and/or compare performance with the MPI version on the
> > > > same hardware.
> > > >
> > > > Mark
> > > > --
> > > > gmx-users mailing list    [email protected]
> > > > http://lists.gromacs.org/mailman/listinfo/gmx-users
> > > > * Please search the archive at
> > > > http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> > > > * Please don't post (un)subscribe requests to the list. Use the
> > > > www interface or send it to [email protected].
> > > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > > >
> > >
> >
> >
> >
> > --
> > Reid Van Lehn
> > NSF/MIT Presidential Fellow
> > Alfredo Alexander-Katz Research Group
> > Ph.D Candidate - Materials Science
> >
> >
> >
> >
> >
>
>
> --
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309
>
