On Thu, Feb 23, 2012 at 12:13:18 -0800, Iain Morgan wrote:
> On Wed, Feb 15, 2012 at 18:19:10 -0600, Rich Megginson wrote:
> > On 02/15/2012 03:51 PM, Iain Morgan wrote:
> > > On Wed, Feb 15, 2012 at 15:04:52 -0600, Rich Megginson wrote:
> > >> On 02/15/2012 01:56 PM, Iain Morgan wrote:
> > >>> On Tue, Feb 14, 2012 at 19:54:39 -0600, Rich Megginson wrote:
> > >>>> On 02/14/2012 06:37 PM, Iain Morgan wrote:
> > >>>>> Hello,
> > >>>>>
> > >>>>> On a fairly frequent basis, one of my 389 DS servers hangs after certain
> > >>>>> CMP operations. Once this happens, the server cannot be shut down
> > >>>>> gracefully. This has been going on for several weeks, and I have not yet
> > >>>>> found a solution.
> > >>>>>
> > >>>>> My setup consists of two systems running RHEL 6.2 with 389 DS 1.2.9.16.
> > >>>>> Multimaster replication is enabled between the two servers, but the
> > >>>>> client systems (currently just two test systems) preferentially use the
> > >>>>> same server, ServerA. The second server, ServerB, is the one which is
> > >>>>> experiencing the problem.
> > >>>>>
> > >>>>> We are using class-of-service entries to set the values for the
> > >>>>> shadowMax, shadowMin, and shadowWarning attributes. And we are
> > >>>>> conditionally setting a pwdPolicySubentry attribute for some entries in
> > >>>>> the same manner.
> > >>>>>
> > >>>>> If I execute an ldapcompare command, such as the following:
> > >>>>>
> > >>>>> # ldapcompare uid=imorgan,ou=People,dc=example,dc=com \
> > >>>>>       pwdpolicysubentry:"cn=Special Policy,ou=Policies,dc=example,dc=com"
> > >>>>>
> > >>>>> the command will occasionally hang. Most of the time, the command
> > >>>>> succeeds and indicates that the attribute is not defined for that entry.
> > >>>>> However, once or twice a day it will simply hang.
> > >>>>>
> > >>>>> The access log shows that the CMP request was received, but no result is
> > >>>>> logged. After this occurs, the server will not shut down gracefully. The
> > >>>>> init script fails to shut down the server and I end up having to send a
> > >>>>> SIGKILL to ns-slapd.
> > >>>> When you get the hang, can you attach to the process with gdb?
> > >>>> ps -ef|grep ns-slapd
> > >>>> gdb /usr/sbin/ns-slapd pid-of-ns-slapd
> > >>>>> The error log does not report any issues.
> > >>>>>
> > >>>>> CMP operations against other attributes, such as loginShell, do not seem
> > >>>>> to exhibit this problem. Also, the problem does not occur on ServerA;
> > >>>>> only on ServerB. Once the CMP operation has hung, comparisons against
> > >>>>> other attributes, even shadowMax, continue to work.
> > >>>>>
> > >>>>> As noted above, most of the time the CMP operation returns normally.
> > >>>>> However, if I reinitialize ServerB from ServerA, the problem occurs with
> > >>>>> the first CMP operation against ServerB.
> > >>>>>
> > >>>>> Both servers have the same set of RPMs, and the dse.ldif files on the two
> > >>>>> systems do not have any significant differences.
> > >>>>>
> > >>>>> Has anyone seen a similar issue? Any suggestions on how to debug or fix
> > >>>>> this?
> > >>>>>
> > >>>>> A somewhat simplified and redacted version of the class-of-service
> > >>>>> configuration is listed below.
> > >>>>>
> > >>>>> Thanks
> > >>> A gzip'd copy of the 'thread apply all bt full' output is attached.
> > >>>
> > >> Thanks.  Can you do this again after installing the
> > >> 389-ds-base-debuginfo package?
> > >> debuginfo-install 389-ds-base
> > > Ah, sorry about that. Here's the updated output.
> > >
> > >> Are you using Views?
> > >> http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html/Administration_Guide/using-views.html
> > > No.
> > >
> > Thanks!  This looks like a symptom of 
> > https://fedorahosted.org/389/ticket/247 fixed in 1.2.10
> 
> Hello Rich,
> 
> Thanks, I upgraded both of the servers to 1.2.10.1. Unfortunately, it
> did not resolve the issue. I also noticed that if I run the same
> ldapcompare command after the first try fails, the server crashes. I
> can't say whether that is a change in the behaviour, but it is a new
> observation.
> 
> I've attached gdb output for the case where the first ldapcompare is
> hanging. And, I've also attached the gdb analysis of the core dump.
> 
> -- 
> Iain Morgan

I've tested 1.2.10.3 and can confirm that it addresses the segfault.
However, the hang (presumably a deadlock) has not gone away. I don't
seem to be able to update bug #305 now that it is closed, so I am
attaching the gdb backtrace of ns-slapd 1.2.10.3 during the server hang.
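
For anyone who wants to gather the same kind of data, here is a rough sketch of how the hang check and the backtrace capture could be scripted. It is only an illustration, not what was actually run to produce the attachment: the function names check_compare and capture_bt, the time limit, and the output file are made up for the example. It assumes GNU coreutils timeout (which exits with status 124 when the limit is reached), a gdb that accepts -p, -batch and -ex, and the 389-ds-base-debuginfo package being installed.

```shell
#!/bin/sh
# Hypothetical helpers; the names and arguments are illustrative only.

# Run one ldapcompare under a time limit and report whether it hung.
#   $1: time limit in seconds
#   $2: entry DN
#   $3: attribute:value assertion
# Connection options are left to ldap.conf, as in the command quoted above.
check_compare() {
    rc=0
    timeout "$1" ldapcompare "$2" "$3" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then
        # GNU timeout uses 124 to signal that the limit was reached
        echo "hang"
    else
        # ldapcompare itself exits with 5 (FALSE) or 6 (TRUE) on a completed compare
        echo "returned $rc"
    fi
}

# Dump the stacks of all threads of a running process to a file.
#   $1: pid of ns-slapd
#   $2: output file
capture_bt() {
    gdb -p "$1" -batch -ex 'thread apply all bt full' > "$2" 2>&1
}

# Example use (commented out; assumes a single ns-slapd instance):
# if [ "$(check_compare 30 'uid=imorgan,ou=People,dc=example,dc=com' \
#         'pwdpolicysubentry:cn=Special Policy,ou=Policies,dc=example,dc=com')" = "hang" ]; then
#     capture_bt "$(pgrep -x ns-slapd)" bt-during-hang.txt
# fi
```

The point of the time limit is that the hang only shows up once or twice a day, so the compare has to be repeated unattended and the stacks captured while the request is still stuck.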

-- 
Iain Morgan

Attachment: bt-during-hang.txt.gz
Description: application/gunzip

--
389 users mailing list
https://admin.fedoraproject.org/mailman/listinfo/389-users