On Wed, Oct 19, 2016 at 07:05:14PM +0200, thierry bordaz wrote:
> 
> 
> On 10/19/2016 06:54 PM, Andrew E. Bruno wrote:
> > On Wed, Oct 19, 2016 at 06:33:05PM +0200, thierry bordaz wrote:
> > > 
> > > On 10/19/2016 03:48 PM, Andrew E. Bruno wrote:
> > > > On Wed, Oct 19, 2016 at 10:13:26AM +0200, Ludwig Krispenz wrote:
> > > > > On 10/18/2016 08:52 PM, Andrew E. Bruno wrote:
> > > > > > We had one of our replicas fail today with the following errors:
> > > > > > 
> > > > > > 
> > > > > > [18/Oct/2016:13:40:47 -0400] 
> > > > > > agmt="cn=meTosrv-m14-32.cbls.ccr.buffalo.edu" (srv-m14-32:389) - 
> > > > > > Can't locate CSN 58065ef3000100030000 in the changelog (DB 
> > > > > > rc=-30988). If replication stops, the consumer may need to be 
> > > > > > reinitialized.
> > > > > > [18/Oct/2016:13:43:07 -0400] NSMMReplicationPlugin - changelog 
> > > > > > program - _cl5WriteOperationTxn: retry (49) the transaction 
> > > > > > (csn=58065f74000500040000) failed (rc=-30993 (BDB0068 
> > > > > > DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
> > > > > > [18/Oct/2016:13:43:07 -0400] NSMMReplicationPlugin - changelog 
> > > > > > program - _cl5WriteOperationTxn: failed to write entry with csn 
> > > > > > (58065f74000500040000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: 
> > > > > > Locker killed to resolve a deadlock
> > > > > > [18/Oct/2016:13:43:07 -0400] NSMMReplicationPlugin - 
> > > > > > write_changelog_and_ruv: can't add a change for 
> > > > > > uid=janedoe,cn=users,cn=accounts,dc=cbls,dc=ccr,dc=buffalo,dc=edu 
> > > > > > (uniqid: 939bca48-2ced11e5-ac0b8f7e-e0b1a377, optype: 64) to 
> > > > > > changelog csn 58065f74000500040000
> > > > > > [18/Oct/2016:13:43:07 -0400] - SLAPI_PLUGIN_BE_TXN_POST_MODRDN_FN 
> > > > > > plugin returned error but did not set SLAPI_RESULT_CODE
> > > > > > [18/Oct/2016:13:43:07 -0400] NSMMReplicationPlugin - 
> > > > > > process_postop: Failed to apply update (58065f74000500040000) error 
> > > > > > (1).  Aborting replication session(conn=1314106 op=1688559)
> > > > > > [18/Oct/2016:13:43:12 -0400] - cos_cache_change_notify: modified 
> > > > > > entry is NULL--updating cache just in case
> > > > > > [18/Oct/2016:13:43:12 -0400] - Skipping CoS Definition cn=Password 
> > > > > > Policy,cn=accounts,dc=cbls,dc=ccr,dc=buffalo,dc=edu--no CoS 
> > > > > > Templates found, which should be added before the CoS Definition.
> > > > > > [18/Oct/2016:13:43:20 -0400] - Operation error fetching Null DN 
> > > > > > (4a729f9a-955a11e6-aaffa516-e778e883), error -30993.
> > > > > > [18/Oct/2016:13:43:20 -0400] - dn2entry_ext: Failed to get id for 
> > > > > > changenumber=30856302,cn=changelog from entryrdn index (-30993)
> > > > > > [18/Oct/2016:13:43:20 -0400] - Operation error fetching 
> > > > > > changenumber=30856302,cn=changelog (null), error -30993.
> > > > > > [18/Oct/2016:13:43:20 -0400] DSRetroclPlugin - replog: an error 
> > > > > > occured while adding change number 30856302, dn = 
> > > > > > changenumber=30856302,cn=changelog: Operations error.
> > > > > > [18/Oct/2016:13:43:20 -0400] retrocl-plugin - retrocl_postob: 
> > > > > > operation failure [1]
> > > > > > [18/Oct/2016:13:43:20 -0400] NSMMReplicationPlugin - 
> > > > > > process_postop: Failed to apply update (58065f9f000000600000) error 
> > > > > > (1).  Aborting replication session(conn=1901274 op=5)
> > > > > > [18/Oct/2016:13:43:24 -0400] - ldbm_back_seq deadlock retry BAD 
> > > > > > 1601, err=0 BDB0062 Successful return: 0
> > > > > > [18/Oct/2016:13:43:25 -0400] NSMMReplicationPlugin - changelog 
> > > > > > program - _cl5WriteOperationTxn: retry (49) the transaction 
> > > > > > (csn=58065f7c000a00040000) failed (rc=-30993 (BDB0068 
> > > > > > DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
> > > > > > [18/Oct/2016:13:43:25 -0400] NSMMReplicationPlugin - changelog 
> > > > > > program - _cl5WriteOperationTxn: failed to write entry with csn 
> > > > > > (58065f7c000a00040000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: 
> > > > > > Locker killed to resolve a deadlock
> > > > > > [18/Oct/2016:13:43:25 -0400] NSMMReplicationPlugin - 
> > > > > > write_changelog_and_ruv: can't add a change for 
> > > > > > uid=janedoe,cn=users,cn=accounts,dc=cbls,dc=ccr,dc=buffalo,dc=edu 
> > > > > > (uniqid: 4080421a-2d0211e5-ac0b8f7e-e0b1a377, optype: 64) to 
> > > > > > changelog csn 58065f7c000a00040000
> > > > > > 
> > > > > > 
> > > > > > ns-slapd was hung, so we restarted it, and now it's stuck and won't come
> > > > > > back up. It hangs here:
> > > > > > 
> > > > > > [18/Oct/2016:14:12:31 -0400] - Skipping CoS Definition cn=Password 
> > > > > > Policy,cn=accounts,dc=cbls,dc=ccr,dc=buffalo,dc=edu--no CoS 
> > > > > > Templates found, which should be added before the CoS Definition.
> > > > > > [18/Oct/2016:14:12:31 -0400] NSMMReplicationPlugin - changelog 
> > > > > > program - _cl5NewDBFile: PR_DeleteSemaphore: 
> > > > > > /var/lib/dirsrv/slapd-CBLS-CCR-BUFFALO-EDU/cldb/a32992ce-71b811e5-9d33a516-e778e883.sema;
> > > > > >  NSPR error - -5943
> > > > > > [18/Oct/2016:14:12:32 -0400] NSMMReplicationPlugin - changelog 
> > > > > > program - _cl5NewDBFile: PR_DeleteSemaphore: 
> > > > > > /var/lib/dirsrv/slapd-CBLS-CCR-BUFFALO-EDU/cldb/986efe12-71b811e5-9d33a516-e778e883.sema;
> > > > > >  NSPR error - -5943
> > > > > > 
> > > > > > 
> > > > > > Tried deleting the semaphore files and restarting, but no luck. Attached
> > > > > > is a stack trace of the stuck ns-slapd process.
> > > > > > 
> > > > > > Here are the versions we're running:
> > > > > > 
> > > > > > ipa-server-4.2.0-15.0.1.el7.centos.19.x86_64
> > > > > > 389-ds-base-1.3.4.0-33.el7_2.x86_64
> > > > > > 
> > > > > > FWIW, we were experimenting with the new life-cycle management features,
> > > > > > specifically "preserved" users, and had deleted the user "janedoe" when this
> > > > > > happened.  From the errors above, it looks like this host failed to
> > > > > > replicate the change?  Not sure if this is related or not.
> > > > > > 
> > > > > > Is it possible to recover the database? Thanks in advance for any pointers.
> > > > > From the stack trace, the process is not hanging; it is trying to recover.
> > > > > After a crash/kill the changelog does not contain a RUV, so it is
> > > > > reconstructed by reading all records in the changelog. If the changelog is
> > > > > large, this can take some time.
> > > > > If you look at that part of the stack repeatedly,
> > > > > 
> > > > > #4  0x00007f4e88daeba5 in cl5DBData2Entry (data=<optimized out>, 
> > > > > len=<optimized out>, entry=entry@entry=0x7ffff6598910) at 
> > > > > ldap/servers/plugins/replication/cl5_api.c:2342
> > > > >           rc = <optimized out>
> > > > >           version = <optimized out>
> > > > >           pos = 0x7f4e9839d091 ""
> > > > >           strCSN = 0x0
> > > > >           op = 0x7ffff6598980
> > > > >           add_mods = 0x7f4e983a5e80
> > > > >           rawDN = 0x7f4e98396e20 
> > > > > "fqdn=cpn-k08-29-02.cbls.ccr.buffalo.edu,cn=computers,cn=accounts,dc=cbls,dc=ccr,dc=buffalo,dc=edu"
> > > > >           s = 
> > > > > "\300\037>\230N\177\000\000@\210Y\366\377\177\000\000@\210Y\366\377"
> > > > > #5  0x00007f4e88daf5d6 in _cl5GetNextEntry 
> > > > > (entry=entry@entry=0x7ffff6598910, iterator=0x7f4e983a5e80) at 
> > > > > ldap/servers/plugins/replication/cl5_api.c:5291
> > > > >           rc = 0
> > > > >           it = 0x7f4e983a5e80
> > > > >           key = {data = 0x0, size = 21, ulen = 0, dlen = 0, doff = 0, 
> > > > > app_data = 0x0, flags = 16}
> > > > >           data = {data = 0x7f4e9839cff0, size = 335, ulen = 0, dlen = 
> > > > > 0, doff = 0, app_data = 0x0, flags = 16}
> > > > > #6  0x00007f4e88dafb34 in _cl5ConstructRUV (purge=1, 
> > > > > obj=0x7f4e983e1fc0, replGen=0x7ffff6598910 "\200\211Y\366\377\177") 
> > > > > at ldap/servers/plugins/replication/cl5_api.c:4306
> > > > > 
> > > > > 
> > > > > you should see some progress in which entry is handled
> > > > > 
> > > > Ludwig, thanks very much for the help. As you pointed out, we just needed
> > > > to let it finish.  ns-slapd eventually came back up once it finished reading
> > > > the changelog. We're still seeing some errors from the NSMMReplicationPlugin
> > > > (failed to apply update) and from the managed-entries-plugin. Can these
> > > > safely be ignored, or are they indicative of a more serious problem?
> > > It is difficult to say what is causing the managed entries messages.
> > > They say that the origin entry "uid=janedoe,cn=deleted
> > > users,cn=accounts,cn=provisioning,dc=cbls,dc=ccr,dc=buffalo,dc=edu"
> > > still has a managed entry ('mepManagedEntry') that is possibly
> > > something like
> > > "cn=janedoe,cn=groups,cn=accounts,dc=cbls,dc=ccr,dc=buffalo,dc=edu".
> > > 
> > > This looks like a bug because user 'janedoe', being a preserved user,
> > > should not have any reference to existing groups.
> > > 
> > > Could you dump the uid=janedoe entry:
> > > ldapsearch -D "cn=directory manager" -w xxxx -b "uid=janedoe,cn=deleted
> > > users,cn=accounts,cn=provisioning,dc=cbls,dc=ccr,dc=buffalo,dc=edu"
> > > nscpentrywsi
> nscpentrywsi is a special attribute that dumps the entry. It is only
> available to 'cn=directory manager', not to 'admin'.
> If you do not know the 'cn=directory manager' password, then run the same
> request as 'admin' without specifying any attributes.

Sorry about that. Here's the dump of the janedoe entry, run as the directory
manager:


ldapsearch -D "cn=directory manager" -W -b "uid=janedoe,cn=deleted 
users,cn=accounts,cn=provisioning,dc=cbls,dc=ccr,dc=buffalo,dc=edu" 
nscpentrywsi 
Enter LDAP Password: 
# extended LDIF
#
# LDAPv3
# base <uid=janedoe,cn=deleted 
users,cn=accounts,cn=provisioning,dc=cbls,dc=ccr,dc=buffalo,dc=edu> with scope 
subtree
# filter: (objectclass=*)
# requesting: nscpentrywsi 
#
# janedoe, deleted users, accounts, provisioning, cbls.ccr.buffalo.edu
dn: uid=janedoe,cn=deleted users,cn=accounts,cn=provisioning,dc=cbls,dc=ccr,d
 c=buffalo,dc=edu
nscpentrywsi: dn: uid=janedoe,cn=deleted users,cn=accounts,cn=provisioning,dc
 =cbls,dc=ccr,dc=buffalo,dc=edu
nscpentrywsi: entryusn;adcsn-58077599000100060003;vucsn-58077599000100060003: 
 114339992
nscpentrywsi: modifyTimestamp;adcsn-58077599000100060002;vucsn-580775990001000
 60002: 20161019132917Z
nscpentrywsi: modifiersName;adcsn-58077599000100060001;vucsn-58077599000100060
 001: cn=IPA MODRDN,cn=plugins,cn=config
nscpentrywsi: krbPrincipalName;adcsn-58077599000100060000;vucsn-58077599000100
 060000: jane...@cbls.ccr.buffalo.edu
nscpentrywsi: uid;vucsn-58065f7c000a00040001;mdcsn-58065f7c000a00040000: abhin
 avv
nscpentrywsi: nsAccountLock;adcsn-575f121e000600040000;vucsn-575f121e000600040
 000: TRUE
nscpentrywsi: entryid: 2585
nscpentrywsi: ipaUniqueID;vucsn-55a9d0ae000200040000: 4eea383c-2d02-11e5-9809-
 a0369f577818
nscpentrywsi: createTimestamp;vucsn-55a9d0ae000200040000: 20150718040603Z
nscpentrywsi: creatorsName;vucsn-55a9d0ae000200040000: uid=admin,cn=users,cn=a
 ccounts,dc=cbls,dc=ccr,dc=buffalo,dc=edu
nscpentrywsi: givenName;vucsn-55a9d0ae000200040000: Jane
nscpentrywsi: mail;vucsn-55a9d0ae000200040000: janedoe
nscpentrywsi: homeDirectory;vucsn-55a9d0ae000200040000: /user/janedoe
nscpentrywsi: gecos;vucsn-55a9d0ae000200040000: Jane Doe
nscpentrywsi: gidNumber;vucsn-55a9d0ae000200040000: 573
nscpentrywsi: initials;vucsn-55a9d0ae000200040000: JD
nscpentrywsi: uidNumber;vucsn-55a9d0ae000200040000: 253568
nscpentrywsi: sn;vucsn-55a9d0ae000200040000: Vishnu
nscpentrywsi: loginShell;vucsn-55a9d0ae000200040000: /bin/bash
nscpentrywsi: objectClass;vucsn-55a9d0ae000200040000: ipaobject
nscpentrywsi: objectClass;vucsn-55a9d0ae000200040000: person
nscpentrywsi: objectClass;vucsn-55a9d0ae000200040000: top
nscpentrywsi: objectClass;vucsn-55a9d0ae000200040000: ipasshuser
nscpentrywsi: objectClass;vucsn-55a9d0ae000200040000: inetorgperson
nscpentrywsi: objectClass;vucsn-55a9d0ae000200040000: organizationalperson
nscpentrywsi: objectClass;vucsn-55a9d0ae000200040000: krbticketpolicyaux
nscpentrywsi: objectClass;vucsn-55a9d0ae000200040000: krbprincipalaux
nscpentrywsi: objectClass;vucsn-55a9d0ae000200040000: inetuser
nscpentrywsi: objectClass;vucsn-55a9d0ae000200040000: posixaccount
nscpentrywsi: objectClass;vucsn-55a9d0ae000200040000: ipaSshGroupOfPubKeys
nscpentrywsi: cn;vucsn-55a9d0ae000200040000: Jane Doe
nscpentrywsi: displayName;vucsn-55a9d0ae000200040000: Jane Doe
nscpentrywsi: nsUniqueId: 4080421a-2d0211e5-ac0b8f7e-e0b1a377
nscpentrywsi: parentid: 8938
nscpentrywsi: memberOf;adcsn-58077599000000060000;vdcsn-58077599000000060000;d
 eletedattribute;deleted:
nscpentrywsi: description;adcsn-55a9d0ae000500040000;vdcsn-55a9d0ae00050004000
 0;deletedattribute;deleted:

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
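
In case it helps when reading the dump above: each adcsn-/vucsn-/mdcsn-/vdcsn-
option carries a CSN like the ones in the error log. Below is a minimal Python
sketch for decoding them, assuming the usual 389-ds CSN layout of 8 hex digits
of timestamp (seconds since the epoch), 4 of sequence number, 4 of replica ID
and 4 of sub-sequence number. The names parse_csn and split_state are
hypothetical helpers made up for this illustration, not part of any 389-ds or
IPA tool.

```python
from datetime import datetime, timezone
from typing import Dict, NamedTuple, Tuple


class CSN(NamedTuple):
    timestamp: datetime  # time encoded in the CSN, as UTC
    seqnum: int
    replica_id: int
    subseqnum: int


def parse_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four fields.

    Assumed layout: 8 digits timestamp, 4 seqnum, 4 replica ID, 4 subseqnum.
    """
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {csn!r}")
    seconds = int(csn[0:8], 16)
    return CSN(
        timestamp=datetime.fromtimestamp(seconds, tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


def split_state(attr_desc: str) -> Tuple[str, Dict[str, CSN]]:
    """Split e.g. 'uid;vucsn-<csn>;mdcsn-<csn>' into ('uid', {tag: CSN})."""
    name, *options = attr_desc.split(";")
    csns: Dict[str, CSN] = {}
    for opt in options:
        tag, sep, value = opt.partition("-")
        # Options without a CSN (e.g. 'deletedattribute', 'deleted') are skipped.
        if sep and tag.endswith("csn"):
            csns[tag] = parse_csn(value)
    return name, csns


if __name__ == "__main__":
    # Attribute description copied from the dump above.
    name, csns = split_state(
        "uid;vucsn-58065f7c000a00040001;mdcsn-58065f7c000a00040000"
    )
    for tag, csn in csns.items():
        print(name, tag, csn.timestamp.isoformat(), "replica", csn.replica_id)
```

Read this way, the CSNs on the uid value carry replica ID 4, while the ones on
entryusn, modifyTimestamp, modifiersName and krbPrincipalName carry replica ID 6.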


-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project
