[389-users] DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock

Kees Bakker Wed, 28 Jul 2021 09:11:38 -0700

Hi,

This is in an IPA deployment. We have three masters/replicas in a triangular 
topology, A-B, B-C, C-A.
The systems are called: rotte, linge and iparep4.


rotte is CentOS 7, with 389-ds-base-1.3.9.1-13.el7_7.x86_64
linge and iparep4 are CentOS 8 Stream, with 
389-ds-base-1.4.3.23-2.module_el8.5.0+835+5d54734c.x86_64

Yesterday I removed some members from a user group on rotte. This caused the 
following errors on linge (and on iparep4).

Jul 26 11:44:37 linge.example.com ns-slapd[282944]: 
[26/Jul/2021:11:44:37.947738548 +0200] - ERR - NSMMReplicationPlugin - 
changelog program - _cl5WriteOperationTxn - retry (49) the transaction 
(csn=60fe8535001000030000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker 
killed to resolve a deadlock))
Jul 26 11:44:38 linge.example.com ns-slapd[282944]: 
[26/Jul/2021:11:44:38.000964611 +0200] - ERR - NSMMReplicationPlugin - 
changelog program - _cl5WriteOperationTxn - Failed to write entry with csn 
(60fe8535001000030000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker 
killed to resolve a deadlock
Jul 26 11:44:38 linge.example.com ns-slapd[282944]: 
[26/Jul/2021:11:44:38.025996273 +0200] - ERR - NSMMReplicationPlugin - 
write_changelog_and_ruv - Can't add a change for 
cn=vpn_users,cn=groups,cn=accounts,dc=example,dc=com (uniqid: 
31283c01-a16511e9-93cf90e8-ab7c8ee8, optype: 8) to changelog csn 
60fe8535001000030000
Jul 26 11:44:38 linge.example.com ns-slapd[282944]: 
[26/Jul/2021:11:44:38.062640602 +0200] - ERR - NSMMReplicationPlugin - 
process_postop - Failed to apply update (60fe8535001000030000) error (1).  
Aborting replication session(conn=53596 op=65)

On rotte

jul 26 11:44:39 rotte.example.com ns-slapd[2705]: [26/Jul/2021:11:44:39.055890736 +0200] 
- WARN - NSMMReplicationPlugin - repl5_inc_update_from_op_result - 
agmt="cn=meTolinge.example.com" (linge:389): Consumer failed to replay change 
(uniqueid 31283c01-a16511e9-93cf90e8-ab7c8ee8, CSN 60fe8535001000030000): Operations 
error (1). Will retry later.
jul 26 11:44:39 rotte.example.com ns-slapd[2705]: [26/Jul/2021:11:44:39.058198988 +0200] 
- WARN - NSMMReplicationPlugin - repl5_inc_update_from_op_result - 
agmt="cn=meTolinge.example.com" (linge:389): Consumer failed to replay change 
(uniqueid 31283c01-a16511e9-93cf90e8-ab7c8ee8, CSN 60fe8535003300030000): Operations 
error(1). Will retry later.
jul 26 11:44:39 rotte.example.com ns-slapd[2705]: [26/Jul/2021:11:44:39.069825407 +0200] 
- ERR - NSMMReplicationPlugin - release_replica - 
agmt="cn=meTolinge.example.com" (linge:389): Unable to send endReplication 
extended operation (Operations error)
jul 26 11:44:46 rotte.example.com ns-slapd[2705]: [26/Jul/2021:11:44:46.561562313 +0200] 
- INFO - NSMMReplicationPlugin - bind_and_check_pwp - 
agmt="cn=meTolinge.example.com" (linge:389): Replication bind with GSSAPI auth 
resumed
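
To line the messages from both hosts up by change, it can help to pull the CSNs out of the log text and split them into their fields. The sketch below assumes the usual 389-ds CSN string layout of 20 hex digits (8 for the timestamp in seconds since the epoch, 4 for the sequence number, 4 for the replica ID, 4 for the sub-sequence number). The names `CSN`, `parse_csn` and `find_csns` are made up for this illustration and are not part of any 389-ds tool.

```python
import re
from datetime import datetime, timezone
from typing import List, NamedTuple


class CSN(NamedTuple):
    """Fields of a change sequence number (layout is an assumption, see above)."""
    timestamp: datetime
    seqnum: int
    replica_id: int
    subseqnum: int


# A CSN appears in the logs as a standalone run of 20 lowercase hex digits.
CSN_RE = re.compile(r"\b[0-9a-f]{20}\b")


def parse_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four assumed fields."""
    if not CSN_RE.fullmatch(csn):
        raise ValueError(f"not a 20-hex-digit CSN: {csn!r}")
    return CSN(
        timestamp=datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


def find_csns(log_text: str) -> List[str]:
    """Return the distinct CSN strings found in log text, in first-seen order."""
    seen = []
    for match in CSN_RE.findall(log_text):
        if match not in seen:
            seen.append(match)
    return seen


if __name__ == "__main__":
    sample = (
        "retry (49) the transaction (csn=60fe8535001000030000) failed\n"
        "Consumer failed to replay change (uniqueid "
        "31283c01-a16511e9-93cf90e8-ab7c8ee8, CSN 60fe8535003300030000)\n"
    )
    for text in find_csns(sample):
        print(text, parse_csn(text))
```

Under that assumed layout, both CSNs in the logs above carry the same timestamp and replica ID and differ only in the sequence number, so they would be two separate changes originating from the same replica within the same second.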

As far as I can see, the user group is correctly modified on all replicas, but 
these errors don't look healthy to me.

Is there anything I can do to see what went wrong? Is there something to improve
in the configuration?
--
Kees
