Hi Mahadevan,

   I think you hit a new bug that I was able to reproduce.

   The problem is an incorrect handling of operation return code when
   there is an error in SLAPI_PLUGIN_BE_TXN_POST_ADD_FN.
   When the operation succeeds in the DB, DS (as a master) updates the
   CL and RUV in SLAPI_PLUGIN_BE_TXN_POST_ADD_FN phase.
   At this moment the operation is successful. However the update of
   CL/RUV may fail (your case). It correctly aborts the update of the
   CL AND the txn related to the operation in the DB (like if the
   operation never happened)... but it looks like there is a bug where
   it keeps the operation returned code.
   The operation return code should be a failure, but it is successful.

   I reproduced the problem with a debug version where I simulate a
   DB_DEADLOCK failure

       [17/Jun/2013:11:07:21 +0200] - slapd started. Listening on All
       Interfaces port 24502 for LDAP requests
       [17/Jun/2013:11:08:23 +0200] NSMMReplicationPlugin - changelog
       program - _cl5WriteOperationTxn: failed to write entry with csn
       (51bed207000000010000); db error - -30993 BDB0068
       DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
       [17/Jun/2013:11:08:23 +0200] NSMMReplicationPlugin -
       write_changelog_and_ruv: can't add a change for uid=td,dc=com
       (uniqid: 74589a01-d72d11e2-b4c2bdbf-b0a22af5, optype: 16) to
       changelog csn 51bed207000000010000


       [tbordaz@pctbordaz userRoot]$ tail -20
       /var/log/dirsrv/slapd-master_cl/access
       ...
       [17/Jun/2013:11:08:23 +0200] conn=2 op=1 ADD dn="uid=td,dc=com"
       [17/Jun/2013:11:08:23 +0200] conn=2 op=1 RESULT err=0 tag=105
       nentries=0 etime=0 csn=51bed207000000010000


   I do not know why your environment is prone to trigger db_deadlock
   (lot of replica agreements, VM, slow disks...).
   I think the best way to progress is that you fill a ticket/bug so
   that we may track the issue. Note this bug is possibly affecting all
   operations (ADD/MOD/MODRDN/DEL)

best regards
thierry
On 06/15/2013 01:03 AM, Mahadevan, Venkat wrote:

> If the operation fails to write into the changelog, the operation fails. In your case, it means that the ldapclient should receive an error.
> So it is like if the operation never happened and is not replicated.

Hi Thierry,

That’s what I would expect to, but that does not seem to be the case. In my access log on the master server below,

the client successfully adds the entry to the master server and receives an error code return of 0:

[12/Jun/2013:13:36:59 -0700] conn=19048 op=0 BIND dn="cn=Directory Manager" method=128 version=3

[12/Jun/2013:13:36:59 -0700] conn=19048 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=directory manager"

[12/Jun/2013:13:36:59 -0700] conn=19048 op=1 ADD dn="uid=jmeter325,dc=tst,dc=id,dc=ubc,dc=ca"

[12/Jun/2013:13:37:04 -0700] conn=19048 op=1 RESULT err=0 tag=105 nentries=0 etime=5 csn=51b8dbec002b02bd0000

In the error log at the same time the above happens:

[12/Jun/2013:13:37:02 -0700] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica dc=tst,dc=id,dc=ubc,dc=ca: LDAP error - 51

[12/Jun/2013:13:37:04 -0700] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=51b8dbec002b02bd0000) failed (rc=-30994 (DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))

[12/Jun/2013:13:37:04 -0700] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (51b8dbec002b02bd0000); db error - -30994 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock

[12/Jun/2013:13:37:04 -0700] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for uid=jmeter325,dc=tst,dc=id,dc=ubc,dc=ca (uniqid: c0a7827c-d39f11e2-879add5e-44f44922, optype: 16) to changelog csn 51b8dbec002b02bd0000

The net result is that the entry is successfully added to the master server but will never replication to any of the consumers.

cheers,

VM


--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users

Reply via email to