On 11/12/2014 10:55 AM, Ivanov Andrey (M.) wrote:





------------------------------------------------------------------------

    *De: *"Ivanov Andrey (M.)" <[email protected]>
    *À: *"General discussion list for the 389 Directory server
    project." <[email protected]>
    *Envoyé: *Mercredi 12 Novembre 2014 18:52:44
    *Objet: *[389-users] 389ds v.1.3.2.24 replication deadlocks/retry
    count        exceeded

    Hi,

    I've continued testing 389ds v.1.3.2.24 on CentOS 7. I really have
    an impression that everything works fine (plugins etc) but the
    replication seems to be a little fragile. Both of the tickets i've
    already opened concern replication partially or completely
    (https://fedorahosted.org/389/ticket/47942 and
    https://fedorahosted.org/389/ticket/47950).

    Here is another issue with replication :
    i have two servers with multi-master agreements on each of them
    (the same configuration as in ticket
    https://fedorahosted.org/389/ticket/47942).

    We add/delete a lot of groups (943, to be exact). Each group may
    contain a large number of referenced entries, up to ~250
    (uniqueMember: dn). MemberOf plugin is activated and works fine.
    Referential integrity plugin is also activated but of course it is
    of any sense only when deleting groups (or renaming them). It goes
    on for a long time (20-30 minutes or more). Some time after the
    beginning of the operations (typically 5-8 minutes) we have
    replication erros and inconsistency of the replica concerning the
    entries mentioned in error log.

    When adding and deleting groups the supplier is ok. Howevere the
    consumer has several (from one to four or five) groupe
    deletions/adds that are not replicated. The error on the supplier:

    [12/Nov/2014:16:46:42 +0100] NSMMReplicationPlugin -
    agmt="cn=Replication from ldap-edev.polytechnique.fr to
    ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to
    replay change (uniqueid fa90219d-6a8211e4-a42c901a-94623bee, CSN
    546380d6000000020000): Operations error (1). Will retry later.
    [12/Nov/2014:16:47:55 +0100] NSMMReplicationPlugin -
    agmt="cn=Replication from ldap-edev.polytechnique.fr to
    ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to
    replay change (uniqueid 1e5367ae-6a8311e4-a42c901a-94623bee, CSN
    54638125000000020000): Operations error (1). Will retry later.
    [12/Nov/2014:16:53:14 +0100] NSMMReplicationPlugin -
    agmt="cn=Replication from ldap-edev.polytechnique.fr to
    ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to
    replay change (uniqueid f4e70b85-6a8311e4-a42c901a-94623bee, CSN
    54638260000000020000): Operations error (1). Will retry later.
    [12/Nov/2014:16:55:12 +0100] NSMMReplicationPlugin -
    agmt="cn=Replication from ldap-edev.polytechnique.fr to
    ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to
    replay change (uniqueid 3c6d978a-6a8411e4-a42c901a-94623bee, CSN
    546382d6000400020000): Operations error (1). Will retry later.
    [12/Nov/2014:16:56:31 +0100] NSMMReplicationPlugin -
    agmt="cn=Replication from ldap-edev.polytechnique.fr to
    ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to
    replay change (uniqueid 6030dd93-6a8411e4-a42c901a-94623bee, CSN
    54638325000000020000): Operations error (1). Will retry later.
    [12/Nov/2014:16:57:22 +0100] NSMMReplicationPlugin -
    agmt="cn=Replication from ldap-edev.polytechnique.fr to
    ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to
    replay change (uniqueid 83f42395-6a8411e4-a42c901a-94623bee, CSN
    5463835d000000020000): Operations error (1). Will retry later.

    The corresponding errors on the consumer seem to hint deadlocks in
    these cases:
    [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - changelog
    program - _cl5WriteOperationTxn: retry (49) the transaction
    (csn=546380d6000000020000) failed (rc=-30993 (BDB0068
    DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
    [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - changelog
    program - _cl5WriteOperationTxn: failed to write entry with csn
    (546380d6000000020000); db error - -30993 BDB0068
    DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
    [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin -
    write_changelog_and_ruv: can't add a change for
    
cn=LAN452ESP-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
    (uniqid: fa90219d-6a8211e4-a42c901a-94623bee, optype: 16) to
    changelog csn 546380d6000000020000
    [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - changelog
    program - _cl5WriteOperationTxn: retry (49) the transaction
    (csn=54638125000000020000) failed (rc=-30993 (BDB0068
    DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
    [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - changelog
    program - _cl5WriteOperationTxn: failed to write entry with csn
    (54638125000000020000); db error - -30993 BDB0068
    DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
    [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin -
    write_changelog_and_ruv: can't add a change for
    
cn=LAN472EFLE-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
    (uniqid: 1e5367ae-6a8311e4-a42c901a-94623bee, optype: 16) to
    changelog csn 54638125000000020000
    [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - changelog
    program - _cl5WriteOperationTxn: retry (49) the transaction
    (csn=54638260000000020000) failed (rc=-30993 (BDB0068
    DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
    [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - changelog
    program - _cl5WriteOperationTxn: failed to write entry with csn
    (54638260000000020000); db error - -30993 BDB0068
    DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
    [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin -
    write_changelog_and_ruv: can't add a change for
    
cn=MAT471-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
    (uniqid: f4e70b85-6a8311e4-a42c901a-94623bee, optype: 16) to
    changelog csn 54638260000000020000
    [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - changelog
    program - _cl5WriteOperationTxn: retry (49) the transaction
    (csn=546382d6000400020000) failed (rc=-30993 (BDB0068
    DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
    [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - changelog
    program - _cl5WriteOperationTxn: failed to write entry with csn
    (546382d6000400020000); db error - -30993 BDB0068
    DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
    [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin -
    write_changelog_and_ruv: can't add a change for
    
cn=MEC592-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
    (uniqid: 3c6d978a-6a8411e4-a42c901a-94623bee, optype: 16) to
    changelog csn 546382d6000400020000
    [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - changelog
    program - _cl5WriteOperationTxn: retry (49) the transaction
    (csn=54638325000000020000) failed (rc=-30993 (BDB0068
    DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
    [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - changelog
    program - _cl5WriteOperationTxn: failed to write entry with csn
    (54638325000000020000); db error - -30993 BDB0068
    DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
    [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin -
    write_changelog_and_ruv: can't add a change for
    
cn=PHY566-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
    (uniqid: 6030dd93-6a8411e4-a42c901a-94623bee, optype: 16) to
    changelog csn 54638325000000020000
    [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - changelog
    program - _cl5WriteOperationTxn: retry (49) the transaction
    (csn=5463835d000000020000) failed (rc=-30993 (BDB0068
    DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
    [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - changelog
    program - _cl5WriteOperationTxn: failed to write entry with csn
    (5463835d000000020000); db error - -30993 BDB0068
    DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
    [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin -
    write_changelog_and_ruv: can't add a change for
    
cn=PHY651K-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
    (uniqid: 83f42395-6a8411e4-a42c901a-94623bee, optype: 16) to
    changelog csn 5463835d000000020000


See https://bugzilla.redhat.com/show_bug.cgi?id=979169



    Each time the failed group entries are different.

    The replication agreements look liek this :
    0 cn=Replication from ldap-edev.polytechnique.fr to
    
ldap-model.polytechnique.fr,cn=replica,cn=dc\\3Did\\2Cdc\\3Dpolytechnique\\2Cdc\\3Dedu,cn=mapping
    tree,cn=config
    objectClass: top
    objectClass: nsDS5ReplicationAgreement
    cn: Replication from ldap-edev.polytechnique.fr to
    ldap-model.polytechnique.fr
    description: Replication agreement from server
    ldap-edev.polytechnique.fr to server ldap-model.polytechnique.fr
    nsDS5ReplicaHost: ldap-model.polytechnique.fr
    nsDS5ReplicaRoot: dc=id,dc=polytechnique,dc=edu
    nsDS5ReplicaPort: 636
    nsDS5ReplicaTransportInfo: SSL
    nsDS5ReplicaBindDN: cn=RepliX,cn=config
    nsDS5ReplicaBindMethod: simple
    nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE entryusn
    memberOf
    nsDS5ReplicatedAttributeListTotal: (objectclass=*) $ EXCLUDE entryusn
    nsds5ReplicaStripAttrs: modifiersName modifyTimestamp
    internalModifiersName internalModifyTimestamp internalCreatorsname
    nsds5replicaBusyWaitTime: 3
    nsds5replicaTimeout: 30
    nsDS5ReplicaCredentials: {DES}...
    nsds5ReplicaEnabled: on
    nsds50ruv: {replicageneration} 54636df3000000020000
    nsds50ruv: {replica 1 ldap://ldap-model.polytechnique.fr:389}
    54637b3c000000010000 54637dce000000010000
    nsds50ruv: {replica 2 ldap://ldap-edev.polytechnique.fr:389}
    54636ffe000000020000 54637d2d000000020000
    nsruvReplicaLastModified: {replica 1
    ldap://ldap-model.polytechnique.fr:389} 00000000
    nsruvReplicaLastModified: {replica 2
    ldap://ldap-edev.polytechnique.fr:389} 00000000
    nsds5replicareapactive: 0
    nsds5replicaLastUpdateStart: 20141112173922Z
    nsds5replicaLastUpdateEnd: 20141112173922Z
    nsds5replicaChangesSentSinceStartup: 2:1899/21484
    nsds5replicaLastUpdateStatus: 0 Replica acquired successfully:
    Incremental update succeeded
    nsds5replicaUpdateInProgress: FALSE
    nsds5replicaLastInitStart: 0
    nsds5replicaLastInitEnd: 0


    The DNA plug-in is disabled (excluding
    https://fedorahosted.org/389/ticket/47410)


    I've made a ticket concerning  this problem
    (https://fedorahosted.org/389/ticket/47410)

The ticket for this problem is of course https://fedorahosted.org/389/ticket/47954, not the one i've written in the original mail.






    Another problem when doing these tests, with replication
    agreements suspended is the following message in the error log:

    12/Nov/2014:16:15:35 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:15:38 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:15:40 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:15:43 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:15:45 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:15:47 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:15:50 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:15:52 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:15:55 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:15:57 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:15:59 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:16:02 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:16:04 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:16:06 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:16:09 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:16:11 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:16:14 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:16:17 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:16:19 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:16:22 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:16:24 +0100] - Retry count exceeded in modify
    ...
    (it goes on for approx 5 minutes)
    [12/Nov/2014:16:20:31 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:20:33 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:20:36 +0100] - Retry count exceeded in modify
    [12/Nov/2014:16:20:42 +0100] - Retry count exceeded in modify


    During these error messages the server doe snot consume CPU but it
    is does not allow any changes to continue. Do i need to open
    another ticket for this second problem?

    Thank you!


    Regards,
    Andrey

    --
    389 users mailing list
    [email protected]
    https://admin.fedoraproject.org/mailman/listinfo/389-users




--
389 users mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/389-users

--
389 users mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/389-users

Reply via email to