On 05/06/2015 07:41 PM, thierry bordaz wrote:
On 05/06/2015 05:56 PM, Martin Babinsky wrote:
On 05/06/2015 04:25 PM, thierry bordaz wrote:
On 05/06/2015 03:19 PM, Martin Babinsky wrote:
Hello Thierry,

replies are inline.

On 05/06/2015 02:22 PM, thierry bordaz wrote:
On 05/06/2015 01:54 PM, Martin Babinsky wrote:
The attached patch tries to fix
https://fedorahosted.org/freeipa/ticket/4378

After discussion with Thierry we concluded that while this issue is
more complex than it seems, the transition from REPLACE to DEL/ADD
operations when updating nsDS5ReplicaId should suffice for this
ticket.

Hello Martin,

Few comments, you are using MOD_DEL 'replicaID' with None value. So
this
is going to delete all previous values and it should be equivalent
to a
MOD_REPL.
I was thinking you wanted to retrieve the id_value and call MOD_DEL
'replicaID' <current_value>. So that if by the time you fetched the
replicaId, an other replica updated the replicaId, the MOD_DEL/MOD_ADD
would fail and you need a new iteration.

Sorry I didn't know you can MOD_DEL a particular value (I'm LDAP noob
I know). Will fix this.
If replicaId was multi-valued and you want to make it single
valued, you
may want to do create a more complex MOD (e.g. (ldap.MOD_DELETE,
'nsDS5ReplicaId', str(value1), (ldap.MOD_DELETE, 'nsDS5ReplicaId',
str(value2)...)

AFAIK ReplicaId is single-valued (looking at the schema right now) so
this shouldn't be problem.
If it is updating successfully do you want to return 'retval' or
'retval+1' ?

If several replicas try to update the replicaId of the master and the
current replicaId is 1000.
Replica1 successfully updates the replicaId and gets 1001 as the new
value.
Replica2 successfully updates the replicaId and gets 1002.
The final value on master will be 1002, but replica1 will assum it is
1001. Is it a problem ?

I studied the code in the master branch and IIUC (and please correct
me if I got this wrong) nsDS5ReplicaId attribute in
'cn=replication,cn=etc,$SUFFIX' represents replicaID of the _next_
replica that will be installed.

So if a replica is installed, it sets the current value of
nsDS5ReplicaId as its replica ID (the function returns 'retval') and
then increments it in 'cn=replication,cn=etc,$SUFFIX' entry ('retval +
1' is written to master) so that the next installed replica fetches
this updated value.

So the case you described should be the expected behavior. To change
it would require different patch IMHO.

Thank for your precious explanations, in fact the value in
'cn=replication,cn=etc,SUFFIX' is the next available replicaId value and
is the centralized mechanism that assign unique replicaID.

The risk was that several replica pick the same value. So yes it is
important that the MOD_DEL specifies the previously read value so that
the test/set will be atomic. If several replicas read the same value,
only the faster one will use it to install the replica.

thanks
thierry
thanks
thierry




Attaching updated patch with fixed MOD_DELETE operation.


Hi Martin,

The fix looks good to me except I think you need to do (ldap.MOD_DELETE,
'nsDS5ReplicaId', *str(*retval*)*)

thanks
thierry

Attaching updated patch.

--
Martin^3 Babinsky
From 80f3bca039ada9c2b99d74b8e6022d48f46af0fb Mon Sep 17 00:00:00 2001
From: Martin Babinsky <mbabi...@redhat.com>
Date: Mon, 4 May 2015 18:33:44 +0200
Subject: [PATCH] prevent duplicate IDs when setting up multiple replicas
 against single master

This patch forces replicas to use DELETE+ADD operations to increment
'nsDS5ReplicaId' in 'cn=replication,cn=etc,$SUFFIX' on master, and retry
multiple times in the case of conflict with another update. Thus when multiple
replicas are set-up against single master none of them will have duplicate ID.

https://fedorahosted.org/freeipa/ticket/4378
---
 ipaserver/install/replication.py | 78 ++++++++++++++++++++++++++--------------
 1 file changed, 52 insertions(+), 26 deletions(-)

diff --git a/ipaserver/install/replication.py b/ipaserver/install/replication.py
index 66764c22f69328942fe2e4581cfafb3806438d7c..4c16dc22507237945188f01d51a63a391b872b18 100644
--- a/ipaserver/install/replication.py
+++ b/ipaserver/install/replication.py
@@ -21,6 +21,7 @@ import time
 import datetime
 import sys
 import os
+from random import randint
 
 import ldap
 
@@ -230,34 +231,59 @@ class ReplicationManager(object):
 
         # Ok, either the entry doesn't exist or the attribute isn't set
         # so get it from the other master
-        retval = -1
+        return self._get_and_update_id_from_master(master_conn)
+
+    def _get_and_update_id_from_master(self, master_conn, attempts=5):
+        """
+        Fetch replica ID from remote master and update nsDS5ReplicaId attribute
+        on 'cn=replication,cn=etc,$SUFFIX' entry. Do it as MOD_DELETE+MOD_ADD
+        operations and retry when conflict occurs, e.g. due to simultaneous
+        update from another replica.
+        :param master_conn: LDAP connection to master
+        :param attempts: number of attempts to update nsDS5ReplicaId
+        :return: value of nsDS5ReplicaId before incrementation
+        """
         dn = DN(('cn','replication'),('cn','etc'), self.suffix)
-        try:
-            replica = master_conn.get_entry(dn)
-        except errors.NotFound:
-            root_logger.debug("Unable to retrieve nsDS5ReplicaId from remote server")
-            raise
-        else:
-            id_values = replica.get('nsDS5ReplicaId')
-            if not id_values:
+
+        for a in range(1, attempts + 1):
+            try:
+                root_logger.debug('Fetching nsDS5ReplicaId from master '
+                                  '[attempt %d/%d]', a, attempts)
+                replica = master_conn.get_entry(dn)
+                id_values = replica.get('nsDS5ReplicaId')
+                if not id_values:
+                    root_logger.debug("Unable to retrieve nsDS5ReplicaId from remote server")
+                    raise RuntimeError("Unable to retrieve nsDS5ReplicaId from remote server")
+                # nsDS5ReplicaId is single-valued now, but historically it could
+                # contain multiple values, of which we need the highest.
+                # see bug: https://fedorahosted.org/freeipa/ticket/3394
+                retval = max(int(v) for v in id_values)
+
+                # Now update the value on the master
+                mod_list = [(ldap.MOD_DELETE, 'nsDS5ReplicaId', str(retval)),
+                            (ldap.MOD_ADD, 'nsDS5ReplicaId', str(retval + 1))]
+
+                master_conn.modify_s(dn, mod_list)
+                root_logger.debug('Successfully updated nsDS5ReplicaId.')
+                return retval
+
+            except errors.NotFound:
                 root_logger.debug("Unable to retrieve nsDS5ReplicaId from remote server")
-                raise RuntimeError("Unable to retrieve nsDS5ReplicaId from remote server")
-
-        # nsDS5ReplicaId is single-valued now, but historically it could
-        # contain multiple values, of which we need the highest.
-        # see bug: https://fedorahosted.org/freeipa/ticket/3394
-        retval = max(int(v) for v in id_values)
-
-        # Now update the value on the master
-        mod = [(ldap.MOD_REPLACE, 'nsDS5ReplicaId', str(retval + 1))]
-
-        try:
-            master_conn.modify_s(dn, mod)
-        except Exception, e:
-            root_logger.debug("Problem updating nsDS5ReplicaID %s" % e)
-            raise
-
-        return retval
+                raise
+            # these errors signal a conflict in updating replica ID.
+            # We then wait for a random time interval and try again
+            except (ldap.NO_SUCH_ATTRIBUTE, ldap.OBJECT_CLASS_VIOLATION) as e:
+                sleep_interval = randint(1, 5)
+                root_logger.debug("Update failed (%s). Conflicting operation?",
+                                  e)
+                time.sleep(sleep_interval)
+            # in case of other error we bail out
+            except ldap.LDAPError as e:
+                root_logger.debug("Problem updating nsDS5ReplicaID %s" % e)
+                raise
+
+        raise RuntimeError("Failed to update nsDS5ReplicaId in %d attempts"
+                           % attempts)
 
     def get_agreement_filter(self, agreement_types=None, host=None):
         """
-- 
2.1.0

-- 
Manage your subscription for the Freeipa-devel mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-devel
Contribute to FreeIPA: http://www.freeipa.org/page/Contribute/Code

Reply via email to