On Thu, 2012-09-06 at 12:42 -0400, Mark Reynolds wrote:
>
> On 09/06/2012 12:27 PM, Martin Kosek wrote:
> > On 09/06/2012 06:13 PM, Rich Megginson wrote:
> >> On 09/06/2012 10:09 AM, Martin Kosek wrote:
> >>> On 09/06/2012 06:09 PM, Martin Kosek wrote:
> >>>> On 09/06/2012 06:05 PM, Martin Kosek wrote:
> >>>>> On 09/06/2012 05:55 PM, Rob Crittenden wrote:
> >>>>>> Rob Crittenden wrote:
> >>>>>>> Rob Crittenden wrote:
> >>>>>>>> Martin Kosek wrote:
> >>>>>>>>> On 09/05/2012 08:06 PM, Rob Crittenden wrote:
> >>>>>>>>>> Rob Crittenden wrote:
> >>>>>>>>>>> Martin Kosek wrote:
> >>>>>>>>>>>> On 07/05/2012 08:39 PM, Rob Crittenden wrote:
> >>>>>>>>>>>>> Martin Kosek wrote:
> >>>>>>>>>>>>>> On 07/03/2012 04:41 PM, Rob Crittenden wrote:
> >>>>>>>>>>>>>>> Deleting a replica can leave a replication vector (RUV) on
> >>>>>>>>>>>>>>> the other servers. This can confuse things if the replica
> >>>>>>>>>>>>>>> is re-added, and it also causes the server to calculate
> >>>>>>>>>>>>>>> changes against a server that may no longer exist.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 389-ds-base provides a new task that self-propagates to all
> >>>>>>>>>>>>>>> available replicas to clean this RUV data.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This patch will create this task at deletion time to
> >>>>>>>>>>>>>>> hopefully clean things up.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> It isn't perfect. If any replica is down or unavailable at
> >>>>>>>>>>>>>>> the time the cleanruv task fires, and then comes back up,
> >>>>>>>>>>>>>>> the old RUV data may be re-propagated around.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> To make things easier in this case I've added two new
> >>>>>>>>>>>>>>> commands to ipa-replica-manage.
> >>>>>>>>>>>>>>> The first lists the replication ids of all the servers we
> >>>>>>>>>>>>>>> have a RUV for. Using this you can call clean_ruv with the
> >>>>>>>>>>>>>>> replication id of a server that no longer exists to try the
> >>>>>>>>>>>>>>> cleanallruv step again.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This is quite dangerous though. If you run cleanruv against
> >>>>>>>>>>>>>>> a replica id that does exist it can cause a loss of data. I
> >>>>>>>>>>>>>>> believe I've put in enough scary warnings about this.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> rob
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Good work there, this should make cleaning RUVs much easier
> >>>>>>>>>>>>>> than with the previous version.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This is what I found during review:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1) The list_ruv and clean_ruv command help is quite lost in
> >>>>>>>>>>>>>> the man page. I think it would help if we, for example, had
> >>>>>>>>>>>>>> all the info for commands indented. As it is, a user could
> >>>>>>>>>>>>>> easily overlook the new commands in the man page.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 2) I would rename the new commands to clean-ruv and list-ruv
> >>>>>>>>>>>>>> to make them consistent with the rest of the commands
> >>>>>>>>>>>>>> (re-initialize, force-sync).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 3) It would be nice to be able to run the clean_ruv command
> >>>>>>>>>>>>>> in an unattended way (for better testing), i.e. respect the
> >>>>>>>>>>>>>> --force option as we already do for ipa-replica-manage del.
> >>>>>>>>>>>>>> This fix would aid test automation in the future.
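The --force handling requested in item 3) could be shaped roughly like this. This is only a sketch: `confirm_clean` is an illustrative name, and `user_input` stands in for `ipautil.user_input`; the real option wiring lives in ipa-replica-manage.

```python
# Sketch only: a clean_ruv confirmation that honors --force for unattended
# runs, as requested in item 3). "user_input" is a stand-in for
# ipautil.user_input; confirm_clean is a hypothetical helper name.

def confirm_clean(force, user_input):
    """Return True if the RUV cleaning should proceed.

    With force=True the interactive prompt is skipped entirely, which is
    what test automation needs; otherwise the user's answer (defaulting
    to "no") decides.
    """
    if force:
        return True
    return bool(user_input("Continue to clean?", False))

# Unattended run: no prompt function is ever called
assert confirm_clean(True, user_input=None)
```

The point of the design is that the prompt callable is never invoked on the forced path, so a scripted run cannot block on stdin.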
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 4) (minor) The new question (and the del one too) does not
> >>>>>>>>>>>>>> react too well to CTRL+D:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> # ipa-replica-manage clean_ruv 3 --force
> >>>>>>>>>>>>>> Clean the Replication Update Vector for
> >>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no
> >>>>>>>>>>>>>> longer replicate so it may miss updates while the process
> >>>>>>>>>>>>>> is running. It would need to be re-initialized to maintain
> >>>>>>>>>>>>>> consistency. Be very careful.
> >>>>>>>>>>>>>> Continue to clean? [no]: unexpected error:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 5) Help for the clean_ruv command without its required
> >>>>>>>>>>>>>> parameter is quite confusing, as it reports that the command
> >>>>>>>>>>>>>> is wrong and not the parameter:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> # ipa-replica-manage clean_ruv
> >>>>>>>>>>>>>> Usage: ipa-replica-manage [options]
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ipa-replica-manage: error: must provide a command [clean_ruv |
> >>>>>>>>>>>>>> force-sync | disconnect | connect | del | re-initialize |
> >>>>>>>>>>>>>> list | list_ruv]
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> It seems you just forgot to specify the error message in the
> >>>>>>>>>>>>>> command definition.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 6) When the remote replica is down, the clean_ruv command
> >>>>>>>>>>>>>> fails with an unexpected error:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> [root@vm-086 ~]# ipa-replica-manage clean_ruv 5
> >>>>>>>>>>>>>> Clean the Replication Update Vector for
> >>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no
> >>>>>>>>>>>>>> longer replicate so it may miss updates while the process
> >>>>>>>>>>>>>> is running.
> >>>>>>>>>>>>>> It would need to be re-initialized to maintain
> >>>>>>>>>>>>>> consistency. Be very careful.
> >>>>>>>>>>>>>> Continue to clean? [no]: y
> >>>>>>>>>>>>>> unexpected error: {'desc': 'Operations error'}
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> /var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors:
> >>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin -
> >>>>>>>>>>>>>> cleanAllRUV_task: failed to connect to repl agreement
> >>>>>>>>>>>>>> connection (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,
> >>>>>>>>>>>>>> cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping
> >>>>>>>>>>>>>> tree,cn=config), error 105
> >>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin -
> >>>>>>>>>>>>>> cleanAllRUV_task: replica (cn=meTovm-055.idm.lab.
> >>>>>>>>>>>>>> bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping
> >>>>>>>>>>>>>> tree, cn=config) has not been cleaned. You will need to
> >>>>>>>>>>>>>> rerun the CLEANALLRUV task on this replica.
> >>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin -
> >>>>>>>>>>>>>> cleanAllRUV_task: Task failed (1)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In this case I think we should inform the user that the
> >>>>>>>>>>>>>> command failed, possibly because of disconnected replicas,
> >>>>>>>>>>>>>> and that they could enable the replicas and try again.
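The friendlier failure asked for in 6) could be shaped like the sketch below. Heavy hedging applies: `OperationsError` and `run_cleanallruv` are hypothetical stand-ins for the real python-ldap exception and the replication helper, and the message text is only an example.

```python
# Sketch of the error reporting suggested in 6): translate the opaque
# "Operations error" raised while creating the CLEANALLRUV task into a
# hint about disconnected replicas. OperationsError and run_cleanallruv
# are hypothetical stand-ins, not the actual ipaserver API.

class OperationsError(Exception):
    """Stand-in for the LDAP 'Operations error' exception."""

def clean_ruv_with_hint(run_cleanallruv):
    """Run the cleanup, turning an operations error into useful advice."""
    try:
        run_cleanallruv()
    except OperationsError:
        return ("CLEANALLRUV task failed, possibly because some replica "
                "is down or disconnected. Re-enable the replicas and try "
                "again.")
    return "Background task created to clean replication data"
```

The user-visible behavior is the same as today on success; only the failure path changes from a raw traceback to actionable advice.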
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 7) (minor) "pass" is now redundant in replication.py:
> >>>>>>>>>>>>>> +        except ldap.INSUFFICIENT_ACCESS:
> >>>>>>>>>>>>>> +            # We can't make the server we're removing
> >>>>>>>>>>>>>> +            # read-only but this isn't a show-stopper
> >>>>>>>>>>>>>> +            root_logger.debug("No permission to switch
> >>>>>>>>>>>>>> replica to read-only, continuing anyway")
> >>>>>>>>>>>>>> +            pass
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> I think this addresses everything.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> rob
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks, almost there! I just found one more issue which needs
> >>>>>>>>>>>> to be fixed before we push:
> >>>>>>>>>>>>
> >>>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
> >>>>>>>>>>>> Directory Manager password:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Unable to connect to replica vm-055.idm.lab.bos.redhat.com,
> >>>>>>>>>>>> forcing removal
> >>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com':
> >>>>>>>>>>>> {'desc': "Can't contact LDAP server"}
> >>>>>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
> >>>>>>>>>>>>
> >>>>>>>>>>>> There were issues removing a connection: %d format: a number
> >>>>>>>>>>>> is required, not str
> >>>>>>>>>>>>
> >>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com':
> >>>>>>>>>>>> {'desc': "Can't contact LDAP server"}
> >>>>>>>>>>>>
> >>>>>>>>>>>> This is a traceback I retrieved:
> >>>>>>>>>>>> Traceback (most recent call last):
> >>>>>>>>>>>>   File "/sbin/ipa-replica-manage", line 425, in del_master
> >>>>>>>>>>>>     del_link(realm, r, hostname, options.dirman_passwd,
> >>>>>>>>>>>> force=True)
> >>>>>>>>>>>>   File "/sbin/ipa-replica-manage", line 271, in del_link
> >>>>>>>>>>>>     repl1.cleanallruv(replica_id)
> >>>>>>>>>>>>   File
> >>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
> >>>>>>>>>>>> line 1094, in cleanallruv
> >>>>>>>>>>>>     root_logger.debug("Creating CLEANALLRUV task for replica
> >>>>>>>>>>>> id %d" % replicaId)
> >>>>>>>>>>>>
> >>>>>>>>>>>> The problem here is that you don't convert replica_id to int
> >>>>>>>>>>>> in this part:
> >>>>>>>>>>>> +    replica_id = None
> >>>>>>>>>>>> +    if repl2:
> >>>>>>>>>>>> +        replica_id = repl2._get_replica_id(repl2.conn, None)
> >>>>>>>>>>>> +    else:
> >>>>>>>>>>>> +        servers = get_ruv(realm, replica1, dirman_passwd)
> >>>>>>>>>>>> +        for (netloc, rid) in servers:
> >>>>>>>>>>>> +            if netloc.startswith(replica2):
> >>>>>>>>>>>> +                replica_id = rid
> >>>>>>>>>>>> +                break
> >>>>>>>>>>>>
> >>>>>>>>>>>> Martin
> >>>>>>>>>>>>
> >>>>>>>>>>> Updated patch using new mechanism in 389-ds-base. This should
> >>>>>>>>>>> more thoroughly clean out RUV data when a replica is being
> >>>>>>>>>>> deleted, and provide for a way to delete RUV data afterwards
> >>>>>>>>>>> too if necessary.
> >>>>>>>>>>>
> >>>>>>>>>>> rob
> >>>>>>>>>> Rebased patch
> >>>>>>>>>>
> >>>>>>>>>> rob
> >>>>>>>>>>
> >>>>>>>>> 0) As I wrote in a review for your patch 1041, the changelog
> >>>>>>>>> entry slipped elsewhere.
> >>>>>>>>>
> >>>>>>>>> 1) The following KeyboardInterrupt except clause looks
> >>>>>>>>> suspicious. I know why you have it there, but since it is
> >>>>>>>>> generally a bad thing to do, some comment on why it is needed
> >>>>>>>>> would be useful.
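The %d failure Martin traces above comes down to a missing `int()` cast on the rid taken from the RUV listing. A minimal sketch of the fix follows; the `(netloc, rid)` tuples mirror the `get_ruv()` snippet he quotes, while `find_replica_id` and the sample data are illustration scaffolding, not the actual patch.

```python
# Sketch of the replica_id fix: rids coming back from the RUV listing are
# strings, so they must be cast to int before cleanallruv formats them
# with "%d". find_replica_id is a hypothetical helper mirroring the loop
# in the quoted snippet.

def find_replica_id(servers, replica2):
    """Return the replica id (as an int) for the host matching replica2."""
    for netloc, rid in servers:
        if netloc.startswith(replica2):
            return int(rid)  # the missing conversion behind the traceback
    return None

# Illustrative data: the rid arrives as the string "5"
servers = [("vm-055.idm.lab.bos.redhat.com:389", "5")]
rid = find_replica_id(servers, "vm-055.idm.lab.bos.redhat.com")

# With an int rid, the debug line no longer raises
# "TypeError: %d format: a number is required, not str"
message = "Creating CLEANALLRUV task for replica id %d" % rid
```

Without the cast, `"%d" % "5"` raises exactly the TypeError surfaced in the `ipa-replica-manage del` output above.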
> >>>>>>>>>
> >>>>>>>>> @@ -256,6 +263,17 @@ def del_link(realm, replica1, replica2,
> >>>>>>>>> dirman_passwd, force=False):
> >>>>>>>>>          repl1.delete_agreement(replica2)
> >>>>>>>>>          repl1.delete_referral(replica2)
> >>>>>>>>>
> >>>>>>>>> +    if type1 == replication.IPA_REPLICA:
> >>>>>>>>> +        if repl2:
> >>>>>>>>> +            ruv = repl2._get_replica_id(repl2.conn, None)
> >>>>>>>>> +        else:
> >>>>>>>>> +            ruv = get_ruv_by_host(realm, replica1, replica2,
> >>>>>>>>> dirman_passwd)
> >>>>>>>>> +
> >>>>>>>>> +        try:
> >>>>>>>>> +            repl1.cleanallruv(ruv)
> >>>>>>>>> +        except KeyboardInterrupt:
> >>>>>>>>> +            pass
> >>>>>>>>> +
> >>>>>>>>>
> >>>>>>>>> Maybe you just wanted to do some cleanup and then "raise" again?
> >>>>>>>> No, it is there because it is safe to break out of it. The task
> >>>>>>>> will continue to run. I added some verbiage.
> >>>>>>>>
> >>>>>>>>> 2) This is related to 1): when some remote replica is down,
> >>>>>>>>> "ipa-replica-manage del" may wait indefinitely, right?
> >>>>>>>>>
> >>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
> >>>>>>>>> Deleting a master is irreversible.
> >>>>>>>>> To reconnect to the remote master you will need to prepare a new
> >>>>>>>>> replica file and re-install.
> >>>>>>>>> Continue to delete?
> >>>>>>>>> [no]: y
> >>>>>>>>> ipa: INFO: Setting agreement
> >>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping
> >>>>>>>>> tree,cn=config schedule to 2358-2359 0 to force synch
> >>>>>>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
> >>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping
> >>>>>>>>> tree,cn=config
> >>>>>>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0
> >>>>>>>>> Replica acquired successfully: Incremental update succeeded:
> >>>>>>>>> start: 0: end: 0
> >>>>>>>>> Background task created to clean replication data
> >>>>>>>>>
> >>>>>>>>> ... after about a minute I hit CTRL+C
> >>>>>>>>>
> >>>>>>>>> ^CDeleted replication agreement from
> >>>>>>>>> 'vm-086.idm.lab.bos.redhat.com' to 'vm-055.idm.lab.bos.redhat.com'
> >>>>>>>>> Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: NS
> >>>>>>>>> record does not contain 'vm-055.idm.lab.bos.redhat.com.'
> >>>>>>>>> You may need to manually remove them from the tree
> >>>>>>>>>
> >>>>>>>>> I think it would be better to inform the user that some remote
> >>>>>>>>> replica is down, or at least that we are waiting for the task to
> >>>>>>>>> complete. Something like this:
> >>>>>>>>>
> >>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
> >>>>>>>>> ...
> >>>>>>>>> Background task created to clean replication data
> >>>>>>>>> Replication data clean up may take a very long time if some
> >>>>>>>>> replica is unreachable
> >>>>>>>>> Hit CTRL+C to interrupt the wait
> >>>>>>>>> ^C Clean up wait interrupted
> >>>>>>>>> ....
> >>>>>>>>> [continue with del]
> >>>>>>>> Yup, did this in #1.
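The behavior agreed on above, an interruptible wait where Ctrl+C is harmless because the CLEANALLRUV task keeps running inside the directory server, can be sketched as follows. `wait_for_clean` and `wait_for_task` are hypothetical stand-ins for the real code; the printed messages follow Martin's suggested wording.

```python
# Sketch of the interruptible wait: the CLEANALLRUV task runs server-side,
# so interrupting the client-side wait only stops the waiting, not the
# cleanup. wait_for_task is a hypothetical stand-in for the blocking
# task-monitoring call.

def wait_for_clean(wait_for_task):
    """Wait for the cleanup task; Ctrl+C aborts the wait, not the task."""
    print("Background task created to clean replication data")
    print("This may take a very long time if some replica is unreachable")
    print("Hit CTRL+C to interrupt the wait")
    try:
        wait_for_task()
    except KeyboardInterrupt:
        # Safe to swallow: the server-side task continues to completion,
        # which is why no re-raise is needed here.
        print("Clean up wait interrupted")
    return True  # the del operation continues either way
```

This also documents in code why the bare `except KeyboardInterrupt` from the diff is deliberate rather than a cleanup-and-reraise pattern.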
> >>>>>>>>
> >>>>>>>>> 3) (minor) When there is a cleanruv task running and you run
> >>>>>>>>> "ipa-replica-manage del", there is an unexpected error message
> >>>>>>>>> about a duplicate task object in LDAP:
> >>>>>>>>>
> >>>>>>>>> # ipa-replica-manage del vm-072.idm.lab.bos.redhat.com --force
> >>>>>>>>> Unable to connect to replica vm-072.idm.lab.bos.redhat.com,
> >>>>>>>>> forcing removal
> >>>>>>>>> FAIL
> >>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com':
> >>>>>>>>> {'desc': "Can't contact LDAP server"}
> >>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
> >>>>>>>>>
> >>>>>>>>> There were issues removing a connection: This entry already
> >>>>>>>>> exists <<<<<<<<<
> >>>>>>>>>
> >>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com':
> >>>>>>>>> {'desc': "Can't contact LDAP server"}
> >>>>>>>>> Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS
> >>>>>>>>> record does not contain 'vm-072.idm.lab.bos.redhat.com.'
> >>>>>>>>> You may need to manually remove them from the tree
> >>>>>>>>>
> >>>>>>>>> I think it should be enough to just catch "entry already exists"
> >>>>>>>>> in the cleanallruv function, and in such a case print a relevant
> >>>>>>>>> error message and bail out. Thus,
> >>>>>>>>> self.conn.checkTask(dn, dowait=True) would not be called either.
> >>>>>>>> Good catch, fixed.
> >>>>>>>>
> >>>>>>>>> 4) (minor) In the make_readonly function, there is a redundant
> >>>>>>>>> "pass" statement:
> >>>>>>>>>
> >>>>>>>>> +    def make_readonly(self):
> >>>>>>>>> +        """
> >>>>>>>>> +        Make the current replication agreement read-only.
> >>>>>>>>> +        """
> >>>>>>>>> +        dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
> >>>>>>>>> +                ('cn', 'plugins'), ('cn', 'config'))
> >>>>>>>>> +
> >>>>>>>>> +        mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'on')]
> >>>>>>>>> +        try:
> >>>>>>>>> +            self.conn.modify_s(dn, mod)
> >>>>>>>>> +        except ldap.INSUFFICIENT_ACCESS:
> >>>>>>>>> +            # We can't make the server we're removing read-only
> >>>>>>>>> +            # but this isn't a show-stopper
> >>>>>>>>> +            root_logger.debug("No permission to switch replica
> >>>>>>>>> to read-only, continuing anyway")
> >>>>>>>>> +            pass <<<<<<<<<<<<<<<
> >>>>>>>> Yeah, this is one of my common mistakes. I put in a pass
> >>>>>>>> initially, then add logging in front of it and forget to delete
> >>>>>>>> the pass. It's gone now.
> >>>>>>>>
> >>>>>>>>> 5) In clean_ruv, I think allowing a --force option to bypass the
> >>>>>>>>> user_input would be helpful (at least for test automation):
> >>>>>>>>>
> >>>>>>>>> +    if not ipautil.user_input("Continue to clean?", False):
> >>>>>>>>> +        sys.exit("Aborted")
> >>>>>>>> Yup, added.
> >>>>>>>>
> >>>>>>>> rob
> >>>>>>> Slightly revised patch. I still had a window open with one unsaved
> >>>>>>> change.
> >>>>>>>
> >>>>>>> rob
> >>>>>>>
> >>>>>> Apparently there were two unsaved changes, one of which was lost.
> >>>>>> This adds in the 'entry already exists' fix.
> >>>>>>
> >>>>>> rob
> >>>>>>
> >>>>> Just one last thing (otherwise the patch is OK) - I don't think this
> >>>>> is what we want :-)
> >>>>>
> >>>>> # ipa-replica-manage clean-ruv 8
> >>>>> Clean the Replication Update Vector for
> >>>>> vm-055.idm.lab.bos.redhat.com:389
> >>>>>
> >>>>> Cleaning the wrong replica ID will cause that server to no
> >>>>> longer replicate so it may miss updates while the process
> >>>>> is running. It would need to be re-initialized to maintain
> >>>>> consistency. Be very careful.
> >>>>> Continue to clean?
> >>>>> [no]: y <<<<<<
> >>>>> Aborted
> >>>>>
> >>>>> Nor this exception (you are checking for the wrong exception):
> >>>>>
> >>>>> # ipa-replica-manage clean-ruv 8
> >>>>> Clean the Replication Update Vector for
> >>>>> vm-055.idm.lab.bos.redhat.com:389
> >>>>>
> >>>>> Cleaning the wrong replica ID will cause that server to no
> >>>>> longer replicate so it may miss updates while the process
> >>>>> is running. It would need to be re-initialized to maintain
> >>>>> consistency. Be very careful.
> >>>>> Continue to clean? [no]:
> >>>>> unexpected error: This entry already exists
> >>>>>
> >>>>> This is the exception:
> >>>>>
> >>>>> Traceback (most recent call last):
> >>>>>   File "/sbin/ipa-replica-manage", line 651, in <module>
> >>>>>     main()
> >>>>>   File "/sbin/ipa-replica-manage", line 648, in main
> >>>>>     clean_ruv(realm, args[1], options)
> >>>>>   File "/sbin/ipa-replica-manage", line 373, in clean_ruv
> >>>>>     thisrepl.cleanallruv(ruv)
> >>>>>   File
> >>>>> "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
> >>>>> line 1136, in cleanallruv
> >>>>>     self.conn.addEntry(e)
> >>>>>   File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py",
> >>>>> line 503, in addEntry
> >>>>>     self.__handle_errors(e, arg_desc=arg_desc)
> >>>>>   File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py",
> >>>>> line 321, in __handle_errors
> >>>>>     raise errors.DuplicateEntry()
> >>>>> ipalib.errors.DuplicateEntry: This entry already exists
> >>>>>
> >>>>> Martin
> >>>>>
> >>>> On another matter, I just noticed that CLEANRUV is not proceeding if
> >>>> I have a winsync replica defined (and it is even up):
> >>>>
> >>>> # ipa-replica-manage list
> >>>> dc.ad.test: winsync <<<<<<<
> >>>> vm-072.idm.lab.bos.redhat.com: master
> >>>> vm-086.idm.lab.bos.redhat.com: master
> >>>>
> >>>> [06/Sep/2012:11:59:10 -0400] NSMMReplicationPlugin - CleanAllRUV
> >>>> Task: Waiting for all the replicas to receive all the deleted
> >>>> replica updates...
> >>>> [06/Sep/2012:11:59:10 -0400] NSMMReplicationPlugin - CleanAllRUV
> >>>> Task: Failed to contact agmt (agmt="cn=meTodc.ad.test" (dc:389))
> >>>> error (10), will retry later.
> >>>> [06/Sep/2012:11:59:10 -0400] NSMMReplicationPlugin - CleanAllRUV
> >>>> Task: Not all replicas caught up, retrying in 10 seconds
> >>>> [06/Sep/2012:11:59:20 -0400] NSMMReplicationPlugin - CleanAllRUV
> >>>> Task: Failed to contact agmt (agmt="cn=meTodc.ad.test" (dc:389))
> >>>> error (10), will retry later.
> >>>> [06/Sep/2012:11:59:20 -0400] NSMMReplicationPlugin - CleanAllRUV
> >>>> Task: Not all replicas caught up, retrying in 20 seconds
> >>>> [06/Sep/2012:11:59:40 -0400] NSMMReplicationPlugin - CleanAllRUV
> >>>> Task: Failed to contact agmt (agmt="cn=meTodc.ad.test" (dc:389))
> >>>> error (10), will retry later.
> >>>> [06/Sep/2012:11:59:40 -0400] NSMMReplicationPlugin - CleanAllRUV
> >>>> Task: Not all replicas caught up, retrying in 40 seconds
> >>>>
> >>>> I don't think this is OK. Adding Rich to CC to follow up on this one.
> >>>>
> >>>> Martin
> >>>>
> >>> And now the actual CC.
> >>> Martin
> >>>
> >> Yeah - looks like CLEANALLRUV needs to ignore windows sync agreements.
> > Have you created a 389 ticket for that, or would you like me to do that?
> If you could, that would be great.
https://fedorahosted.org/389/ticket/450

Martin

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel