> On 3 Feb 2020, at 23:43, Jay Fenlason <[email protected]> wrote:
> 
> On Mon, Feb 03, 2020 at 10:38:59AM +1000, William Brown wrote:
>> 
>> 
>>> On 1 Feb 2020, at 12:10, Jay Fenlason <[email protected]> wrote:
>>> 
>>> I have a small FreeIPA deployment of ~6-8 servers running on CentOS
>>> 7.7.  Due to the addition and removal of some of the servers, some
>>> cruft (tombstones, replication conflicts, etc.) has crept into the
>>> directory.  I noticed that when I attempted to delete some of the
>>> cruft entries, ns-slapd would hang, failing to process requests, or
>>> even shut down.
> 
>> Can you tell us exactly what entries you noticed and how you attempted to 
>> delete them? There are certainly some things like tombstones and such that 
>> you shouldn't be touching as they are part of the internal replication state 
>> machine.
> 
> No, I don't remember what entries they were.  I was following
> instructions from:
> https://docs.fedoraproject.org/en-US/Fedora/18/html/FreeIPA_Guide/ipa-replica-manage.html
> (or maybe elsewhere) using ldapdelete to remove tombstones for a truly
> deleted server.

I think this advice is quite outdated now. We probably have better tools to 
handle this, but managing content ownership and getting Google to show the 
latest content is a really difficult problem. 

> 
>> Knowing what you did will also help us to create a test case and
>> reproducers to validate your patch also.
> 
> I found the bug by doing a series of "ipa-client-install" runs (with
> lots of arguments), followed by
> echo ca_host = {a not-firewalled IPA CA} >> /etc/ipa/default.conf
> echo [global] > /etc/ipa/installer.conf
> echo ca_host = {ditto} >> /etc/ipa/installer.conf
> echo {password} | kinit admin
> ipa hostgroup-add-member ipaservers --hosts $(hostname -f)
> ipa-replica-install --setup-ca --setup-dns --forwarder={ip addr}
> 
> followed by the replica install failing due to network issues,
> misconfigured firewalls, etc, then
> ipa-server-install --uninstall on the host
> and ipa-replica-manage del {failed install host}
> elsewhere in the mesh, sometimes with ldapdelete of the initial
> replication agreement that ipa-replica-manage did not remove.
> 
> Rinse, repeat. . .
> 
> Until ipa-replica-install starts failing because the source LDAP
> server hangs (because of this bug) during the "starting initial
> replication" step.  It was while debugging that that I discovered that
> ldapdelete on the tombstone entries also caused the LDAP servers to
> lock up.
> 
> 
>> Thanks for the report :) 
> 
> Incidentally, there's another bug, which I have not investigated,
> where attempting to ldapdelete a problematic tombstone entry
> immediately after restarting the LDAP server returns an error, and
> nothing is deleted on the server.  If you do an ldapsearch, and then
> an ldapdelete, the entry is removed, but then slapd hangs (this bug
> again) and does not respond to searches or deletes (or shutdown
> requests) until you kill -9 it.  I don't know how it relates to this
> bug.

So I think that deleting the tombstones is not the correct (or valid) course of 
action here. Tombstones are a really important part of the replication 
lifecycle, so if anything we need to take stronger steps to prevent a client 
from being able to delete them at all. This makes me question the patch you 
have provided, because you shouldn't be in a position to delete tombstones in 
the first place. Only the server itself should purge (delete) them, once 
replication is known to be in a consistent state. I am happy to explain the 
functionality of tombstones further if you are interested. 

Deleting a conflict entry however, is just fine, so that shouldn't have caused 
the issue. 
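
For looking at these entries without touching them, here is a minimal 
sketch. The suffix and the "cn=Directory Manager" bind DN are placeholders, 
and the function names are made up for illustration. Depending on the server 
version, conflict entries may also need (objectClass=ldapSubEntry) added to 
the filter before they show up. 

```shell
#!/bin/sh
# Read-only inspection helpers; nothing here modifies or deletes entries.
# Function names are hypothetical, chosen for this example only.

# Tombstones are hidden from ordinary searches, so the filter has to
# ask for the nsTombstone objectClass explicitly.
tombstone_filter() {
    printf '%s\n' '(objectClass=nsTombstone)'
}

# Replication conflict entries carry the nsds5ReplConflict attribute.
conflict_filter() {
    printf '%s\n' '(nsds5ReplConflict=*)'
}

# Print, without running it, an ldapsearch command for a suffix and filter.
# Assumption: simple bind as "cn=Directory Manager", password prompted (-W).
show_search() {
    suffix="$1"
    filter="$2"
    printf 'ldapsearch -x -D "cn=Directory Manager" -W -b "%s" "%s" dn\n' \
        "$suffix" "$filter"
}

# Example with a placeholder suffix:
show_search "dc=example,dc=com" "$(tombstone_filter)"
show_search "dc=example,dc=com" "$(conflict_filter)"
```

The commands are only printed, so the output can be reviewed before it is 
run against a live server. 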

I wonder whether a contributing factor here is ipa-replica-install re-using 
replica ids, which could cause replication to have a problem.

Perhaps the solution here is to have ipa-replica-install attempt a 
cleanallruv on any replica id it's *about* to try to use, in case it has been 
re-used. 
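
To make the cleanallruv idea concrete, here is a rough sketch of the task 
entry that would be added under cn=tasks,cn=config for a given replica id. 
The replica id 8, the suffix, and the function name are assumptions for 
illustration only, not values from this deployment. 

```shell
#!/bin/sh
# Emit the LDIF for a CleanAllRUV task entry.
# cleanallruv_ldif is a hypothetical helper name for this example.
#   $1 = replica id to clean
#   $2 = replicated suffix (base DN)
cleanallruv_ldif() {
    rid="$1"
    suffix="$2"
    cat <<EOF
dn: cn=clean $rid,cn=cleanallruv,cn=tasks,cn=config
objectClass: extensibleObject
cn: clean $rid
replica-base-dn: $suffix
replica-id: $rid
replica-force-cleaning: no
EOF
}

# Print the task entry for a placeholder replica id and suffix.
# To submit it, the output would be piped to ldapadd bound as a
# directory administrator, for example:
#   cleanallruv_ldif 8 "dc=example,dc=com" | \
#       ldapadd -x -D "cn=Directory Manager" -W
cleanallruv_ldif 8 "dc=example,dc=com"
```

On an IPA host, "ipa-replica-manage list-ruv" and "ipa-replica-manage 
clean-ruv" cover the same ground and are probably the safer entry point. 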

My thinking at this point is that there is something else going on, and the 
issue may reside in a series of interactions in the ipa replica steps you have 
taken. Have you contacted the freeipa-users group about this at all? 

> 
>    -- JF
> _______________________________________________
> 389-devel mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Fedora Code of Conduct: 
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: 
> https://lists.fedoraproject.org/archives/list/[email protected]

—
Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs