On 12/05/2014 04:50 PM, Simo Sorce wrote:
On Thu, 04 Dec 2014 14:33:09 +0100
Ludwig Krispenz <lkris...@redhat.com> wrote:

hi,

I just have another issue (hopefully this will end soon) on which I
want to get your input (please read to the end first).

To recap the conditions:
-  the topology plugin manages the connections between servers as
segments in the shared tree
- it is authoritative for managed servers, e.g. it controls all
connections between servers listed under cn=masters, and
it is permissive for connections to other servers
- it rejects any removal of a segment, which would disconnect the
topology.
- a change in topology can be applied to any server in the topology,
it will reach the respective servers and the plugin will act upon it
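
As a minimal sketch of the rejection rule in the list above (not the
plugin's actual code): assume segments can be treated as undirected
pairs of host names and the managed servers as a plain set. The names
is_connected and may_remove_segment are hypothetical, chosen only for
illustration.

```python
from collections import defaultdict


def is_connected(servers, segments):
    """Return True if all servers are reachable from one another."""
    servers = set(servers)
    if len(servers) <= 1:
        return True
    # Build an adjacency map, ignoring segments that touch other hosts.
    neighbours = defaultdict(set)
    for left, right in segments:
        if left in servers and right in servers:
            neighbours[left].add(right)
            neighbours[right].add(left)
    # Depth-first walk from an arbitrary server.
    start = next(iter(servers))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for peer in neighbours[node] - seen:
            seen.add(peer)
            stack.append(peer)
    return seen == servers


def may_remove_segment(managed, segments, segment):
    """Allow the removal unless it would disconnect the managed servers."""
    left, right = segment
    if left not in managed or right not in managed:
        # Permissive for connections involving unmanaged servers.
        return True
    remaining = [s for s in segments if set(s) != set(segment)]
    return is_connected(managed, remaining)
```

With managed servers A, B, C and segments A-B and B-C, removing B-C is
rejected. If C is first taken out of the managed set, the same removal
is allowed, because the check no longer applies to C.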

Now there is a special case, causing a bit of trouble. If a replica
is to be removed from the topology, this means that
the replication agreements from and to this replica should be
removed and the server should be removed from the managed servers.
The problem is that:
- if you remove the server first, the server becomes unmanaged and
removal of the segment will not trigger a removal of the replication
agreement
Can you explain what exactly you mean by "if you remove the server
first"? Which LDAP operation will be performed by the management tools?
As far as the plugin is concerned, a removal of a replica triggers two
operations:
- removal of the host from the servers in cn=masters, so the server is
no longer considered as managed
- removal of the segment(s) connecting the to-be-removed replica to
other still managed servers, which should remove the corresponding
replication agreements.
It was the order of these two operations I was talking about.

- if you remove the segments first, one segment will be the last one
connecting this replica to the topology and removal will be rejected
We should never remove the segments first indeed.
If we can fully ensure that only specific management tools are used,
we can define the order, but an admin could still apply individual
operations, and it would be good if nothing breaks in that case.

Now, with some effort this can be resolved, e.g.:
if the server is removed, keep it internally as a removed server and
trigger removal of the replication agreements for segments connecting
this server; or mark the last segment as pending when its removal is
attempted, and once the server is removed also remove the
corresponding repl agreements.
Why should we "keep it internally" ?
If you mark the agreements as managed by setting an attribute on them,
then you will never have any issue recognizing a "managed" agreement in
cn=config, and you will also immediately find out it is "old" as it is
not backed by a segment so you will safely remove it.
I didn't want to add new flags/fields to the replication agreements
as long as everything can be handled by the data in the shared tree.
"internally" was probably misleading, but I will think about it again
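
A rough sketch of the "managed" marker idea discussed above, under
stated assumptions: agreements are represented as plain dicts with
hypothetical keys "target" and "managed" (the real attribute name on
the agreement entry is not specified in this thread), and segments are
(left, right) pairs of host names. It shows only the comparison, not
any LDAP access.

```python
def stale_managed_agreements(local_host, agreements, segments):
    """Return managed agreements on local_host not backed by a segment.

    agreements: list of dicts with hypothetical keys 'target' (peer host)
                and 'managed' (True if the plugin created/owns it).
    segments:   iterable of (left, right) host pairs from the shared tree.
    """
    # Peers that some segment still connects to the local host.
    backed = set()
    for left, right in segments:
        if left == local_host:
            backed.add(right)
        elif right == local_host:
            backed.add(left)
    # A managed agreement without a backing segment is "old".
    return [a for a in agreements
            if a["managed"] and a["target"] not in backed]
```

Unmanaged agreements are never returned, so agreements to servers
outside the plugin's control stay untouched.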

Segments (and their agreements) should be removed when the master
entry is removed; that removal is the trigger. This should be done
even if it causes a split brain, because if the server is removed, no
matter how much we wish to keep topology integrity, we effectively are
in a split brain situation, and keeping topology agreements alive w/o
the server entry doesn't help.
If we can agree that presence/removal of masters is the primary
trigger, that's fine. I was thinking of situations where a server was
removed but not uninstalled: just taken out of the topology, but
still reachable.
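
To make the "master removal is the primary trigger" rule concrete,
here is a small sketch under the same simplifying assumptions as
before (masters as a set of host names, segments as host pairs). The
function name remove_master is hypothetical, and no connectivity check
is done on purpose, matching the point that the removal goes ahead
even if it splits the topology.

```python
def remove_master(host, masters, segments):
    """Remove a master and every segment that touches it.

    Returns (remaining_masters, kept_segments, dropped_segments).
    The dropped segments are the ones whose replication agreements
    would then be cleaned up on the remaining servers.
    """
    remaining_masters = set(masters) - {host}
    dropped = [s for s in segments if host in s]
    kept = [s for s in segments if host not in s]
    return remaining_masters, kept, dropped
```

For masters A, B, R with segments A-B and B-R, removing R leaves
masters A and B, keeps A-B, and drops B-R.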

But there is a problem, which I think is much harder and I am not
sure how much effort I should put in resolving it.
If we want to have the replication agreements cleaned up after
removal of a replica without direct modification of cn=config, we
need to follow the path above,
but this also means that the last change needs to reach both the
removed replica (R) and the last server(S) it is connected to.
It would be nice if the change reached the replica, indeed, but it is
not a big deal if it doesn't. If you are removing the replica it means
you are decommissioning it, so it is not really that important that it
receives updates; it will be destroyed shortly.
That's what I was not sure about: couldn't there be cases where it is
not destroyed, just isolated?
And if it is not destroyed for whatever reason, it will be removed from
the masters group anyway so it will have no permission to replicate
back, and no harm is done to the overall domain.

The bad thing is that if this change triggers a
removal of the replication agreement on S, it could be that the change
is not replicated to R before the agreement is removed, and is lost.
There is no way (or no easy way) for the plugin to know if a change
was received by another server,
There is an easy way: contact the other server and see if the change
happened in its LDAP tree :)
But this is not really necessary, as explained above.

I was also thinking about some kind
of acknowledgement mechanism by doing a ping pong of changes, but the
problem is always the same: one server does not know if the other
has received it.
And even if this would theoretically work, we cannot be sure that R
has not been shut down, with only the remaining topology being
cleaned up, so S would wait forever.
We should not care, if you are deleting a replica it doesn't matter
what's on the replica side IMO.

My suggestion to resolve this (in most cases) is to define a wait
interval, after the final combination of removal of a server and its
connecting segment is received, wait for some time and then remove
the corresponding replication agreements.
Why ?

So I'm asking you if this would be acceptable or if you have a better
solution.
I am trying to understand why we have a problem; actually, I do not
really see one. Why do you think it is important to update a replica
that is being killed?
Because I had scenarios in mind where it would not be killed, just
removed from the topology.

Simo.

