On Thu, 2015-06-25 at 09:53 +0200, Petr Vobornik wrote:
> On 06/25/2015 08:52 AM, Ludwig Krispenz wrote:
> >
> > On 06/24/2015 09:01 PM, Simo Sorce wrote:
> >> On Wed, 2015-06-24 at 11:25 +0200, Ludwig Krispenz wrote:
> >>> Oleg,
> >>>
> >>> the topology plugin relies on existing connections between servers which
> >>> remain in a topology. If you remove a central node in your topology you
> >>> are asking for trouble.
> >>> With Petr's patch it warns you that your topology will be disconnected,
> >>> and if you insist we cannot guarantee anything.
> >>> Should we completely prohibit this?
> >> No, but a --force should be needed.
> >> Without a --force option we should not allow removing a replica
> >> completely from another one.
> >>
> >>> I don't know; I think you could
> >>> also force an uninstall of vm175 with probably the same result.
> >>> What you mean by calculating the remaining topology and sending it to
> >>> the remaining servers does not work: it would require sending a removal
> >>> of a segment, which would be rejected.
> >> You would have to connect to each replica that has a replication
> >> agreement with vm175 and remove the segment from that replica. But it
> >> wouldn't really help much as once a replica is isolated from the central
> >> one, it will not see the other operations going on in other replicas.
> >>
> >> Once we have a topology resolver we will be able to warn that removing a
> >> specific replica will cause a split brain and make very loud warnings
> > we have this already, see the output of Oleg's example:
> >
> > ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
> > Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be
> > disconnected:
> > Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers:
> > vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
> > Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers:
> > vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com,
> > vm-127.idm.lab.eng.brq.redhat.com
> > Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers:
> > vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com,
> > vm-036.idm.lab.eng.brq.redhat.com
> > Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers:
> > vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
> > Continue to delete? [no]: yes
> >
> > It tells you that the topology gets disconnected and which connections
> > will be missing; the continue yes/no acts as the --force.
> > The question was: should we allow a force in this situation?
> >
> What it does is:
> 1. Checks current topology, prints errors with introduction msg:
>     "Current topology is disconnected:" + errors
> 2. Checks topology after node removal, prints errors with msg:
>     "Topology after removal of %s will be disconnected:" + errors
> 3. if there were errors in #1 or #2, it does:
>     if not force and not ipautil.user_input("Continue to delete?", False):
>        sys.exit("Aborted")
> To make it louder we can prefix the message in #2 with "WARNING: " or
> something even stronger.
> The question "Continue to delete?" could be
> * removed, so that --force would always be required in such a case
> * still regarded as 'force', but with the question changed, e.g.
> to: "Continue to delete and disconnect the topology?"

I do not like questions very much; they are usually annoying for
scripting and such. I would not ask questions: simply deny the
operation if --force is not present, and allow it if it is present.
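That deny-unless-forced rule could be sketched roughly like this (a sketch only; `ensure_topology_connected` and its arguments are made-up names, not the actual ipa-replica-manage code):

```python
import sys

def ensure_topology_connected(errors, force):
    """Abort the removal when the topology check reported errors,
    unless --force was given; never ask an interactive question."""
    if not errors:
        return
    for err in errors:
        # make the problem loud, but keep it on stderr for scripts
        print("WARNING: %s" % err, file=sys.stderr)
    if not force:
        sys.exit("Topology would be disconnected; re-run with --force "
                 "to remove the replica anyway")
```

With something like this, a script either passes --force explicitly or fails fast, instead of hanging on a yes/no prompt.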

> >>> More interesting would be if we can heal this later by adding new
> >>> segments.
> >> Indeed, reconnecting all the severed replicas should cause all the
> >> removals (segments or servers) to be replicated among servers and should
> >> bring back the topology view in a consistent state. But not until all
> >> servers are reconnected and replication has started again.
> > This healing can also be required without forcing removal by an admin.
> > If you have a start topology and your central node goes down and is not
> > recoverable

Yes, I think the most likely case (bar testing) for ever using --force
remove is that a server imploded and died, and just needs replacing.
Being able to recover from such a situation by simply reconnecting
replicas until the split brain is healed is paramount.

I would go as far as saying that perhaps we should provide a simple
"heal-topology" command in a *future* version that will pick one replica
and reconnect all the missing branches in a star topology.

The only problem in doing that is that the tool may have a misleading
idea of the status of the topology: when replication is severed, not
all topology changes may be reflected to all servers, so different
servers may have a different view of the current topology depending on
when they got disconnected and the replication flow was interrupted. A
good tool would have to reconnect all branches it sees, then wait a
little to see if the reconnected replicas send in topology changes,
and re-iterate if further changes left the topology still in split
brain.
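The reconnect-and-reiterate step could look roughly like this (a sketch under the assumption that the tool can merge the segment views it can currently reach; `components` and `heal_step` are illustrative names, not anything in the topology plugin):

```python
from collections import deque

def components(nodes, edges):
    """Connected components of an undirected graph of servers."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        comp, queue = set(), deque([n])
        while queue:
            cur = queue.popleft()
            if cur in comp:
                continue
            comp.add(cur)
            queue.extend(adj[cur] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def heal_step(nodes, edges, hub):
    """One iteration: segments to add so every component reaches the hub."""
    new_segments = []
    for comp in components(nodes, edges):
        if hub not in comp:
            new_segments.append((hub, sorted(comp)[0]))
    return new_segments
```

A real tool would create the proposed segments, wait a little for replication to settle, merge the views it can now reach, and run the step again until it proposes nothing.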

Another tool could be built that lets the admin indicate a master that
needs to be removed; the tool would tell which replication agreements
should be created before removal to avoid split brain. But this is not
really useful if the master is already dead and replication has
effectively stopped.
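For the case where the master is still up, computing the agreements to create before removal is a plain graph exercise; a sketch (the function name and the (server, server) segment representation are assumptions, not an IPA API):

```python
def segments_before_removal(nodes, segments, victim):
    """Segments to add so the topology stays connected once `victim`
    and all of its segments are gone (union-find over the survivors)."""
    parent = {n: n for n in nodes if n != victim}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # union the endpoints of every segment that survives the removal
    for a, b in segments:
        if victim not in (a, b):
            parent[find(a)] = find(b)

    # group the surviving servers into connected components
    comps = {}
    for n in parent:
        comps.setdefault(find(n), []).append(n)
    groups = sorted(sorted(g) for g in comps.values())

    # join every extra component to the first one
    anchor = groups[0][0]
    return [(anchor, g[0]) for g in groups[1:]]
```

For a pure star around one hub this proposes connecting one leaf to every other leaf, which is exactly what has to exist before the hub can go away without a split brain.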

For now the admin will need to do this manually, but we need to test
that the situation is recoverable.


> >>
> >> Simo.
> >>
> >>
> >>> Ludwig
> >>> On 06/24/2015 11:04 AM, Oleg Fayans wrote:
> >>>> Hi everybody,
> >>>>
> >>>> Current implementation of topology plugin (including patch 878 from
> >>>> Petr) allows the deletion of the central node in the star topology.
> >>>> I had the following topology:
> >>>>
> <snip>

Simo Sorce * Red Hat, Inc * New York
