Re: cluster fragmentation

Stephan Hesse Sat, 14 May 2011 09:19:02 -0700

Well, it's a complex scenario, so constructing a test case is also complex.
That is why I bothered you with my questions and did not simply test it
out.


What I mean by "cluster fragmentation" is a temporary (partial) loss of
network connectivity between communicating systems without any of the
systems themselves failing.

Let me try to explain it with an example:

Let's assume we have a moderately complex setup of four communicating
systems: DB-Server-A, DB-Server-B, Client-A and Client-B. DB-Server-A and
Client-A are attached to one switch (Switch A), and DB-Server-B and
Client-B are attached to another switch (Switch B). The two switches are
connected to each other.


                 +----------+
DB-Server-A ---- | Switch A |--- Client-A
                 +----------+
                       |
                 +----------+
DB-Server-B ---- | Switch B |--- Client-B
                 +----------+



During normal operation, both clients will connect to both DB servers.

Now, let's assume that the link between Switch A and Switch B fails for
some time.

                 +----------+
DB-Server-A ---- | Switch A |--- Client-A
                 +----------+

                 +----------+
DB-Server-B ---- | Switch B |--- Client-B
                 +----------+

As a result, Client-A will eventually be disconnected from DB-Server-B,
and Client-B from DB-Server-A. Both clients will think they have lost
redundancy but will happily continue working with their local DB server.

After the link between the switches is re-established, none of the
components will notice any change, so nothing will detect that the
CreateCluster tool needs to be run.
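The failure sequence above can be illustrated with a toy model in which a
"database" is just the set of rows it has applied and a cluster client sends
every write to each server it is still connected to. This is only a sketch
under those simplifying assumptions (the class names and row values are
hypothetical; it is not how H2 replicates internally). It adds a third,
newly connecting client to match the situation described in the quoted
message further down, where one client works on a single database while
another works on both.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PartitionSimulation {
    /** Hypothetical stand-in for a DB server: just the rows it has applied, in order. */
    static final class Db {
        final String name;
        final Set<String> rows = new LinkedHashSet<>();
        Db(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for a cluster client: writes go to every server it is still connected to. */
    static final class Client {
        final List<Db> connected = new ArrayList<>();
        Client(Db... dbs) { connected.addAll(List.of(dbs)); }
        void write(String row) { for (Db db : connected) db.rows.add(row); }
        void disconnect(Db db) { connected.remove(db); }
    }

    public static void main(String[] args) {
        Db a = new Db("DB-Server-A");
        Db b = new Db("DB-Server-B");
        Client clientA = new Client(a, b);
        Client clientB = new Client(a, b);

        // Normal operation: every write reaches both servers.
        clientA.write("r1");
        clientB.write("r2");

        // The link between Switch A and Switch B fails: each client loses the remote server.
        clientA.disconnect(b);
        clientB.disconnect(a);
        clientA.write("r3"); // reaches DB-Server-A only
        clientB.write("r4"); // reaches DB-Server-B only

        // The link comes back, but nothing notices; a newly connecting client uses both servers.
        Client clientC = new Client(a, b);
        clientC.write("r5");

        System.out.println(a.name + " " + a.rows); // DB-Server-A [r1, r2, r3, r5]
        System.out.println(b.name + " " + b.rows); // DB-Server-B [r1, r2, r4, r5]
        System.out.println("consistent = " + a.rows.equals(b.rows)); // consistent = false
    }
}
```

Both servers accept the write from Client-C, so from the outside the cluster
looks healthy, yet the two databases no longer hold the same data.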


Of course, for this special case, I could add some kind of
heartbeat between DB-Server-A and DB-Server-B to notice
the link failure. However, there are even more complex cases
that such a heartbeat cannot catch.
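For the two-switch case only, such a heartbeat could look roughly like the
following sketch, run on DB-Server-A and watching DB-Server-B. The class
name, the timeout value and the event labels are hypothetical, and
timestamps are passed in explicitly so the logic can be exercised without a
network. The useful signal is not the loss of the peer but its return: that
is the moment the databases may have diverged and a rebuild (for example by
running CreateCluster) would be needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical heartbeat monitor on one DB server, watching its peer. */
public class PeerHeartbeatMonitor {
    private final long timeoutMillis;
    private long lastHeartbeat;
    private boolean peerLost = false;
    private final List<String> events = new ArrayList<>();

    public PeerHeartbeatMonitor(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeat = startMillis;
    }

    /** Called whenever a heartbeat from the peer arrives. */
    public void onHeartbeat(long nowMillis) {
        lastHeartbeat = nowMillis;
        if (peerLost) {
            peerLost = false;
            // The link is back, but writes may have gone to only one side meanwhile:
            // this is where a cluster rebuild would have to be triggered.
            events.add("REBUILD_NEEDED@" + nowMillis);
        }
    }

    /** Called periodically; marks the peer as lost once no heartbeat arrived within the timeout. */
    public void check(long nowMillis) {
        if (!peerLost && nowMillis - lastHeartbeat > timeoutMillis) {
            peerLost = true;
            events.add("PEER_LOST@" + nowMillis);
        }
    }

    public boolean isPeerLost() { return peerLost; }

    public List<String> events() { return events; }

    public static void main(String[] args) {
        PeerHeartbeatMonitor m = new PeerHeartbeatMonitor(3000, 0);
        m.onHeartbeat(1000); // normal operation
        m.check(2000);       // 1000 ms since last heartbeat: still fine
        m.check(5000);       // 4000 ms since last heartbeat: peer lost
        m.onHeartbeat(9000); // link between the switches restored
        System.out.println(m.events()); // [PEER_LOST@5000, REBUILD_NEEDED@9000]
    }
}
```

As said above, this only covers a failure of the path between the two DB
servers; it does not see cases where, for example, a client loses one
server while the servers can still reach each other.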



And yes, my experience shows that in real-life systems
everything that can go wrong will go wrong ... after a surprisingly
short period of time. So I would not consider these thoughts
purely academic or contrived.



Thanks,
Stephan






On 13.05.2011 22:37, Thomas Mueller wrote:
> Hi,
> 
>     I was thinking about building a cluster control program that would
>     automate the cluster rebuild without any human intervention.
> 
> 
> That would be great of course!
> 
>     Yes, I know, in many cases you would not want such automation because
>     there is so much that can go wrong...
> 
> 
> Well, if 99.999% of all risks can be eliminated, then automating this
> would be great :-)
> 
>     However, I need to deal with customers that don't want to control
>     their database manually (in fact they don't want to care about these
>     'details').
> 
> 
> That's understandable.
>  
> 
>     In the case I have described, the system could end up in a situation
>     where one client (that was connected when the cluster fragmentation
>     occurred) works on only one database while another client (that
>     connected when network connectivity was back) works on both of them,
>     with nobody even noticing that the databases are becoming more
>     and more inconsistent.
> 
> 
> I think there is a mechanism that ensures this can't happen. If this
> mechanism doesn't work, then it's a bug.
> 
> But first let's define what you mean by "cluster fragmentation",
> because this is a term I have never heard. Do you mean one of the cluster
> nodes (instances) was killed?
> 
>     Well ... I was hoping you would answer that there already is a
>     mechanism in place that would help the clients to safely detect
>     the inconsistent situation and force them to reconnect.
> 
> 
> Yes, there is such a mechanism in the "CreateCluster" tool: it sets the
> exclusive mode and kills other connections ("SET EXCLUSIVE 2"). The
> other connections need to use the auto-reconnect feature. This is
> documented.
> 
> If it doesn't work for you please tell me - even better please post a
> simple test case.
> 
> Regards,
> Thomas
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "H2 Database" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/h2-database?hl=en.
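On the client side, the auto-reconnect requirement mentioned in the reply
above means the JDBC URL has to enable it. A minimal sketch, assuming the H2
cluster URL form that lists all nodes separated by commas and the
AUTO_RECONNECT=TRUE setting, and assuming the two can be combined in one URL
as shown; the host names, port and database path are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.List;

public class ClusterClient {
    /** Builds an H2 cluster URL listing every node, with auto-reconnect enabled. */
    static String clusterUrl(List<String> nodes, String dbPath) {
        return "jdbc:h2:tcp://" + String.join(",", nodes) + "/" + dbPath + ";AUTO_RECONNECT=TRUE";
    }

    /** Opens a connection; needs the H2 driver on the classpath and running servers. */
    static Connection connect(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) {
        // Hypothetical host names for the two DB servers from the example above.
        String url = clusterUrl(List.of("db-server-a:9092", "db-server-b:9092"), "~/test");
        System.out.println(url);
        // prints jdbc:h2:tcp://db-server-a:9092,db-server-b:9092/~/test;AUTO_RECONNECT=TRUE
    }
}
```

main only prints the URL so the sketch runs without a server; connect() is
what a real client would call with it.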
