Hi again,

> Sorry I don't have a simple answer for you,

well, that is certainly not your fault :D


> My point is that it is somewhat reasonable that Icinga does not handle this failure scenario
> since it can be avoided with an HA topology.

I understand. In my eyes this is still wrong, because Icinga2 then assumes that one can always build an HA topology. There are many scenarios in which Icinga2 cannot connect to the database *although* a complete HA setup with a virtual/floating IP and Pacemaker is in place.

Just to name a few examples:
  - iptables issues
  - Network issues (e.g. when the DB is reached via a different
    network than the one used for Icinga2 communication)
  - DNS issues (in many cases it makes sense to use
    an FQDN instead of an IP)
(There might be more reasons, but I guess you get the point.)

So relying on the database always being reachable is a bit careless in my eyes. In addition, I don't see why one would not want to see a failover in these situations :-)

> Pacemaker could probably accomplish this, and there are other ways
> (e.g. Monit)

Yes, I see it the same way. I guess in the end there will be Monit shutting
down Icinga2 as soon as the local Galera node is absent. Good suggestion, btw.
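
For illustration only, here is a minimal sketch of that watchdog idea, written in Python rather than as an actual Monit configuration. Everything concrete in it is an assumption and not taken from the real setup: the local Galera node listening on 127.0.0.1:3306, Icinga2 running as a systemd unit named "icinga2", and the threshold of three consecutive failed checks. Note that a successful TCP connect only shows that the port is open, not that the node is a healthy cluster member.

```python
import socket
import subprocess
import time


def galera_reachable(host="127.0.0.1", port=3306, timeout=2.0):
    """Return True if a TCP connection to the (assumed) local Galera port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable network, DNS failure, ...
        return False


def next_failure_count(previous_failures, reachable):
    """Count consecutive failed checks; a successful check resets the counter."""
    return 0 if reachable else previous_failures + 1


def should_stop_icinga(failures, threshold=3):
    """Stop Icinga2 only after `threshold` consecutive failures, to avoid flapping."""
    return failures >= threshold


def stop_icinga():
    # Assumption: Icinga2 is managed by systemd under the unit name "icinga2".
    subprocess.run(["systemctl", "stop", "icinga2"], check=False)


def watch(interval=10.0, threshold=3):
    """Poll the local Galera node and stop the local Icinga2 once it is considered gone."""
    failures = 0
    while True:
        failures = next_failure_count(failures, galera_reachable())
        if should_stop_icinga(failures, threshold):
            stop_icinga()
            return
        time.sleep(interval)
```

Calling watch() from a small service on each Icinga2 node would approximate the behaviour described above; in practice Monit (or Pacemaker) would take over both the polling and the stop action.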


> but another thought is:
> - If the local Galera crashes, consider that node failed entirely and shutdown
> the local Icinga (potentially causing the DB writer to failover as well).

Yes, this is a good workaround indeed.


Anyway, I opened up a GitHub issue and am curious what the devs are gonna say...

Best regards
Valentin


On 17.05.2017 18:02, Lee Clemens wrote:
> Hi Valentin,
>
> On 05/17/2017 10:19 AM, Valentin Höbel wrote:
>> Hi Lee,
>>
>> thanks for taking the time for a) testing this stuff and b) writing an
>> answer. I appreciate the efforts!
>
> You're welcome! Sorry I don't have a simple answer for you, and I'm certainly
> not telling you how to set things up - just trying to provide some
> info/suggestions.
>
>>> This failure scenario would require a serious network issue if your
>>> database cluster was architected to be highly available (any client
>>> node can connect to any server node). That's not to say perhaps
>>> it could not be handled better.
>>
>> That is not the point. It doesn't matter how "bad", "good" or "unusual" the
>> Galera HA setup is. The point is that Icinga2 doesn't failover although it
>> can't access the database. The Galera setup in this case is very special
>> (and yes, I do have reasons to build it that way and no, sorry, since this
>> is a customer project I can't tell you why, sorry, really).
>
> My point is that it is somewhat reasonable that Icinga does not handle this
> failure scenario since it can be avoided with an HA topology.
> Perhaps it could be handled better, of course (as I said above).
>
> I presume the design assumption was that if one node cannot connect to the DB,
> the other nodes probably cannot connect either (the serious network issue
> scenario),
> so just continue to queue queries up until service is restored.
>
> DB connectivity awareness could be built in to Icinga, for example by passing
> a token between the nodes with "I can (not) connect to the DB" and electing a
> DB writer node.
>
> Pacemaker could probably accomplish this, and there are other ways (e.g.
> Monit), but another thought is:
> - If the local Galera crashes, consider that node failed entirely and shutdown
> the local Icinga (potentially causing the DB writer to failover as well).
>
> You'll have to weigh the tradeoffs according to your own situation and
> requirements.
>
> Kind Regards,
> Lee Clemens
_______________________________________________
icinga-users mailing list
icinga-users@lists.icinga.org
https://lists.icinga.org/mailman/listinfo/icinga-users

--
Valentin Höbel
Senior Consultant IT Infrastructure
mobil 0711-95337077

open*i GmbH
Talstraße 41 70188 Stuttgart Germany

Geschäftsführer Tilo Mey
Amtsgericht Stuttgart,  HRB 729287, Ust-IdNr DE264295269
Volksbank Stuttgart EG, BIC VOBADESS, IBAN DE75600901000340001003
