Hi again,
> Sorry I don't have a simple answer for you,
well, that is certainly not your fault :D
> My point is that it is somewhat reasonable that Icinga does not
> handle this failure scenario
> since it can be avoided with an HA topology.
I understand. In my eyes, this is still wrong because Icinga2 then
assumes that one is always able to build an HA topology. But there are
still many scenarios where Icinga2 will not be able to connect to the
database *although* one has a complete HA setup with virtual/floating IP
and Pacemaker.
Just to name a few examples:
- iptables issues
- Network issues (e.g. when the DB is located in a different
network than the one used for Icinga2 communication)
- DNS issues (in many cases it makes sense to use
an FQDN instead of an IP)
(There might be more reasons, but I guess one gets the point.)
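All of the cases above show up as the same symptom: the DB endpoint is unreachable from this node even though the cluster itself is healthy. Just to illustrate (hostname and port below are placeholders, not from any real setup), a small probe can even tell the DNS case apart from the firewall/network case:

```python
import socket

def db_reachable(host, port=3306, timeout=3.0):
    """Check whether a DB endpoint resolves and accepts TCP connections.

    Returns (ok, reason). Distinguishes DNS failures from
    network/firewall failures, mirroring the failure modes above.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"DNS: {exc}"
    family, socktype, proto, _, addr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
    except OSError as exc:  # timeouts, refused connections, iptables DROPs
        return False, f"network: {exc}"
    return True, "ok"
```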
So relying on the fact that the database will always be there is a bit
careless in my eyes.
In addition, I don't see why one would not see a failover in these
situations :-)
> Pacemaker could probably accomplish this, and there are other ways
> (e.g. Monit)
Yes, I see it the same way. I guess in the end it will be Monit shutting
down Icinga2 as soon as the local Galera node is absent. Good suggestion, btw.
> but another thought is:
> - If the local Galera crashes, consider that node failed entirely and shut down
> the local Icinga (potentially causing the DB writer to fail over as well).
Yes, this is a good workaround indeed.
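For the Monit route, something along these lines should do it (untested sketch; pidfile, service names and check interval are assumptions for a typical systemd/MariaDB setup):

```
check process mysqld with pidfile /var/run/mysqld/mysqld.pid
   start program = "/usr/bin/systemctl start mariadb"
   stop program  = "/usr/bin/systemctl stop mariadb"
   # If the local Galera node stops answering, take Icinga2 down with it
   # so the cluster fails over the IDO writer to another node.
   if failed port 3306 protocol mysql for 3 cycles
      then exec "/usr/bin/systemctl stop icinga2"
```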
Anyway, I opened up a GitHub issue and am curious what the devs are gonna say...
Best regards
Valentin
On 17.05.2017 18:02, Lee Clemens wrote:
Hi Valentin,
On 05/17/2017 10:19 AM, Valentin Höbel wrote:
Hi Lee,
thanks for taking the time for a) testing this stuff and b) writing an answer.
I appreciate the efforts!
You're welcome! Sorry I don't have a simple answer for you, and I'm certainly
not telling you how to set things up - just trying to provide some
info/suggestions.
This failure scenario would require a serious network issue if your
database cluster was architected to be highly available (any client
node can connect to any server node). That's not to say perhaps
it could not be handled better.
That is not the point. It doesn't matter how "bad", "good" or "unusual" the
Galera HA setup is. The point is that Icinga2 doesn't fail over although it can't access the database. The
Galera setup in this case is very special (and yes, I do have reasons to build it that way and no,
since this is a customer project, I can't tell you why, sorry).
My point is that it is somewhat reasonable that Icinga does not handle this
failure scenario since it can be avoided with an HA topology.
Perhaps it could be handled better, of course (as I said above).
I presume the design assumption was that if one node cannot connect to the DB,
the other nodes probably cannot connect either (the serious network issue
scenario),
so just continue to queue queries up until service is restored.
DB connectivity awareness could be built into Icinga, for example by passing a token
between the nodes with "I can (not) connect to the DB" and electing a DB writer
node.
Pacemaker could probably accomplish this, and there are other ways (e.g.
Monit), but another thought is:
- If the local Galera crashes, consider that node failed entirely and shut down
the local Icinga (potentially causing the DB writer to fail over as well).
You'll have to weigh the tradeoffs according to your own situation and
requirements.
Kind Regards,
Lee Clemens
_______________________________________________
icinga-users mailing list
icinga-users@lists.icinga.org
https://lists.icinga.org/mailman/listinfo/icinga-users
--
Valentin Höbel
Senior Consultant IT Infrastructure
mobil 0711-95337077
open*i GmbH
Talstraße 41 70188 Stuttgart Germany
Geschäftsführer Tilo Mey
Amtsgericht Stuttgart, HRB 729287, Ust-IdNr DE264295269
Volksbank Stuttgart EG, BIC VOBADESS, IBAN DE75600901000340001003