-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>> The use of "host" to check the FQDN is just faulty,

> Why?
Because there are many implementations of "host", which report
differently. We would then have to detect which implementation is
running and handle each one differently, at least if we want some level
of error detection in the RA. I can feel the taste of ugly in my
mouth. :-)

>> and I actually now
>> prefer the use of "parent_domain" which I submitted earlier.

> That means that for this to work the cluster must never consist of two
> nodes named alice.foo and bob.bar, and I've not seen that requirement
> documented for Pacemaker clusters before. Plus, this whole thing _only_
> applies on systems where "uname -n" does not include the domain name.
> Which isn't the case for all platforms, RHEL/CentOS/Fedora do include
> the domain name.

If RHEL/CentOS/Fedora do include the FQDN by default for "uname -n" or
"hostname", then I believe the configuration to be faulty.

It is my understanding that "hostname" uses gethostname(2) and should
return just the hostname itself. "hostname --fqdn" should use
gethostbyname(3), which in turn uses the system resolver to get a
<hostname>.<domain> for the system. "hostname -d" returns just the
<domain> part.

I believe that "hostname" should return just the name of the host.
However, I see that no one has ever been able to agree on this. The man
page for uname on SCO (!) has the most complete description I've found:

- ---
  -n   Print the hostname (formerly also known as the ``node name'');
       that is, the first part (up to the first "." character) of the
       hostname parameter.
- ---

RHEL defines the hostname in /etc/sysconfig/network, but seems to put an
FQDN in there during installation. If it is set to just a hostname, the
"hostname", "hostname --fqdn" and "hostname -d" commands work as
expected. Or at least how I expect them to work. :-)

Debian/Ubuntu use the file /etc/hostname. By default only a hostname is
written there, not the FQDN, and "hostname"/"uname -n" work as I expect.

Maybe someone else could shed some light on this?
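To make the hostname/domain split concrete, here is a minimal sketch
that takes whatever name the platform hands back and splits it at the
first "." using plain POSIX parameter expansion, so it does not depend
on which "hostname" or "host" implementation is installed. The function
names short_name and domain_part are made up for illustration; they are
not part of any existing RA.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing resource agent.

# Print everything before the first "." (the bare host name).
short_name() {
    printf '%s\n' "${1%%.*}"
}

# Print everything after the first "." (the domain part),
# or an empty line if the name contains no dot at all.
domain_part() {
    case "$1" in
        *.*) printf '%s\n' "${1#*.}" ;;
        *)   printf '\n' ;;
    esac
}

# Example: normalise whatever "uname -n" returns on this platform.
node=$(uname -n)
echo "host:   $(short_name "$node")"
echo "domain: $(domain_part "$node")"
```

This only normalises a name that already carries its domain. On a system
where "uname -n" returns the bare hostname, domain_part prints nothing
and the domain still has to come from somewhere else (parent_domain,
dnsdomainname, ...).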
> Should we just back out the patch and document that "if you want TLS,
> either configure your system such that hostname does contain the domain
> name, or create SSL certs for just the unqualified domain name"?

It probably should be backed out until a better implementation is found.
Alternatively, a static domain string or dnsdomainname could be
included as a kludge/workaround for people having a wrong/correct (!)
hostname configuration. :-)

>> An
>> alternative is to use the "dnsdomainname" function to look up the local
>> dns domainname, which should work if the resolver works. I'm making a new
>> patch using this.

> Which also operates on the assumption that all nodes are in the same
> domain. I don't see how this is better than specifying parent_domain
> explicitly.

Yes, it does assume this. People should set up their clusters with a
stringent naming convention anyway... I kid, I kid.

>> So, what I really would want to do is to wait X seconds (30?) for the
>> libvirtd socket to be available, and if that doesn't work, try to check
>> manually if the VM is running, i.e. by checking if the qemu socket
>> exists in /var/lib/libvirt/qemu/ or if there exists a .pid and/or .xml
>> file in /var/run/libvirt/qemu/. If it doesn't, report the resource as
>> Stopped.
>> This is of course KVM specific, and probably Ubuntu specific, and
>> extremely ugly. What do people propose?

> I'm beginning to believe we should really parse the domain name directly
> from the config file, with some XPath statement. Ugly as that may be.

It is probably less ugly than how it is done right now.

Okay, let's say I parse the configuration directly and now have the
domain name in a variable. How do I know the state of the VM? The "stop"
and "status" functions of the RA use libvirt as well. But KVM does not
depend on libvirt to run, and libvirt has a tendency to crash if you
poke it hard.
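For reference, the "wait X seconds for libvirtd" idea quoted above could
look roughly like the sketch below. retry_for is a hypothetical helper,
not something the RA has today, and using "virsh list" as the "can we
talk to libvirtd at all?" probe is only my assumption.

```shell
#!/bin/sh
# retry_for SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND about once per second until it
# succeeds, giving up after SECONDS attempts. Returns 0 on success and
# 1 if every attempt failed.
retry_for() {
    timeout=$1
    shift
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        elapsed=$((elapsed + 1))
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
    done
    return 0
}

# Assumed usage inside a monitor/status action:
#
#   if retry_for 30 virsh list; then
#       : # libvirtd answers, ask it for the domain state
#   else
#       : # libvirtd unreachable, decide what to report
#   fi
```

The helper only answers "did libvirtd come back in time?". What the RA
should report when it did not is exactly the open question.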
If I assume that a VM is down when libvirt isn't running, things will
explode while libvirt is restarting. The best solution I've found so far
(without trying to detect if a VM is running by snooping around for
sockets or PIDs) is to retry connecting to libvirt for 30 (?) seconds
and, if that doesn't work, report the VM as shut down, not as faulty.

But of course, this is extremely dangerous. If libvirt becomes broken on
a node in the cluster, with VMs running, the cluster will probably try
to start up the VM on another node. Instant pain.

Oh, yeah, that's why we use fencing. Never mind. :-)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvqUaQACgkQDmg6+jYrZoDNhwCfXhWcTqADnmveuzb1zrkppdBe
GEEAoImnICFyjgvSNIOiFoDfqvFKsSOr
=lDTX
-----END PGP SIGNATURE-----
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
