On 03/04/15 19:38, Andrew Bogott wrote:
<snip>
Additionally, I would appreciate it if a few projects would volunteer to
be early adopters. If you're interested in trying it out, please
respond to this email so that I know who's trying, and then go to your
'configure instance' pages and clear the 'use_dnsmasq' setting. If your
instance is using role::puppet::self, you'll also need to sign a new
puppet cert, like this:
$ sudo puppet cert sign <hostname>.<projectname>.eqiad.wmflabs
In addition to being more reliable, the new DNS system will also support
names that include the project name, like
'util-abogott.testlabs.eqiad.wmflabs'. The old naming scheme is still
supported, but many services will be gradually moving over to the new
scheme to avoid ambiguity between projects.
After a few weeks of testing I'll start to migrate everything to the new
server if things look good. Let me know how things go.
Thanks!
-Andrew
[1] The new system uses openstack-designate to create dns entries which
are subsequently served by a powerdns server running on
labs-ns2.wikimedia.org
Hello,
The 'integration' labs project was switched to the new DNS by mistake,
which caused a partial outage on CI.
The use_dnsmasq setting (which is set to true on instances) was renamed
to 'use_dnsmasq_server' when support for hiera was added with:
https://gerrit.wikimedia.org/r/#/c/202278/
That immediately caused the puppet clients in the integration project to
switch to the new DNS resolver, which caused two major issues:
A) puppet connections suddenly failed for all clients because the
certname is based on the hostname instead of the ec2id
B) Jenkins jobs hitting the beta cluster all failed because the
resolution of *.beta.wmflabs.org DNS entries yields the public instance
IP, which is not reachable.
I have filed https://phabricator.wikimedia.org/T95273 which describes
the work I did to revert to the previous state. Namely:
* Have hiera set both use_dnsmasq and use_dnsmasq_server
https://wikitech.wikimedia.org/w/index.php?title=Hiera:Integration&diff=152484&oldid=152033
* Manually reinstall / fix configuration on the puppetmaster since files
have gone wild.
* Fight with cert regeneration on puppetmaster
* Manually fix /etc/resolv.conf on all instances and restart nscd to
clear the DNS cache
* Delete and invalidate the certs of all clients and re-sign them.
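
For anyone wondering why the rename bit us and why setting both keys in
hiera fixes it, here is a minimal Python sketch. It is not the puppet
code. The lookup() helper is hypothetical, and it assumes that the new
key falls back to false when it is not set, which is my reading of what
happened and not something I checked in the manifests.

```python
def lookup(settings, key, default=False):
    """Hypothetical stand-in for a hiera lookup with a default value."""
    return settings.get(key, default)

# Before the fix: only the old key is defined for the project.
before_fix = {"use_dnsmasq": True}

# After the fix: both the old and the renamed key are defined.
after_fix = {"use_dnsmasq": True, "use_dnsmasq_server": True}

# Code reading only the renamed key silently gets the (assumed) default,
# so the instance stops using dnsmasq without anyone changing a setting.
switched_by_accident = not lookup(before_fix, "use_dnsmasq_server")

# With both keys set, old and new readers agree again.
still_on_dnsmasq = lookup(after_fix, "use_dnsmasq_server")
```

Keeping both keys defined until every reader has moved to the new name
avoids depending on whatever the default happens to be.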
The beta cluster, which uses a puppetmaster, was luckily not affected.
No idea why, though.
I have filed https://phabricator.wikimedia.org/T95288 to have Designate
yield different answers depending on the client (a feature known as
split horizon).
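
To illustrate what split horizon means here, a minimal Python sketch of
the idea, not of how Designate or powerdns would implement it. The
network range, the host name and both addresses are made up for the
example, and resolve() is a hypothetical helper.

```python
import ipaddress

# Hypothetical values: none of these come from the real Labs setup.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")
RECORDS = {
    "example.beta.wmflabs.org": {
        "internal": "10.0.0.10",    # private instance IP (made up)
        "external": "192.0.2.10",   # public IP (documentation range)
    },
}

def resolve(name, client_ip):
    """Return the answer matching the network the client asks from."""
    is_internal = ipaddress.ip_address(client_ip) in INTERNAL_NET
    view = "internal" if is_internal else "external"
    return RECORDS[name][view]
```

With such a setup, a Jenkins slave asking from inside the internal range
would get the private address, while an outside client would still get
the public one.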
--
Antoine Musso