----- Original Message -----
> From: "David Caro" <[email protected]>
> To: "Infra" <[email protected]>
> Sent: Tuesday, February 3, 2015 10:50:14 PM
> Subject: Re: Major outage
>
> Good news!
>
> I got a vm working in the recently-upgraded fc20 host ovirt-srv02. The issue
> with the vms seems to be that the default value for the numa setting is not
> behaving correctly with libvirt. The fc19 vms just show the input/output
> error, but the fc20 one also shows the full libvirt string, and there you can
> see that it complains about numa:
>
> libvirtError: internal error: Process exited prior to exec: libvirt: error :
> internal error: NUMA memory tuning in 'preferred' mode only supports single
> node
>
> So what I've done is edit the vm, pin it to a node, set it as not migratable
> (or whatever the spelling is) and changed the numa mode from preferred to
> strict. Saved, and then edited the vm again, reverting the host pin and the
> migration settings, but not changing the numa ones. That allowed me to boot
> one of the vms so far (just tested).
>
> Some ugly issues:
>
> The known multipathd message in the logs... it's quite annoying and fills up
> the logs.
> Vdsm messed up the network a couple of times: once it removed all the ifcfg
> files, and the other time it restored old values in the rules/route files.
> Vdsm failed on vdsm-restore-net-config:89 with a non-existing key exception
> instead of just showing an error message and continuing execution.
>
> I'll triage the above errors tomorrow and resend to the devels; for now I'm
> just sending this to avoid forgetting about them.
>
> Will continue booting the rest of the production vms, do some simple sanity
> checks and leave the rest for tomorrow.
>
> On the good side, we now have one fc20 host in each cluster, and 3.5 on all
> the production DC hosts! yay \o/
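
For reference, the 'preferred' vs 'strict' setting corresponds to the <numatune>
element in the libvirt domain XML, and 'preferred' mode only accepts a single
NUMA node, which is exactly what the error above complains about. A minimal
sketch of what the change amounts to at that level (the domain XML and names
below are hypothetical, not taken from the actual hosts):

    # Illustration only: the "preferred" -> "strict" switch as it would look
    # in a libvirt domain XML (hypothetical example, not the real vm config).
    import xml.etree.ElementTree as ET

    domain_xml = """
    <domain type='kvm'>
      <name>example-vm</name>
      <numatune>
        <memory mode='preferred' nodeset='0-1'/>
      </numatune>
    </domain>
    """

    root = ET.fromstring(domain_xml)
    memory = root.find('./numatune/memory')
    # 'preferred' combined with a multi-node nodeset is what libvirt rejects
    print('before:', memory.attrib)

    memory.set('mode', 'strict')   # 'strict' allows a multi-node nodeset
    print('after: ', memory.attrib)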
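
On the vdsm-restore-net-config failure: the complaint is just that a missing key
should be logged and skipped rather than abort the whole restore. A rough sketch
of that "log and continue" pattern (the dict layout and key names here are made
up for illustration, not vdsm's actual code):

    # Hypothetical sketch: tolerate a missing key instead of raising KeyError.
    import logging

    logging.basicConfig(level=logging.INFO)

    def restore_networks(persisted_nets):
        for name, attrs in persisted_nets.items():
            nic = attrs.get('nic')   # .get() instead of attrs['nic']
            if nic is None:
                logging.error("network %s has no nic defined, skipping", name)
                continue             # keep going with the remaining networks
            logging.info("restoring network %s on nic %s", name, nic)

    restore_networks({
        'ovirtmgmt': {'nic': 'em1'},
        'broken-net': {},            # would have raised KeyError with attrs['nic']
    })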
Great news! Adding some NUMA experts to see if they have any advice on
optimizing it on the DC.
e.

> If anything comes up again I'll update in this thread; if not, tomorrow
> morning I'll update when all the environment is working 100%.
>
> PS: Thanks Fabian and Max!!
>
> On 02/03, David Caro wrote:
> >
> > New update,
> >
> > Host srv01 is up and running, but 02 and 03 have issues, they can't start
> > up any vms.
> >
> > The error is in libvirt:
> >
> > libvirtError: Child quit during startup handshake: Input/output error
> >
> > Looking around I saw a thread in the users list that fixed it with:
> >
> > /usr/lib/systemd/systemd-vdsmd reconfigure force
> >
> > That worked on srv01, but the others did not. So I'm trying to upgrade one
> > of them, srv02, to fc20, hoping the newer libvirt version will not have
> > that issue.
> >
> > Those two hosts are the ones in the production data center, which has the
> > foreman vm, so none of the slaves is working properly until that is solved.
> >
> > Will update in ~one hour or when the problem is solved.
> >
> > Being so late, if I get the production vms running on one host, I'll leave
> > the rest for tomorrow.
> >
> > D
> >
> > On 02/03, David Caro wrote:
> > >
> > > Ok, update:
> > >
> > > Not all the servers have been restored; most of the slave vms are up, and
> > > all but one host are up.
> > >
> > > Engine - OK
> > > storage - OK
> > > storage01 - OK
> > > storage02 - OK
> > > srv01 - DOWN
> > > srv02 - OUT OF THE POOL (will add when 01 is up)
> > > srv03 - OK
> > > srv04 - OK
> > > srv05 - OK
> > > srv06 - OK
> > > srv07 - OK
> > > srv08 - OK
> > >
> > > If you need any specific vm I can try to get it up on one of the running
> > > hosts, but I'd wait until the last host is up to start all of them.
> > >
> > > Will update again when finished, or in one hour.
> > >
> > > On 02/03, David Caro wrote:
> > > >
> > > > We are having a major outage on the phoenix lab, don't expect any
> > > > vms/slaves to be properly working yet.
> > > >
> > > > Will update when solved, or in an hour with the status.
> >
> --
> David Caro
>
> Red Hat S.L.
> Continuous Integration Engineer - EMEA ENG Virtualization R&D
>
> Tel.: +420 532 294 605
> Email: [email protected]
> Web: www.redhat.com
> RHT Global #: 82-62605
>
> _______________________________________________
> Infra mailing list
> [email protected]
> http://lists.ovirt.org/mailman/listinfo/infra

_______________________________________________
Infra mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/infra
