Thanks, Ravi. It seems there are two problems: 1. the host deploy failure, and 2. random failures caused by the host monitoring exception.
Please note that master is broken for ovirt-engine and we need to stop merging until these issues are fixed.

On Mon, Sep 10, 2018 at 8:48 PM, Ravi Shankar Nori <[email protected]> wrote:

> VdsDeploy is failing with message
>
> Failed to execute stage 'Initializing': Invalid response opcode 'CONFIRM'
>
> The SQL exceptions occur when trying to create an auditlog event with the
> above message, which is a separate issue.
>
> On Mon, Sep 10, 2018 at 7:41 AM, Dafna Ron <[email protected]> wrote:
>
>> I am seeing a failure in 4.2 which suggests the same issue is causing a
>> failure in 4.2 as well (on a different test).
>>
>> Full logs:
>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3069/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-4.2/post-006_migrations.py/
>>
>> Error:
>>
>> 2018-09-10 05:27:19,084-0400 INFO (jsonrpc/4) [api.virt] FINISH migrate return={'status': {'message': 'Migration in progress', 'code': 0}, 'progress': 0} from=::ffff:192.168.200.2,36202, flow_id=107cdbdd, vmId=9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6 (api:52)
>> 2018-09-10 05:27:19,085-0400 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call VM.migrate succeeded in 0.01 seconds (__init__:573)
>> 2018-09-10 05:27:21,583-0400 INFO (migsrc/9d5e4d63) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Creation of destination VM took: 2 seconds (migration:473)
>> 2018-09-10 05:27:21,583-0400 INFO (migsrc/9d5e4d63) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') starting migration to qemu+tls://lago-basic-suite-4-2-host-1/system with miguri tcp://lago-basic-suite-4-2-host-1 (migration:502)
>> 2018-09-10 05:27:25,044-0400 ERROR (periodic/0) [virt.periodic.Operation] <vdsm.virt.sampling.HostMonitor object at 0x7f0ddfc2b190> operation failed (periodic:222)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 220, in __call__
>>     self._func()
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 576, in __call__
>>     sample = HostSample(self._pid)
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 240, in __init__
>>     self.interfaces = _get_interfaces_and_samples()
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 202, in _get_interfaces_and_samples
>>     links_and_samples[link.name] = InterfaceSample(link)
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 109, in __init__
>>     self.speed = _getLinkSpeed(link)
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 590, in _getLinkSpeed
>>     speed = vlan.speed(dev.name)
>>   File "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py", line 35, in speed
>>     dev_speed = nic.read_speed_using_sysfs(dev_name)
>>   File "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py", line 47, in read_speed_using_sysfs
>>     s = int(f.read())
>> IOError: [Errno 22] Invalid argument
>> 2018-09-10 05:27:25,499-0400 INFO (libvirt/events) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') CPU stopped: onSuspend (vm:6203)
>> 2018-09-10 05:27:27,167-0400 INFO (migsrc/9d5e4d63) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') migration took 7 seconds to complete (migration:514)
>> 2018-09-10 05:27:27,168-0400 INFO (migsrc/9d5e4d63) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Changed state to Down: Migration succeeded (code=4) (vm:1693)
>> 2018-09-10 05:27:27,176-0400 INFO (migsrc/9d5e4d63) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Stopping connection (guestagent:438)
>> 2018-09-10 05:27:27,183-0400 INFO (jsonrpc/6) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:192.168.200.2,36202, vmId=9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6 (api:46)
>> 2018-09-10 05:27:27,183-0400 INFO (jsonrpc/6) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Release VM resources (vm:5283)
>> 2018-09-10 05:27:27,183-0400 WARN (jsonrpc/6) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') trying to set state to Powering down when already Down (vm:612)
>> 2018-09-10 05:27:27,183-0400 INFO (jsonrpc/6) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Stopping connection (guestagent:438)
>> 2018-09-10 05:27:27,183-0400 INFO (jsonrpc/6) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') _destroyVmGraceful attempt #0 (vm:5320)
>> 2018-09-10 05:27:27,184-0400 WARN (jsonrpc/6) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') VM '9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6' couldn't be destroyed in libvirt: Requested operation is not valid: domain is not running (vm:5341)
>> 2018-09-10 05:27:27,185-0400 INFO (jsonrpc/6) [vdsm.api] START teardownImage(sdUUID='8cbc7539-1ab3-4a3e-9504-65b5002c6546', spUUID='3b4effaf-6bad-42b8-810b-80faaa5d5a1e', imgUUID='90402f78-6c84-44cf-bcdc-7823f252d185', volUUID=None) from=::ffff:192.168.200.2,36202, task_id=73c097e0-6cb6-449e-bebe-f0dfa55e2155 (api:46)
>> 2018-09-10 05:27:27,186-0400 INFO (jsonrpc/6) [storage.StorageDomain] Removing image run directory '/var/run/vdsm/storage/8cbc7539-1ab3-4a3e-9504-65b5002c6546/90402f78-6c84-44cf-bcdc-7823f252d185' (blockSD:1360)
>> 2018-09-10 05:27:27,186-0400 INFO (jsonrpc/6) [storage.fileUtils] Removing directory: /var/run/vdsm/storage/8cbc7539-1ab3-4a3e-9504-65b5002c6546/90402f78-6c84-44cf-bcdc-7823f252d185 (fileUtils:178)
>> 2018-09-10 05:27:27,325-0400 INFO (jsonrpc/6) [storage.LVM] Deactivating lvs: vg=8cbc7539-1ab3-4a3e-9504-65b5002c6546 lvs=['715b9383-87ec-4381-aaff-f084df42b3ce', 'ba23b2ac-90db-41b7-ab0f-c49f0c4d1b1e', '9201288a-918b-41a8-8ff2-6643833e1d91'] (lvm:1310)
>>
>> On Mon, Sep 10, 2018 at 11:03 AM, Dafna Ron <[email protected]>
wrote:

>>> Hi,
>>>
>>> We had a failure to deploy hosts for change:
>>> https://gerrit.ovirt.org/#/c/94228/1
>>>
>>> I don't think it's directly related to the change, but anything between
>>> the "packaging: use system provided nimbus-jose-jwt also on el7" patch
>>> and the current patch could have caused the failure.
>>>
>>> https://gerrit.ovirt.org/#/q/project:ovirt-engine+status:Merged+branch:master
>>>
>>> Sandro, Ondra, can you please have a look?
>>>
>>> (Relevant) error snippet from the log:
>>>
>>> <error>
>>>
>>> 2018-09-10 04:05:33,850-04 INFO [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Installation of lago-upgrade-from-release-suite-master-host-0. Executing command via SSH umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar -b1 --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True < /var/cache/ovirt-engine/ovirt-host-deploy.tar
>>> 2018-09-10 04:05:33,850-04 INFO [org.ovirt.engine.core.utils.archivers.tar.CachedTar] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Tarball '/var/cache/ovirt-engine/ovirt-host-deploy.tar' refresh
>>> 2018-09-10 04:05:33,915-04 INFO [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] SSH execute 'root@lago-upgrade-from-release-suite-master-host-0' 'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar -b1 --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True'
>>> 2018-09-10 04:05:34,694-04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [33bb53b] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host-0. Stage: Initializing.
>>> 2018-09-10 04:05:34,710-04 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (VdsDeploy) [33bb53b] transaction rolled back
>>> 2018-09-10 04:05:34,719-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [33bb53b] Error during deploy dialog
>>> 2018-09-10 04:05:34,723-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Error during host lago-upgrade-from-release-suite-master-host-0 install
>>> 2018-09-10 04:05:34,736-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] EVENT_ID: VDS_INSTALL_IN_PROGRESS_ERROR(511), An error has occurred during installation of Host lago-upgrade-from-release-suite-master-host-0: CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested exception is org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00.
>>> 2018-09-10 04:05:34,737-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Error during host lago-upgrade-from-release-suite-master-host-0 install, preferring first exception: CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested exception is org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
>>> 2018-09-10 04:05:34,737-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Host installation failed for host 'febee66d-fa4b-4321-9c99-cb053ddd4817', 'lago-upgrade-from-release-suite-master-host-0': CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested exception is org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
>>> 2018-09-10 04:05:34,760-04 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] START, SetVdsStatusVDSCommand(HostName = lago-upgrade-from-release-suite-master-host-0, SetVdsStatusVDSCommandParameters:{hostId='febee66d-fa4b-4321-9c99-cb053ddd4817', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 793ad1ed
>>> 2018-09-10 04:05:34,769-04 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] FINISH, SetVdsStatusVDSCommand, return: , log id: 793ad1ed
>>> 2018-09-10 04:05:34,781-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] EVENT_ID: VDS_INSTALL_FAILED(505), Host lago-upgrade-from-release-suite-master-host-0 installation failed. CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested exception is org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00.
>>> 2018-09-10 04:05:34,794-04 INFO [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Lock freed to object 'EngineLock:{exclusiveLocks='[febee66d-fa4b-4321-9c99-cb053ddd4817=VDS]', sharedLocks=''}'
>>> 2018-09-10 04:06:10,003-04 INFO [org.ovirt.engine.core.bll.quota.QuotaManager] (EE-ManagedThreadFactory-engineScheduled-Thread-44) [] Quota Cache updated. (20 msec)
>>> (END)
>>>
>>> </error>

>> _______________________________________________
>> Infra mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/TZGEPC56JYKKCOZ5XODOJMVLBR7MI6KG/
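One note on the quoted SQL failure: the 'invalid byte sequence for encoding "UTF8": 0x00' happens because PostgreSQL text columns reject NUL (0x00) characters even inside otherwise valid UTF-8, so a deploy error message containing a NUL kills the insertauditlog call and masks the original failure. The real fix belongs in the engine's audit log path (Java), but the idea can be sketched in a couple of lines (hypothetical helper name, not actual engine code):

```python
def sanitize_for_postgres(message):
    """Strip NUL (0x00) characters before handing a string to a
    PostgreSQL text column, which rejects them even in valid UTF-8.

    Hypothetical illustration only; the actual change would live in
    ovirt-engine's AuditLogDirector, not in a standalone helper.
    """
    return message.replace('\x00', '')
```

Sanitizing at the audit-log boundary would let the original deploy error be recorded instead of being replaced by a PSQLException.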
_______________________________________________
Infra mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/3WXKTZ5VPQ4Q4BEFQ7TFFOO3Q6JLBYNM/
