Thanks Ravi.
it seems there are two problems.
1. the host deploy failure
2. random failures caused by host monitoring exception.

Please note that master is broken for ovirt-engine and we need to stop
merging until these issues are fixed.



On Mon, Sep 10, 2018 at 8:48 PM, Ravi Shankar Nori <[email protected]> wrote:

> VdsDeploy is failing with message
>
> Failed to execute stage 'Initializing': Invalid response opcode 'CONFIRM'
>
> The SQL exceptions occur when trying to create an auditlog event with the
> above message which is a separate issue.
>
> On Mon, Sep 10, 2018 at 7:41 AM, Dafna Ron <[email protected]> wrote:
>
>> I am seeing a failure in 4.2 which suggests the same issue is causing
>> failure in 4.2 as well (on different test)
>>
>> full logs:
>>
>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/
>> 3069/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-4
>> .2/post-006_migrations.py/
>>
>> error:
>>
>> 2018-09-10 05:27:19,084-0400 INFO  (jsonrpc/4) [api.virt] FINISH migrate
>> return={'status': {'message': 'Migration in progress', 'code': 0},
>> 'progress': 0} from=::ffff:192.168.200.2,36202, flow_id=107cdbdd,
>> vmId=9d5e4d63-41ad-47eb-bf04-0fc
>> 8a49d59a6 (api:52)
>> 2018-09-10 05:27:19,085-0400 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer]
>> RPC call VM.migrate succeeded in 0.01 seconds (__init__:573)
>> 2018-09-10 05:27:21,583-0400 INFO  (migsrc/9d5e4d63) [virt.vm]
>> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Creation of destination VM
>> took: 2 seconds (migration:473)
>> 2018-09-10 05:27:21,583-0400 INFO  (migsrc/9d5e4d63) [virt.vm]
>> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') starting migration to
>> qemu+tls://lago-basic-suite-4-2-host-1/system with miguri
>> tcp://lago-basic-suite-4-2-host-1 (migration:502)
>> 2018-09-10 05:27:25,044-0400 ERROR (periodic/0) [virt.periodic.Operation]
>> <vdsm.virt.sampling.HostMonitor object at 0x7f0ddfc2b190> operation
>> failed (periodic:222)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line
>> 220, in __call__
>>     self._func()
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line
>> 576, in __call__
>>     sample = HostSample(self._pid)
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line
>> 240, in __init__
>>     self.interfaces = _get_interfaces_and_samples()
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line
>> 202, in _get_interfaces_and_samples
>>     links_and_samples[link.name] = InterfaceSample(link)
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line
>> 109, in __init__
>>     self.speed = _getLinkSpeed(link)
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line
>> 590, in _getLinkSpeed
>>     speed = vlan.speed(dev.name)
>>   File "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py",
>> line 35, in speed
>>     dev_speed = nic.read_speed_using_sysfs(dev_name)
>>   File "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py", line
>> 47, in read_speed_using_sysfs
>>     s = int(f.read())
>> IOError: [Errno 22] Invalid argument
>> 2018-09-10 05:27:25,499-0400 INFO  (libvirt/events) [virt.vm]
>> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') CPU stopped: onSuspend
>> (vm:6203)
>> 2018-09-10 05:27:27,167-0400 INFO  (migsrc/9d5e4d63) [virt.vm]
>> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') migration took 7 seconds
>> to complete (migration:514)
>> 2018-09-10 05:27:27,168-0400 INFO  (migsrc/9d5e4d63) [virt.vm]
>> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Changed state to Down:
>> Migration succeeded (code=4) (vm:1693)
>> 2018-09-10 05:27:27,176-0400 INFO  (migsrc/9d5e4d63) [virt.vm]
>> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Stopping connection
>> (guestagent:438)
>> 2018-09-10 05:27:27,183-0400 INFO  (jsonrpc/6) [api.virt] START
>> destroy(gracefulAttempts=1) from=::ffff:192.168.200.2,36202,
>> vmId=9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6 (api:46)
>> 2018-09-10 05:27:27,183-0400 INFO  (jsonrpc/6) [virt.vm]
>> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Release VM resources
>> (vm:5283)
>> 2018-09-10 05:27:27,183-0400 WARN  (jsonrpc/6) [virt.vm]
>> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') trying to set state to
>> Powering down when already Down (vm:612)
>> 2018-09-10 05:27:27,183-0400 INFO  (jsonrpc/6) [virt.vm]
>> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Stopping connection
>> (guestagent:438)
>> 2018-09-10 05:27:27,183-0400 INFO  (jsonrpc/6) [virt.vm]
>> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') _destroyVmGraceful attempt
>> #0 (vm:5320)
>> 2018-09-10 05:27:27,184-0400 WARN  (jsonrpc/6) [virt.vm]
>> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') VM
>> '9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6' couldn't be destroyed in libvirt:
>> Requested operation is not valid: domain is not running (vm
>> :5341)
>> 2018-09-10 05:27:27,185-0400 INFO  (jsonrpc/6) [vdsm.api] START
>> teardownImage(sdUUID='8cbc7539-1ab3-4a3e-9504-65b5002c6546',
>> spUUID='3b4effaf-6bad-42b8-810b-80faaa5d5a1e',
>> imgUUID='90402f78-6c84-44cf-bcdc-7823f252d185', volUUID=None) from
>> =::ffff:192.168.200.2,36202, task_id=73c097e0-6cb6-449e-bebe-f0dfa55e2155
>> (api:46)
>> 2018-09-10 05:27:27,186-0400 INFO  (jsonrpc/6) [storage.StorageDomain]
>> Removing image run directory '/var/run/vdsm/storage/8cbc753
>> 9-1ab3-4a3e-9504-65b5002c6546/90402f78-6c84-44cf-bcdc-7823f252d185'
>> (blockSD:1360)
>> 2018-09-10 05:27:27,186-0400 INFO  (jsonrpc/6) [storage.fileUtils]
>> Removing directory: /var/run/vdsm/storage/8cbc7539
>> -1ab3-4a3e-9504-65b5002c6546/90402f78-6c84-44cf-bcdc-7823f252d185
>> (fileUtils:178)
>> 2018-09-10 05:27:27,325-0400 INFO  (jsonrpc/6) [storage.LVM] Deactivating
>> lvs: vg=8cbc7539-1ab3-4a3e-9504-65b5002c6546
>> lvs=['715b9383-87ec-4381-aaff-f084df42b3ce',
>> 'ba23b2ac-90db-41b7-ab0f-c49f0c4d1b1e', '9201288a-918b-41a8-8ff2-66438
>> 33e1
>> d91'] (lvm:1310)
>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/
>> 3069/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-4
>> .2/post-006_migrations.py/
>>
>> On Mon, Sep 10, 2018 at 11:03 AM, Dafna Ron <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> we had a failure to deploy hosts for change;
>>> https://gerrit.ovirt.org/#/c/94228/1
>>>
>>> I don't think its related directly to the change but anything between
>>> packaging: use system provided nimbus-jose-jwt also on el7 patch and
>>> current patch could have caused the failure.
>>>
>>> https://gerrit.ovirt.org/#/q/project:ovirt-engine+status:Mer
>>> ged+branch:master
>>>
>>> Sandro, Ondra, can you please have a look?
>>>
>>> (Relevant) error snippet from the log:
>>>
>>> <error>
>>>
>>> 018-09-10 04:05:33,850-04 INFO  
>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Installation of
>>> lago-upgrade-from-release-suite-master-host-0. Executing command via
>>> SSH umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t
>>> ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm
>>> -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar -b1 --warning=no-timestamp -C
>>> "${MYTMP}" -x &&  "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
>>> DIALOG/customization=bool:True < /var/cache/ovirt-engine/ovirt-
>>> host-deploy.tar
>>> 2018-09-10 04:05:33,850-04 INFO  
>>> [org.ovirt.engine.core.utils.archivers.tar.CachedTar]
>>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Tarball
>>> '/var/cache/ovirt-engine/ovirt-host-deploy.tar' refresh
>>> 2018-09-10 04:05:33,915-04 INFO  
>>> [org.ovirt.engine.core.uutils.ssh.SSHDialog]
>>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] SSH execute
>>> 'root@lago-upgrade-from-release-suite-master-host-0' 'umask 0077;
>>> MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap
>>> "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" >
>>> /dev/null 2>&1" 0; tar -b1 --warning=no-timestamp -C "${MYTMP}" -x &&
>>> "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
>>> DIALOG/customization=bool:True'
>>> 2018-09-10 04:05:34,694-04 INFO  [org.ovirt.engine.core.dal.dbb
>>> roker.auditloghandling.AuditLogDirector] (VdsDeploy) [33bb53b]
>>> EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host
>>> lago-upgrade-from-release-suite-master-host-0. Stage: Initializing.
>>> 2018-09-10 04:05:34,710-04 INFO  
>>> [org.ovirt.engine.core.utils.transaction.TransactionSupport]
>>> (VdsDeploy) [33bb53b] transaction rolled back
>>> 2018-09-10 04:05:34,719-04 ERROR 
>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>>> (VdsDeploy) [33bb53b] Error during deploy dialog
>>> 2018-09-10 04:05:34,723-04 ERROR 
>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Error during host
>>> lago-upgrade-from-release-suite-master-host-0 install
>>> 2018-09-10 04:05:34,736-04 ERROR [org.ovirt.engine.core.dal.dbb
>>> roker.auditloghandling.AuditLogDirector] 
>>> (EE-ManagedThreadFactory-engine-Thread-1)
>>> [33bb53b] EVENT_ID: VDS_INSTALL_IN_PROGRESS_ERROR(511), An error has
>>> occurred during installation of Host 
>>> lago-upgrade-from-release-suite-master-host-0:
>>> CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?,
>>> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
>>> ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested
>>> exception is org.postgresql.util.PSQLException: ERROR: invalid byte
>>> sequence for encoding "UTF8": 0x00.
>>> 2018-09-10 04:05:34,737-04 ERROR 
>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Error during host
>>> lago-upgrade-from-release-suite-master-host-0 install, preferring first
>>> exception: CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?,
>>> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
>>> ?, ?, ?, ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00;
>>> nested exception is org.postgresql.util.PSQLException: ERROR: invalid
>>> byte sequence for encoding "UTF8": 0x00
>>> 2018-09-10 04:05:34,737-04 ERROR [org.ovirt.engine.core.bll.hos
>>> tdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1)
>>> [33bb53b] Host installation failed for host 
>>> 'febee66d-fa4b-4321-9c99-cb053ddd4817',
>>> 'lago-upgrade-from-release-suite-master-host-0':
>>> CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?,
>>> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
>>> ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested
>>> exception is org.postgresql.util.PSQLException: ERROR: invalid byte
>>> sequence for encoding "UTF8": 0x00
>>> 2018-09-10 04:05:34,760-04 INFO  
>>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] START,
>>> SetVdsStatusVDSCommand(HostName = 
>>> lago-upgrade-from-release-suite-master-host-0,
>>> SetVdsStatusVDSCommandParameters:{hostId='febee66d-fa4b-4321-9c99-cb053ddd4817',
>>> status='InstallFailed', nonOperationalReason='NONE',
>>> stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 793ad1ed
>>> 2018-09-10 04:05:34,769-04 INFO  
>>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] FINISH,
>>> SetVdsStatusVDSCommand, return: , log id: 793ad1ed
>>> 2018-09-10 04:05:34,781-04 ERROR [org.ovirt.engine.core.dal.dbb
>>> roker.auditloghandling.AuditLogDirector] 
>>> (EE-ManagedThreadFactory-engine-Thread-1)
>>> [33bb53b] EVENT_ID: VDS_INSTALL_FAILED(505), Host
>>> lago-upgrade-from-release-suite-master-host-0 installation failed.
>>> CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?,
>>> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
>>> ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested
>>> exception is org.postgresql.util.PSQLException: ERROR: invalid byte
>>> sequence for encoding "UTF8": 0x00.
>>> 2018-09-10 04:05:34,794-04 INFO  [org.ovirt.engine.core.bll.hos
>>> tdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1)
>>> [33bb53b] Lock freed to object 'EngineLock:{exclusiveLocks='[
>>> febee66d-fa4b-4321-9c99-cb053ddd4817=VDS]', sharedLocks=''}'
>>> 2018-09-10 04:06:10,003-04 INFO  
>>> [org.ovirt.engine.core.bll.quota.QuotaManager]
>>> (EE-ManagedThreadFactory-engineScheduled-Thread-44) [] Quota Cache
>>> updated. (20 msec)
>>> (END)
>>>
>>>
>>> </error>
>>>
>>>
>>
>> _______________________________________________
>> Infra mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct: https://www.ovirt.org/communit
>> y/about/community-guidelines/
>> List Archives: https://lists.ovirt.org/archiv
>> es/list/[email protected]/message/TZGEPC56JYKKCOZ5XODOJMVLBR7MI6KG/
>>
>>
>
_______________________________________________
Infra mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/3WXKTZ5VPQ4Q4BEFQ7TFFOO3Q6JLBYNM/

Reply via email to