Thanks Dominic. Can you please review the network tests to fix this race between the two tests?
On Tue, Sep 11, 2018 at 9:53 PM, Dominik Holler <[email protected]> wrote: > On Tue, 11 Sep 2018 17:18:15 +0200 > Dominik Holler <[email protected]> wrote: > > > On Tue, 11 Sep 2018 12:22:21 +0100 > > Dafna Ron <[email protected]> wrote: > > > > > Can someone take ownership to fix it? > > > > > > > I will take ownership and track the fix. > > > > > > > On Tue, Sep 11, 2018 at 12:04 PM, Petr Horacek <[email protected]> > > > wrote: > > > > > > > vdsm.log is the same issue, supervdsm.log seems unrelated. > > > > > > > > 2018-09-11 11:59 GMT+02:00 Dominik Holler <[email protected]>: > > > > > > > >> On Tue, 11 Sep 2018 10:26:02 +0100 > > > >> Dafna Ron <[email protected]> wrote: > > > >> > > > >> > Hi, > > > >> > > > > >> > I have been seeing random failures of tests in different > > > >> > projects caused by vdsm monitoring. > > > >> > > > > >> > I need someone from vdsm to please help debug this issue. > > > >> > > > > >> > > > >> Petr, is this the same problem like yesterday in > > > >> > > > >> [ovirt-devel] failure in ost test - Invalid argument - help in > > > >> debugging issue > > > >> > > > >> > From what I can see, the test suspend/resume vm failed because > > > >> > we could not query the status of the vm on the host. > > > >> > > > > >> > you can see full log from failed tests here: > > > >> > > > > >> > https://jenkins.ovirt.org/job/ovirt-master_change-queue-test > > > >> er/10208/artifact/basic-suite.el7.x86_64/test_logs/basic- > > > >> suite-master/post-004_basic_sanity.py/ > > > >> > > > > >> > Here are the errors that I can see in the vdsm which seem to > > > >> > suggest that there is an issue getting stats on the vm which > > > >> > was suspended and caused the failure of the test: > > > >> > > > > >> > > > > >> > > > > >> > 2018-09-11 00:40:05,896-0400 INFO (monitor/c1fe6e6) > > > >> > [storage.StorageDomain] Removing remnants of deleted images [] > > > >> > (fileSD:734) 2018-09-11 00:40:07,957-0400 DEBUG (qgapoller/1) > > > >> > [vds] Not sending QEMU-GA command 'guest-get-users' to > > > >> > vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc', command is not > > > >> > supported (qemuguestagent:192) 2018-09-11 00:40:08,068-0400 > > > >> > DEBUG (periodic/3) [virt.sampling.VMBulkstatsMonitor] sampled > > > >> > timestamp 4296118.49 elapsed 0.010 acquired True domains all > > > >> > (sampling:443) 2018-09-11 00:40:08,271-0400 DEBUG (jsonrpc/1) > > > >> > [jsonrpc.JsonRpcServer] Calling 'Image.prepare' in bridge with > > > >> > {u'allowIllegal': True, u'storagepoolID': > > > >> > u'e80a56d9-74da-498a-b010-4a9df287f11d', u'imageID': > > > >> > u'd4c831e6-02d2-4d89-b516-0ec4597 5e024', u'volumeID': > > > >> > u'15b07af1-625b-42e3-b62a-8e7c7a120a56', u'storagedomainID': > > > >> > u'f1744940-41b6-4d35-b7bf-870c4e07d995'} (__init__:329) > > > >> > > > > >> > > > > >> > 2018-09-11 00:40:10,846-0400 DEBUG (vmchannels) [virt.vm] > > > >> > (vmId='8214433a-f233-4aaa-aeda-2ce1d31c78dc') Guest connection > > > >> > timed out (guestagent:556) > > > >> > 2018-09-11 00:40:11,637-0400 DEBUG (jsonrpc/5) > > > >> > [jsonrpc.JsonRpcServer] Calling 'Host.getStats' in bridge with > > > >> > {} (__init__:329) 2018-09-11 00:40:11,637-0400 INFO > > > >> > (jsonrpc/5) [api.host] START getStats() > > > >> > from=::ffff:192.168.201.4,49184 (api:47) 2018-09-11 > > > >> > 00:40:11,643-0400 DEBUG (jsonrpc/5) [root] cannot read eth0 > > > >> > speed (nic:42) 2018-09-11 00:40:11,645-0400 DEBUG (jsonrpc/5) > > > >> > [root] cannot read eth1 speed (nic:42) > > > >> > 2018-09-11 00:40:11,647-0400 DEBUG (jsonrpc/5) [root] cannot > > > >> > read eth2 speed (nic:42) > > > >> > 2018-09-11 00:40:11,649-0400 DEBUG (jsonrpc/5) [root] cannot > > > >> > read eth3 speed (nic:42) > > > >> > 2018-09-11 00:40:11,667-0400 INFO (jsonrpc/5) [api.host] > > > >> > FINISH getStats error=[Errno 22] Invalid argument > > > >> > from=::ffff:192.168.201.4,49184 (api:51) 2018-09-11 > > > >> > 00:40:11,667-0400 ERROR (jsonrpc/5) [jsonrpc.JsonRpcServer] > > > >> > Internal server error (__init__:350) Traceback (most recent > > > >> > call last): File > > > >> > "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line > > > >> > 345, in _handle_request res = method(**params) > > > >> > File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", > > > >> > line 202, in _dynamicMethod > > > >> > result = fn(*methodArgs) > > > >> > File "<string>", line 2, in getStats > > > >> > File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", > > > >> > line 49, in method > > > >> > ret = func(*args, **kwargs) > > > >> > File "/usr/lib/python2.7/site-packages/vdsm/API.py", line > > > >> > 1407, in getStats > > > >> > multipath=True)} > > > >> > File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", > > > >> > line 49, in get_stats > > > >> > decStats = stats.produce(first_sample, last_sample) > > > >> > File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", > > > >> > line 71, in produce > > > >> > stats.update(get_interfaces_stats()) > > > >> > File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", > > > >> > line 153, in get_interfaces_stats > > > >> > return net_api.network_stats() > > > >> > File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", > > > >> > line 63, in network_stats > > > >> > return netstats.report() > > > >> > File > > > >> > "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py", > > > >> > line 31, in report stats = link_stats.report() > > > >> > File > > > >> > "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py", > > > >> > line 41, in report speed = vlan.speed(i.device) > > > >> > File > > > >> > "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py", > > > >> > line 36, in speed dev_speed = > > > >> > nic.read_speed_using_sysfs(dev_name) File > > > >> > "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py", > > > >> > line 48, in read_speed_using_sysfs s = int(f.read()) > > > >> > IOError: [Errno 22] Invalid argument > > > >> > 2018-09-11 00:40:11,669-0400 INFO (jsonrpc/5) > > > >> > [jsonrpc.JsonRpcServer] RPC call Host.getStats failed (error > > > >> > -32603) in 0.03 seconds (__init__:312) 2018-09-11 > > > >> > 00:40:14,656-0400 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer] > > > >> > Calling 'Host.getAllVmStats' in bridge with {} (__init__:329) > > > >> > 2018-09-11 00:40:14,657-0400 INFO (jsonrpc/2) [api.host] START > > > >> > getAllVmStats() from=::1,46772 (api:47) 2018-09-11 > > > >> > 00:40:14,659-0400 INFO (jsonrpc/2) [api.host] FINISH > > > >> > getAllVmStats return={'status': {'message': 'Done', 'code': 0}, > > > >> > 'statsList': (suppressed)} from=::1,46772 (api:53) 2018-09-11 > > > >> > 00:40:14,660-0400 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer] > > > >> > Return 'Host.getAllVmStats' in bridge with (suppressed) > > > >> > (__init__:356) 2018-09-11 00:40:14,661-0400 INFO (jsonrpc/2) > > > >> > [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded > > > >> > in 0.00 seconds (__init__:312) 2018-09-11 00:40:14,674-0400 > > > >> > DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] Calling > > > >> > 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:329) > > > >> > 2018-09-11 00:40:14,675-0400 INFO (jsonrpc/1) [api.host] START > > > >> > getAllVmIoTunePolicies() from=::1,46772 (api:47) 2018-09-11 > > > >> > 00:40:14,675-0400 INFO (jsonrpc/1) [api.host] FINISH > > > >> > getAllVmIoTunePolicies return={'status': {'message': 'Done', > > > >> > 'code': 0}, 'io_tune_policies_dict': > > > >> > {'8214433a-f233-4aaa-aeda-2ce1d31c78dc': {'policy': [], > > > >> > 'current_values': [{'ioTune': {'write_bytes_sec': 0L, > > > >> > 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': > > > >> > 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': > > > >> > '/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf- > > > >> 870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/ > > > >> e2570635-a798-4e9e-a0ea-88436d5b296f', > > > >> > 'name': 'vda'}]}}} from=::1,46772 (api:53) 2018-09-11 > > > >> > 00:40:14,675-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] > > > >> > Return 'Host.getAllVmIoTunePolicies' in bridge with > > > >> > {'8214433a-f233-4aaa-aeda-2ce1d31c78dc': {'policy': [], > > > >> > 'current_values': [{'ioTune': {'write_bytes_sec': 0L, > > > >> > 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': > > > >> > 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': > > > >> > '/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf- > > > >> 870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/ > > > >> e2570635-a798-4e9e-a0ea-88436d5b296f', > > > >> > 'name': 'vda'}]}} (__init__:356) 2018-09-11 00:40:14,675-0400 > > > >> > INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call > > > >> > Host.getAllVmIoTunePolicies succeeded in 0.00 seconds > > > >> > (__init__:312) 2018-09-11 00:40:16,118-0400 DEBUG (jsonrpc/7) > > > >> > [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmStats' in bridge > > > >> > with {} (__init__:329) 2018-09-11 00:40:16,119-0400 INFO > > > >> > (jsonrpc/7) [api.host] START getAllVmStats() > > > >> > from=::ffff:192.168.201.4,49184 (api:47) 2018-09-11 > > > >> > 00:40:16,121-0400 INFO (jsonrpc/7) [api.host] FINISH > > > >> > getAllVmStats return={'status': {'message': 'Done', 'code': 0}, > > > >> > 'statsList': (suppressed)} from=::ffff:192.168.201.4,49184 > > > >> > (api:53) 2018-09-11 00:40:16,122-0400 DEBUG (jsonrpc/7) > > > >> > [jsonrpc.JsonRpcServer] Return 'Host.getAllVmStats' in bridge > > > >> > with (suppressed) (__init__:356) 2018-09-11 00:40:16,122-0400 > > > >> > INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call > > > >> > Host.getAllVmStats succeeded in 0.01 seconds (__init__:312) > > > >> > 2018-09-11 00:40:17,956-0400 DEBUG (qgapoller/2) [vds] Not > > > >> > sending QEMU-GA command 'guest-get-users' to > > > >> > vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc', command is not > > > >> > supported (qemuguestagent:192) 2018-09-11 00:40:18,025-0400 > > > >> > DEBUG (periodic/3) [virt.periodic] Looking for stale paused VMs > > > >> > (periodic:388) > > > >> > > > > >> > I can also see this in Super vdsm which may be related but I an > > > >> > not 100% sure its related to current issue: > > > >> > > > > >> > MainProcess|jsonrpc/2::DEBUG::2018-09-11 > > > >> > 00:33:20,012::cmdutils::159::root::(exec_cmd) SUCCESS: <err> = > > > >> > ''; <rc> = 0 ifup/on039d2d88d8ee4::DEBUG::2018-09-11 > > > >> > 00:33:20,238::cmdutils::159::root::(exec_cmd) FAILED: <err> = > > > >> > 'Running scope as unit > > > >> > 2f1f417d-85b9-46c0-b7d7-6082836daaae.scope.\n/etc/sysconfig/ > > > >> network-scripts/ifup-eth: > > > >> > line 321: 20715 Terminated > > > >> > /sbin/dhclient ${DHCLIENTARGS} ${DEVICE}\nDevice > > > >> > "on039d2d88d8ee4" does not exist.\nCannot find device > > > >> > "on039d2d88d8ee4"\nDevice "on039d2d88d8ee4" does not > > > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice > > > >> > "on039d2d88d 8ee4" does not exist.\nDevice "on039d2d88d8ee4" > > > >> > does not exist.\nDevice "on039d2d88d8ee4" does not > > > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice > > > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" > > > >> > does not exist.\nDevic e "on039d2d88d8ee4" does not > > > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice > > > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" > > > >> > does not exist.\nDevice "on039d2d88d8ee4" does not > > > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice > > > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" > > > >> > does not exist.\nDevice "on039d2d88d8ee4" does not > > > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice > > > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d 8ee4" > > > >> > does not exist.\nDevice "on039d2d88d8ee4" does not > > > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice > > > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" > > > >> > does not exist.\nDevice "on039d2d88d8ee4" does not > > > >> > exist.\nDevic e "on039d2d88d8ee4" does not exist.\nDevice > > > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" > > > >> > does not exist.\nDevice "on039d2d88d8ee4" does not > > > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice > > > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" > > > >> > does not exist.\nDevice "on039d2d88d8ee4" does not > > > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice > > > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" > > > >> > does not exist.\nDevice "on039d2d88d 8ee4" does not > > > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice > > > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" > > > >> > does not exist.\nDevice "on039d2d88d8ee4" does not > > > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevic e > > > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" > > > >> > does not exist.\nDevice "on039d2d88d8ee4" does not > > > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice > > > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" > > > >> > does not exist.\nDevice "on039d2d88d8ee4" does not > > > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice > > > >> > "on039d2d88d8ee4" does not exist.\n'; <rc> = 1 > > > >> > ifup/on039d2d88d8ee4::ERROR::2018-09-11 > > > >> > 00:33:20,238::concurrent::202::root::(run) FINISH thread > > > >> > <Thread(ifup/on039d2d88d8ee4, stopped daemon 140281550731008)> > > > >> > failed Traceback (most recent call last): File > > > >> > "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", > > > >> > line 195, in run ret = func(*args, **kwargs) File > > > >> > "/usr/lib/python2.7/site-packages/vdsm/network/ > configurators/ifcfg.py", > > > >> > line 948, in _exec_ifup _exec_ifup_by_name(iface.name, cgroup) > > > >> > File > > > >> > "/usr/lib/python2.7/site-packages/vdsm/network/ > configurators/ifcfg.py", > > > >> > line 934, in _exec_ifup_by_name > > > >> > raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out > > > >> > else '') ConfigNetworkError: (29, '\n') > > > >> > > > > >> > > > > What happens in > basic-suite-master/test-scenarios/003_basic_networking.py : > modify_host_0_ip_to_dhcp set VM_NETWORK to use DHCP, > so vdsm is calling ifup, which tries to configure via DHCP in the > background. > The next test detach_vm_network_from_host_0 removes the network, while > ifup still hangs in dhcp configuration, because in this VLAN there is > no DHCP server. > So from my point of view this error in supervdsm.log is legitimate, > because VDSM was advised to configure via DHCP, which fails because the > interface is removed during dhcp configuration. > I did not recognize that this triggeres any chained errors. > > > > >> > thanks, > > > >> > Dafna > > > >> > > > >> > > > > > > > >
_______________________________________________ Devel mailing list -- [email protected] To unsubscribe send an email to [email protected] Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/GFPBLM2Y4U6EQQWNEMYZWB2Y4JMPRMZQ/
