On Tue, Sep 11, 2018 at 5:28 PM, Milan Zamazal <[email protected]> wrote: > [I'm not sure whether the topic is still relevant now, after I returned > from PTO.] > > Yedidyah Bar David <[email protected]> writes: > >> On Thu, Sep 6, 2018 at 4:08 AM, <[email protected]> wrote: >>> Project: >>> http://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/ >>> Build: >>> http://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/137/ >> >>> Build Number: 137 >>> Build Status: Still Failing >>> Triggered By: Started by timer >>> >>> ------------------------------------- >>> Changes Since Last Success: >>> ------------------------------------- >>> Changes for Build #107 >>> [Roy Golan] Introduce openshift-on-ovirt >>> >>> >>> Changes for Build #108 >>> [Roy Golan] Introduce openshift-on-ovirt >>> >>> >>> Changes for Build #109 >>> [Milan Zamazal] ovf_import test >>> >>> [Barak Korren] global_setup: Only install release RPMs on CentOS >>> >>> [Barak Korren] mock_runner: Set no_proxy for proxied configs >>> >>> >>> Changes for Build #110 >>> [Milan Zamazal] ovf_import test >>> >>> >>> Changes for Build #111 >>> [Milan Zamazal] ovf_import test >>> >>> >>> Changes for Build #112 >>> [Milan Zamazal] ovf_import test >>> >>> [Eyal Edri] adding kubevirt/containerized-data-importer to stdci v2 >>> >>> >>> Changes for Build #113 >>> [Milan Zamazal] ovf_import test >>> >>> >>> Changes for Build #114 >>> [Milan Zamazal] ovf_import test >>> >>> >>> Changes for Build #115 >>> [Milan Zamazal] ovf_import test >>> >>> >>> Changes for Build #116 >>> [Milan Zamazal] ovf_import test >>> >>> >>> Changes for Build #117 >>> [Milan Zamazal] ovf_import test >>> >>> >>> Changes for Build #118 >>> [Milan Zamazal] ovf_import test >>> >>> [Daniel Belenky] Remove oVirt logo from build summary >>> >>> >>> Changes for Build #119 >>> [Milan Zamazal] ovf_import test >>> >>> [Evgheni Dereveanchin] Add infra-puppet project >>> >>> [Daniel Belenky] Added production gerrit to push-url-map >>> >>> >>> Changes for Build #120 >>> [Ondra Machacek] ansible_suite: Fix ansible suite modules names >>> >>> >>> Changes for Build #121 >>> [Anton Marchukov] Removed /root/fluentd from collected artifacts. >>> >>> >>> Changes for Build #122 >>> [Anton Marchukov] Removed /root/fluentd from collected artifacts. >>> >>> >>> Changes for Build #123 >>> [Anton Marchukov] Removed /root/fluentd from collected artifacts. >>> >>> >>> Changes for Build #124 >>> [Anton Marchukov] Removed /root/fluentd from collected artifacts. >>> >>> >>> Changes for Build #125 >>> [Anton Marchukov] Removed /root/fluentd from collected artifacts. >>> >>> >>> Changes for Build #126 >>> [Anton Marchukov] Removed /root/fluentd from collected artifacts. >>> >>> >>> Changes for Build #127 >>> [Anton Marchukov] Removed /root/fluentd from collected artifacts. >>> >>> [Viktor Mihajlovski] Re-enable s390x jobs >>> >>> >>> Changes for Build #128 >>> [Anton Marchukov] Removed /root/fluentd from collected artifacts. >>> >>> [Ehud Yonasi] docker_cleanup: Fixed race condition with containers >>> >>> >>> Changes for Build #129 >>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2 >>> >>> [Sandro Bonazzola] system-tests: drop sbonazzo from notified list >>> >>> >>> Changes for Build #130 >>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2 >>> >>> >>> Changes for Build #131 >>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2 >>> >>> >>> Changes for Build #132 >>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2 >>> >>> >>> Changes for Build #133 >>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2 >>> >>> [Daniel Belenky] stdci_runner: increase timeout to 3 hours >>> >>> >>> Changes for Build #134 >>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2 >>> >>> >>> Changes for Build #135 >>> [Galit] Updated ovirt 4.2 repo file >>> >>> >>> Changes for Build #136 >>> [Gal Ben Haim] Adding dr suite >>> >>> >>> Changes for Build #137 >>> [Gal Ben Haim] Adding dr suite >>> >>> [Barak Korren] Remove the populate_mock function from mock_runner >>> >>> [Daniel Belenky] mock_runner: store shell cmd in a variable >>> >>> [Barak Korren] mock_runner: Added timeout param >>> >>> [Barak Korren] Make whitelist repo configurable via env vars >>> >>> [Daniel Belenky] stdci_runner: let mock_runner manage timeout >>> >>> [Greg Sheremeta] remove 4.1 change queue from ovirt-engine-nodejs config >>> >>> >>> >>> >>> ----------------- >>> Failed Tests: >>> ----------------- >>> 1 tests failed. >>> FAILED: 004_basic_sanity.verify_suspend_resume_vm0_40 >>> >>> Error Message: >>> 0 != 1 >>> -------------------- >> begin captured logging << -------------------- >>> lago.ssh: DEBUG: start task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh >>> client for lago-compat-4-0-suite-master-host-0: >>> lago.ssh: DEBUG: end task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh >>> client >>> for lago-compat-4-0-suite-master-host-0: >>> lago.ssh: DEBUG: Running 414b4eae on lago-compat-4-0-suite-master-host-0: >>> awk >>> 'BEGIN { n = 0; } $1 + " " + $2 > "IOError: [Errno" && $0 ~ /START >>> create\(.*hiberVolHandle/ { n = n + 1; } END { print n; }' >>> /var/log/vdsm/vdsm.log >>> lago.ssh: DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 >>> returned with 0 >>> lago.ssh: DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 >>> output: >>> 0 >> >> This now happened to me too, when running OST manually on my machine, >> basic suite master. >> >> vdsm.log [1] has many occurrences of things like: >> >> 2018-09-05 21:02:49,276-0400 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer] >> Calling 'Host.getStats' in bridge with {} (__init__:329) >> 2018-09-05 21:02:49,277-0400 INFO (jsonrpc/5) [api.host] START >> getStats() from=::ffff:192.168.200.2,40112 (api:47) >> 2018-09-05 21:02:49,281-0400 DEBUG (jsonrpc/5) [root] cannot read eth0 >> speed (nic:42) >> 2018-09-05 21:02:49,282-0400 DEBUG (jsonrpc/5) [root] cannot read eth1 >> speed (nic:42) >> 2018-09-05 21:02:49,283-0400 DEBUG (jsonrpc/5) [root] cannot read eth2 >> speed (nic:42) >> 2018-09-05 21:02:49,284-0400 DEBUG (jsonrpc/5) [root] cannot read eth3 >> speed (nic:42) >> 2018-09-05 21:02:49,289-0400 INFO (jsonrpc/5) [api.host] FINISH >> getStats error=[Errno 22] Invalid argument >> from=::ffff:192.168.200.2,40112 (api:51) >> 2018-09-05 21:02:49,289-0400 ERROR (jsonrpc/5) [jsonrpc.JsonRpcServer] >> Internal server error (__init__:350) >> Traceback (most recent call last): >> File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line >> 345, in _handle_request >> res = method(**params) >> File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line >> 202, in _dynamicMethod >> result = fn(*methodArgs) >> File "<string>", line 2, in getStats >> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in >> method >> ret = func(*args, **kwargs) >> File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1407, in getStats >> multipath=True)} >> File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 49, >> in get_stats >> decStats = stats.produce(first_sample, last_sample) >> File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 71, >> in produce >> stats.update(get_interfaces_stats()) >> File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line >> 153, in get_interfaces_stats >> return net_api.network_stats() >> File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line >> 63, in network_stats >> return netstats.report() >> File "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py", >> line 31, in report >> stats = link_stats.report() >> File "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py", >> line 41, in report >> speed = vlan.speed(i.device) >> File "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py", >> line 36, in speed >> dev_speed = nic.read_speed_using_sysfs(dev_name) >> File "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py", >> line 48, in read_speed_using_sysfs >> s = int(f.read()) >> IOError: [Errno 22] Invalid argument >> >> The test verify_suspend_resume_vm0 seems to try to find a line in >> vdsm.log that appears _after_ the place it started suspend_resume_vm0. >> To search for this line, suspend_resume_vm0 does 'tail -1' on the file >> and keeps the timestamp (first two words). But with tracebacks such as >> above, instead of a timestamp it gets "IOError: [Errno". >> >> Not sure how to handle. Some options I can think of: >> >> 1. Make vdsm add a timestamp prefix to such tracebacks (and to >> everything, perhaps). This will fix the problem, and might be useful >> on its own, but can also be annoying. > > I think it would be more annoying than useful. The initial timestamp > applies to the whole traceback so there would be no added information, > just making it possibly better readable to some parsers and less > readable to humans.
Agreed. > >> 2. Change the logic in the test somehow, e.g.: >> - instead of 'tail -1', take e.g. 'tail -100' and search for the last >> line that has a timestamp > > That would be possible, but I actually like the fact that those > tracebacks were detected, although by side effect. If there's something > unexpected in the log, let's fail. OK, see below. > >> - check the file size and on the next one continue starting from this >> point (assuming OST will not rotate vdsm.log, is that safe?) >> - rotate vdsm.log prior to running the test (might be problematic, no >> idea) and then check whole new file >> >> 3. Perhaps patch vdsm to not log the traceback, e.g. >> https://gerrit.ovirt.org/94197 > > Yes, fixing the problem is the right solution, there shouldn't be > tracebacks in the log without (good) reason. I wasn't aware of the fact (?) that this is indeed the policy. ( As a side note, in at least engine-setup, we do log tracebacks in some cases and consider this ok. See also this, especially the last few paragraphs: https://ovirt.org/develop/developer-guide/engine/otopi/#logs ) > I think Petr Horáček is > working on it. Very well. Cced you and him on above patch. If his fix is expected soon, I don't mind abandoning mine (above). > >> Probably all of these can be done as well. >> >> [1] >> https://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/137/artifact/exported-artifacts/test_logs/compat-4.0-suite-master/post-004_basic_sanity.py/lago-compat-4-0-suite-master-host-0/_var_log/vdsm/vdsm.log >> >>> >>> --------------------- >> end captured logging << --------------------- >>> >>> Stack Trace: >>> File "/usr/lib64/python2.7/unittest/case.py", line 369, in run >>> testMethod() >>> File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest >>> self.test(*self.arg) >>> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 142, >>> in wrapped_test >>> test() >>> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 60, in >>> wrapper >>> return func(get_test_prefix(), *args, **kwargs) >>> File >>> "/home/jenkins/workspace/ovirt-system-tests_compat-4.0-suite-master/ovirt-system-tests/compat-4.0-suite-master/test-scenarios/004_basic_sanity.py", >>> line 1140, in verify_suspend_resume_vm0 >>> nt.assert_equals(log_line_count('START create\(.*' + identifier), 1) >>> File "/usr/lib64/python2.7/unittest/case.py", line 553, in assertEqual >>> assertion_func(first, second, msg=msg) >>> File "/usr/lib64/python2.7/unittest/case.py", line 546, in >>> _baseAssertEqual >>> raise self.failureException(msg) >>> '0 != 1\n-------------------- >> begin captured logging << >>> --------------------\nlago.ssh: DEBUG: start >>> task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh client for >>> lago-compat-4-0-suite-master-host-0:\nlago.ssh: DEBUG: end >>> task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh client for >>> lago-compat-4-0-suite-master-host-0:\nlago.ssh: DEBUG: Running 414b4eae on >>> lago-compat-4-0-suite-master-host-0: awk \'BEGIN { n = 0; } $1 + " " + $2 > >>> "IOError: [Errno" && $0 ~ /START create\\(.*hiberVolHandle/ { n = n + 1; } >>> END { print n; }\' /var/log/vdsm/vdsm.log\nlago.ssh: DEBUG: Command >>> 414b4eae on lago-compat-4-0-suite-master-host-0 returned with 0\nlago.ssh: >>> DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 output:\n >>> 0\n\n--------------------- >> end captured logging << ---------------------' >>> _______________________________________________ >>> Infra mailing list -- [email protected] >>> To unsubscribe send an email to [email protected] >>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>> oVirt Code of Conduct: >>> https://www.ovirt.org/community/about/community-guidelines/ >>> List Archives: >>> https://lists.ovirt.org/archives/list/[email protected]/message/PTTD52JZ7G72DXURL3RAPBGVMKKASGHH/ >>> -- Didi _______________________________________________ Infra mailing list -- [email protected] To unsubscribe send an email to [email protected] Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/V2EPZY6CMU4TQNTJQB5PHGVNE4BSA53L/
