On Tue, Sep 11, 2018 at 5:28 PM, Milan Zamazal <[email protected]> wrote:
> [I'm not sure whether the topic is still relevant now, after I returned
> from PTO.]
>
> Yedidyah Bar David <[email protected]> writes:
>
>> On Thu, Sep 6, 2018 at 4:08 AM,  <[email protected]> wrote:
>>> Project: 
>>> http://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/
>>> Build: 
>>> http://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/137/
>>
>>> Build Number: 137
>>> Build Status:  Still Failing
>>> Triggered By: Started by timer
>>>
>>> -------------------------------------
>>> Changes Since Last Success:
>>> -------------------------------------
>>> Changes for Build #107
>>> [Roy Golan] Introduce openshift-on-ovirt
>>>
>>>
>>> Changes for Build #108
>>> [Roy Golan] Introduce openshift-on-ovirt
>>>
>>>
>>> Changes for Build #109
>>> [Milan Zamazal] ovf_import test
>>>
>>> [Barak Korren] global_setup: Only install release RPMs on CentOS
>>>
>>> [Barak Korren] mock_runner: Set no_proxy for proxied configs
>>>
>>>
>>> Changes for Build #110
>>> [Milan Zamazal] ovf_import test
>>>
>>>
>>> Changes for Build #111
>>> [Milan Zamazal] ovf_import test
>>>
>>>
>>> Changes for Build #112
>>> [Milan Zamazal] ovf_import test
>>>
>>> [Eyal Edri] adding kubevirt/containerized-data-importer to stdci v2
>>>
>>>
>>> Changes for Build #113
>>> [Milan Zamazal] ovf_import test
>>>
>>>
>>> Changes for Build #114
>>> [Milan Zamazal] ovf_import test
>>>
>>>
>>> Changes for Build #115
>>> [Milan Zamazal] ovf_import test
>>>
>>>
>>> Changes for Build #116
>>> [Milan Zamazal] ovf_import test
>>>
>>>
>>> Changes for Build #117
>>> [Milan Zamazal] ovf_import test
>>>
>>>
>>> Changes for Build #118
>>> [Milan Zamazal] ovf_import test
>>>
>>> [Daniel Belenky] Remove oVirt logo from build summary
>>>
>>>
>>> Changes for Build #119
>>> [Milan Zamazal] ovf_import test
>>>
>>> [Evgheni Dereveanchin] Add infra-puppet project
>>>
>>> [Daniel Belenky] Added production gerrit to push-url-map
>>>
>>>
>>> Changes for Build #120
>>> [Ondra Machacek] ansible_suite: Fix ansible suite modules names
>>>
>>>
>>> Changes for Build #121
>>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>>
>>>
>>> Changes for Build #122
>>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>>
>>>
>>> Changes for Build #123
>>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>>
>>>
>>> Changes for Build #124
>>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>>
>>>
>>> Changes for Build #125
>>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>>
>>>
>>> Changes for Build #126
>>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>>
>>>
>>> Changes for Build #127
>>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>>
>>> [Viktor Mihajlovski] Re-enable s390x jobs
>>>
>>>
>>> Changes for Build #128
>>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>>
>>> [Ehud Yonasi] docker_cleanup: Fixed race condition with containers
>>>
>>>
>>> Changes for Build #129
>>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>>>
>>> [Sandro Bonazzola] system-tests: drop sbonazzo from notified list
>>>
>>>
>>> Changes for Build #130
>>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>>>
>>>
>>> Changes for Build #131
>>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>>>
>>>
>>> Changes for Build #132
>>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>>>
>>>
>>> Changes for Build #133
>>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>>>
>>> [Daniel Belenky] stdci_runner: increase timeout to 3 hours
>>>
>>>
>>> Changes for Build #134
>>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>>>
>>>
>>> Changes for Build #135
>>> [Galit] Updated ovirt 4.2 repo file
>>>
>>>
>>> Changes for Build #136
>>> [Gal Ben Haim] Adding dr suite
>>>
>>>
>>> Changes for Build #137
>>> [Gal Ben Haim] Adding dr suite
>>>
>>> [Barak Korren] Remove the populate_mock function from mock_runner
>>>
>>> [Daniel Belenky] mock_runner: store shell cmd in a variable
>>>
>>> [Barak Korren] mock_runner: Added timeout param
>>>
>>> [Barak Korren] Make whitelist repo configurable via env vars
>>>
>>> [Daniel Belenky] stdci_runner: let mock_runner manage timeout
>>>
>>> [Greg Sheremeta] remove 4.1 change queue from ovirt-engine-nodejs config
>>>
>>>
>>>
>>>
>>> -----------------
>>> Failed Tests:
>>> -----------------
>>> 1 tests failed.
>>> FAILED:  004_basic_sanity.verify_suspend_resume_vm0_40
>>>
>>> Error Message:
>>> 0 != 1
>>> -------------------- >> begin captured logging << --------------------
>>> lago.ssh: DEBUG: start task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh
>>> client for lago-compat-4-0-suite-master-host-0:
>>> lago.ssh: DEBUG: end task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh 
>>> client
>>> for lago-compat-4-0-suite-master-host-0:
>>> lago.ssh: DEBUG: Running 414b4eae on lago-compat-4-0-suite-master-host-0: 
>>> awk
>>> 'BEGIN { n = 0; } $1 + " " + $2 > "IOError: [Errno" && $0 ~ /START
>>> create\(.*hiberVolHandle/ { n = n + 1; } END { print n; }'
>>> /var/log/vdsm/vdsm.log
>>> lago.ssh: DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 
>>> returned with 0
>>> lago.ssh: DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 
>>> output:
>>>  0
>>
>> This now happened to me too, when running OST manually on my machine,
>> basic suite master.
>>
>> vdsm.log [1] has many occurrences of things like:
>>
>> 2018-09-05 21:02:49,276-0400 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer]
>> Calling 'Host.getStats' in bridge with {} (__init__:329)
>> 2018-09-05 21:02:49,277-0400 INFO  (jsonrpc/5) [api.host] START
>> getStats() from=::ffff:192.168.200.2,40112 (api:47)
>> 2018-09-05 21:02:49,281-0400 DEBUG (jsonrpc/5) [root] cannot read eth0
>> speed (nic:42)
>> 2018-09-05 21:02:49,282-0400 DEBUG (jsonrpc/5) [root] cannot read eth1
>> speed (nic:42)
>> 2018-09-05 21:02:49,283-0400 DEBUG (jsonrpc/5) [root] cannot read eth2
>> speed (nic:42)
>> 2018-09-05 21:02:49,284-0400 DEBUG (jsonrpc/5) [root] cannot read eth3
>> speed (nic:42)
>> 2018-09-05 21:02:49,289-0400 INFO  (jsonrpc/5) [api.host] FINISH
>> getStats error=[Errno 22] Invalid argument
>> from=::ffff:192.168.200.2,40112 (api:51)
>> 2018-09-05 21:02:49,289-0400 ERROR (jsonrpc/5) [jsonrpc.JsonRpcServer]
>> Internal server error (__init__:350)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
>> 345, in _handle_request
>>     res = method(**params)
>>   File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line
>> 202, in _dynamicMethod
>>     result = fn(*methodArgs)
>>   File "<string>", line 2, in getStats
>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in 
>> method
>>     ret = func(*args, **kwargs)
>>   File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1407, in getStats
>>     multipath=True)}
>>   File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 49,
>> in get_stats
>>     decStats = stats.produce(first_sample, last_sample)
>>   File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 71,
>> in produce
>>     stats.update(get_interfaces_stats())
>>   File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line
>> 153, in get_interfaces_stats
>>     return net_api.network_stats()
>>   File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line
>> 63, in network_stats
>>     return netstats.report()
>>   File "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py",
>> line 31, in report
>>     stats = link_stats.report()
>>   File "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py",
>> line 41, in report
>>     speed = vlan.speed(i.device)
>>   File "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py",
>> line 36, in speed
>>     dev_speed = nic.read_speed_using_sysfs(dev_name)
>>   File "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py",
>> line 48, in read_speed_using_sysfs
>>     s = int(f.read())
>> IOError: [Errno 22] Invalid argument
>>
>> The test verify_suspend_resume_vm0 seems to try to find a line in
>> vdsm.log that appears _after_ the place it started suspend_resume_vm0.
>> To search for this line, suspend_resume_vm0 does 'tail -1' on the file
>> and keeps the timestamp (first two words). But with tracebacks such as
>> above, instead of a timestamp it gets "IOError: [Errno".
>>
>> Not sure how to handle. Some options I can think of:
>>
>> 1. Make vdsm add a timestamp prefix to such tracebacks (and to
>> everything, perhaps). This will fix the problem, and might be useful
>> on its own, but can also be annoying.
>
> I think it would be more annoying than useful.  The initial timestamp
> applies to the whole traceback so there would be no added information,
> just making it possibly better readable to some parsers and less
> readable to humans.

Agreed.

>
>> 2. Change the logic in the test somehow, e.g.:
>> - instead of 'tail -1', take e.g. 'tail -100' and search for the last
>> line that has a timestamp
>
> That would be possible, but I actually like the fact that those
> tracebacks were detected, although by side effect.  If there's something
> unexpected in the log, let's fail.

OK, see below.

>
>> - check the file size and on the next one continue starting from this
>> point (assuming OST will not rotate vdsm.log, is that safe?)
>> - rotate vdsm.log prior to running the test (might be problematic, no
>> idea) and then check whole new file
>>
>> 3. Perhaps patch vdsm to not log the traceback, e.g.
>> https://gerrit.ovirt.org/94197
>
> Yes, fixing the problem is the right solution, there shouldn't be
> tracebacks in the log without (good) reason.

I wasn't aware of the fact (?) that this is indeed the policy.

(

As a side note, in at least engine-setup, we do log tracebacks in
some cases and consider this ok. See also this, especially the last
few paragraphs:

https://ovirt.org/develop/developer-guide/engine/otopi/#logs

)

>  I think Petr Horáček is
> working on it.

Very well. Cced you and him on above patch. If his fix is expected soon,
I don't mind abandoning mine (above).

>
>> Probably all of these can be done as well.
>>
>> [1] 
>> https://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/137/artifact/exported-artifacts/test_logs/compat-4.0-suite-master/post-004_basic_sanity.py/lago-compat-4-0-suite-master-host-0/_var_log/vdsm/vdsm.log
>>
>>>
>>> --------------------- >> end captured logging << ---------------------
>>>
>>> Stack Trace:
>>>   File "/usr/lib64/python2.7/unittest/case.py", line 369, in run
>>>     testMethod()
>>>   File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
>>>     self.test(*self.arg)
>>>   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 142, 
>>> in wrapped_test
>>>     test()
>>>   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 60, in 
>>> wrapper
>>>     return func(get_test_prefix(), *args, **kwargs)
>>>   File 
>>> "/home/jenkins/workspace/ovirt-system-tests_compat-4.0-suite-master/ovirt-system-tests/compat-4.0-suite-master/test-scenarios/004_basic_sanity.py",
>>>  line 1140, in verify_suspend_resume_vm0
>>>     nt.assert_equals(log_line_count('START create\(.*' + identifier), 1)
>>>   File "/usr/lib64/python2.7/unittest/case.py", line 553, in assertEqual
>>>     assertion_func(first, second, msg=msg)
>>>   File "/usr/lib64/python2.7/unittest/case.py", line 546, in 
>>> _baseAssertEqual
>>>     raise self.failureException(msg)
>>> '0 != 1\n-------------------- >> begin captured logging << 
>>> --------------------\nlago.ssh: DEBUG: start 
>>> task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh client for 
>>> lago-compat-4-0-suite-master-host-0:\nlago.ssh: DEBUG: end 
>>> task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh client for 
>>> lago-compat-4-0-suite-master-host-0:\nlago.ssh: DEBUG: Running 414b4eae on 
>>> lago-compat-4-0-suite-master-host-0: awk \'BEGIN { n = 0; } $1 + " " + $2 > 
>>> "IOError: [Errno" && $0 ~ /START create\\(.*hiberVolHandle/ { n = n + 1; } 
>>> END { print n; }\' /var/log/vdsm/vdsm.log\nlago.ssh: DEBUG: Command 
>>> 414b4eae on lago-compat-4-0-suite-master-host-0 returned with 0\nlago.ssh: 
>>> DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 output:\n 
>>> 0\n\n--------------------- >> end captured logging << ---------------------'
>>> _______________________________________________
>>> Infra mailing list -- [email protected]
>>> To unsubscribe send an email to [email protected]
>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>> oVirt Code of Conduct: 
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives: 
>>> https://lists.ovirt.org/archives/list/[email protected]/message/PTTD52JZ7G72DXURL3RAPBGVMKKASGHH/
>>>



-- 
Didi
_______________________________________________
Infra mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/V2EPZY6CMU4TQNTJQB5PHGVNE4BSA53L/

Reply via email to