On Mon, Nov 25, 2019 at 5:16 PM Nir Soffer <[email protected]> wrote:
> On Thu, Nov 21, 2019 at 11:24 PM Vojtech Juranek <[email protected]> wrote:
> >
> > Hi,
> > OST fails (see e.g. [1]) in 002_bootstrap.check_update_host. It fails with
> >
> > FAILED! => {"changed": false, "failures": [], "msg": "Depsolve Error occured:
> > \n Problem 1: cannot install the best update candidate for package
> > vdsm-network-4.40.0-1236.git63ea8cb8b.el8.x86_64\n - nothing provides nmstate
> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n Problem 2:
> > package vdsm-python-4.40.0-1271.git524e08c8a.el8.noarch requires
> > vdsm-network = 4.40.0-1271.git524e08c8a.el8, but none of the providers can be
> > installed\n - cannot install the best update candidate for package
> > vdsm-python-4.40.0-1236.git63ea8cb8b.el8.noarch\n - nothing provides nmstate
> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64
>
> On Thu, Nov 21, 2019 at 10:54 PM Nir Soffer <[email protected]> wrote:
> >
> > nmstate should be provided by the copr repo enabled by ovirt-release-master.
>
> On Fri, Nov 22, 2019 at 8:40 AM Dominik Holler <[email protected]> wrote:
> >
> > I re-triggered as
> > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6131
> > maybe https://gerrit.ovirt.org/#/c/104825/ was missing.
>
> On Friday, 22 November 2019 9:41:26 CET Dominik Holler wrote:
> >
> > Looks like https://gerrit.ovirt.org/#/c/104825/ is ignored by OST.
>
> On Fri, Nov 22, 2019 at 9:49 AM Vojtech Juranek <[email protected]> wrote:
> >
> > Maybe not. You re-triggered with [1], which really missed this patch.
> > I did a rebase and am now running with this patch in build #6132 [2].
> > Let's wait for it to see if gerrit #104825 helps.
> >
> > [1] https://jenkins.ovirt.org/job/standard-manual-runner/909/
> > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6132/
>
> On Friday, 22 November 2019 9:41:26 CET Dominik Holler wrote:
> >
> > Miguel, do you think merging
> > https://gerrit.ovirt.org/#/c/104495/15/common/yum-repos/ovirt-master-host-cq.repo.in
> > would solve this?
>
> On Friday, 22 November 2019 9:56:56 CET Miguel Duarte de Mora Barroso wrote:
> >
> > I've split the patch Dominik mentions above in two, one of them adding
> > the nmstate / networkmanager copr repos - [3].
> >
> > Let's see if it fixes it.
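For anyone hitting this locally: a minimal sketch of how one could check, with
the dnf Python API, whether any enabled repository provides the missing
nmstate capability. The capability name comes from the error above; everything
else here is illustrative and not part of OST or vdsm.

    #!/usr/bin/env python3
    # Sketch: report which enabled repositories, if any, provide "nmstate".
    # Diagnostic illustration only; not part of OST or vdsm.
    import dnf

    CAPABILITY = "nmstate"  # the capability dnf reported as missing

    base = dnf.Base()
    base.read_all_repos()                    # load /etc/yum.repos.d/*.repo
    base.fill_sack(load_system_repo=False)   # fetch repo metadata

    providers = base.sack.query().available().filter(provides=CAPABILITY)
    if not providers:
        print("no enabled repository provides %s" % CAPABILITY)
    for pkg in providers:
        print("%s-%s from repo %s" % (pkg.name, pkg.evr, pkg.reponame))

If this prints nothing on an OST host, the copr repo discussed above is not in
the host's repo configuration.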
> On Fri, Nov 22, 2019 at 11:54 AM Vojtech Juranek <[email protected]> wrote:
> >
> > It fixes the original issue, but OST still fails in
> > 098_ovirt_provider_ovn.use_ovn_provider:
> >
> > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134
>
> On Fri, Nov 22, 2019 at 12:00 PM Miguel Duarte de Mora Barroso <[email protected]> wrote:
> >
> > I think Dominik was looking into this issue; +Dominik Holler please confirm.
> >
> > Let me know if you need any help, Dominik.
>
> On Fri, Nov 22, 2019 at 12:17 PM Dominik Holler <[email protected]> wrote:
> >
> > Thanks.
> > The problem is that the hosts lost connection to storage:
> > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exported-artifacts/test_logs/basic-suite-master/post-098_ovirt_provider_ovn.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log :
> >
> > 2019-11-22 05:39:12,326-0500 DEBUG (jsonrpc/5) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgs --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405107ea8b4e3ac4ddeb3e19890f$|^/dev/mapper/360014054924c91df75e41178e4b8a80c$|^/dev/mapper/3600140561c0d02829924b77ab7323f17$|^/dev/mapper/3600140582feebc04ca5409a99660dbbc$|^/dev/mapper/36001405c3c53755c13c474dada6be354$|", "r|.*|"] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name (cwd None) (commands:153)
> > 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
> > Traceback (most recent call last):
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
> >     delay = result.delay()
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
> >     raise exception.MiscFileReadException(self.path, self.rc, self.err)
> > vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
> > 2019-11-22 05:39:12,416-0500 INFO (check/loop) [storage.Monitor] Domain d10879c6-8de1-40ba-87fa-f447844eed2a became INVALID (monitor:472)
> >
> > I failed to reproduce this locally to analyze it; I will try again. Any hints welcome.
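To make the log above easier to read: the monitor periodically reads the
domain's metadata file and reports how long the read took, and a read that
does not finish in time is what surfaces as MiscFileReadException with
"Read timeout". Below is a simplified sketch of that kind of probe. It is
not vdsm's actual check.py/monitor.py code, and the 10-second timeout is an
assumption.

    # Simplified illustration of a storage liveness probe, not vdsm's code.
    import subprocess
    import time

    # Path copied from the log above; the timeout value is an assumption.
    METADATA = ("/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/"
                "d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata")
    TIMEOUT = 10.0

    def check_path(path, timeout=TIMEOUT):
        """Read one block of the metadata file, return the delay in seconds."""
        start = time.monotonic()
        try:
            subprocess.run(
                ["dd", "if=" + path, "of=/dev/null", "bs=4096", "count=1",
                 "iflag=direct"],
                check=True, stdout=subprocess.DEVNULL,
                stderr=subprocess.PIPE, timeout=timeout)
        except subprocess.TimeoutExpired:
            # This is the situation reported as 'Read timeout' above.
            raise RuntimeError("Read timeout checking %s" % path)
        except subprocess.CalledProcessError as e:
            raise RuntimeError("Read error checking %s: %s" % (path, e.stderr))
        return time.monotonic() - start

    if __name__ == "__main__":
        print("read delay: %.3f s" % check_path(METADATA))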
> On Fri, Nov 22, 2019 at 4:43 PM Dominik Holler <[email protected]> wrote:
> >
> > https://gerrit.ovirt.org/#/c/104925/1/ shows that 008_basic_ui_sanity.py
> > triggers the problem.
> > Is there someone with knowledge about the basic_ui_sanity around?
>
> On Fri, Nov 22, 2019, 18:18 Marcin Sobczyk <[email protected]> wrote:
> >
> > How do you think it's related? By commenting out the ui sanity tests and
> > seeing OST finish successfully?
> >
> > Looking at the 6134 run you were discussing:
> >
> > - timing of the ui sanity set-up [1]:
> >
> >   11:40:20 @ Run test: 008_basic_ui_sanity.py:
> >
> > - timing of the first encountered storage error [2]:
> >
> >   2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
> >   Traceback (most recent call last):
> >     File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
> >       delay = result.delay()
> >     File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
> >       raise exception.MiscFileReadException(self.path, self.rc, self.err)
> >   vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
> >
> > Timezone difference aside, it seems to me that these storage errors
> > occurred before doing anything UI-related.
>
> On Fri, Nov 22, 2019 at 8:57 PM Dominik Holler <[email protected]> wrote:
> >
> > You are right, a time.sleep(8*60) in
> > https://gerrit.ovirt.org/#/c/104925/2
> > triggers the issue the same way.
>
> On Mon, Nov 25, 2019 at 4:50 PM Nir Soffer <[email protected]> wrote:
> >
> > So this is a test issue, assuming that the UI tests can complete in
> > less than 8 minutes?
>
> On Mon, Nov 25, 2019 at 6:05 PM Dominik Holler <[email protected]> wrote:
> >
> > To my eyes this looks like storage just stops working after some time.
>
> On Mon, Nov 25, 2019 at 11:00 AM Dominik Holler <[email protected]> wrote:
> >
> > Nir or Steve, can you please confirm that this is a storage problem?
>
> On Mon, Nov 25, 2019 at 4:50 PM Nir Soffer <[email protected]> wrote:
> >
> > Why do you think we have a storage problem?
>
> On Mon, Nov 25, 2019 at 6:05 PM Dominik Holler <[email protected]> wrote:
> >
> > I understand from the posted log snippets that they say that the storage
> > is not accessible anymore, while the host is still responsive.
>
> No, so far one read timeout was reported; this does not mean storage is
> not available anymore. It can be a temporary issue that does not harm
> anything.
>
> > This might be triggered by something outside storage, e.g. the network
> > providing the storage stopped working, but I think a possible next step
> > in analysing this issue would be to find the reason why storage is not
> > happy.

Sounds like there was a miscommunication in this thread. I try to address
all of your points; please let me know if something is missing or not
clearly expressed.

> First step is to understand which test fails,

098_ovirt_provider_ovn.use_ovn_provider

> and why. This can be done by the owner of the test,

The test was added by the network team.

> understanding what the test does

The test tries to add a vNIC.

> and what is the expected system behavior.

It is expected that adding a vNIC works, because the VM should be up.

> If the owner of the test thinks that the test failed because of a storage
> issue

I am not sure who is the owner, but I do.

> someone from storage can look at this.

Thanks, I would appreciate this.

> But the fact that adding a long sleep reproduces the issue means it is not
> related in any way to storage.
>
> Nir

> On Fri, Nov 22, 2019, 18:18 Marcin Sobczyk <[email protected]> wrote:
> >
> > I remember talking with Steven Rosenberg on IRC a couple of days ago
> > about some storage metadata issues and he said he got a response from
> > Nir, that "it's a known issue".
> >
> > Nir, Amit, can you comment on this?
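For readers following along: the time.sleep(8*60) reproduction mentioned
above boils down to replacing the UI sanity steps with an idle wait and then
checking whether the read timeouts still show up. A rough sketch of that idea
follows; the vdsm.log path and the test wrapper are assumptions for
illustration, and this is not the content of the gerrit change itself.

    # Sketch of the idle-period reproduction described above; illustration
    # only, not the actual https://gerrit.ovirt.org/#/c/104925/2 patch.
    import re
    import time

    VDSM_LOG = "/var/log/vdsm/vdsm.log"   # path on the host, an assumption
    IDLE = 8 * 60                          # same idle period as the UI suite

    def test_storage_survives_idle_period():
        """If read timeouts appear while nothing runs, the UI tests are not
        the trigger."""
        time.sleep(IDLE)                   # do nothing, like the sleep patch
        with open(VDSM_LOG) as f:
            errors = [line for line in f
                      if re.search(r"Read timeout|MiscFileReadException", line)]
        assert not errors, (
            "storage monitor errors during idle period:\n"
            + "".join(errors[:10]))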
> On Fri, Nov 22, 2019 at 5:54 PM Dominik Holler <[email protected]> wrote:
> >
> > On Fri, Nov 22, 2019 at 5:48 PM Nir Soffer <[email protected]> wrote:
> > >
> > > The error mentioned here is not a vdsm error but a warning about
> > > storage accessibility. We should convert the tracebacks to warnings.
> > >
> > > The reason for such an issue can be a misconfigured network (maybe the
> > > network team is testing negative flows?),
> >
> > No.
> >
> > > or some issue in the NFS server.
> >
> > The only hint I found is
> > "Exiting Time2Retain handler because session_reinstatement=1"
> > but I have no idea what this means or if this is relevant at all.
> >
> > > One read timeout is not an issue. We have a real issue only if we have
> > > consistent read timeouts or errors for a couple of minutes; after that
> > > engine can deactivate the storage domain, or some hosts, if only these
> > > hosts are having trouble accessing storage.
> > >
> > > In OST we never expect such conditions, since we don't test negative
> > > flows, and we should have good connectivity with the VMs running on
> > > the same host.
> >
> > Ack, this seems to be the problem.
> >
> > > Nir
>
> On Fri, Nov 22, 2019, 18:18 Marcin Sobczyk <[email protected]> wrote:
> >
> > [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/console
> > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exported-artifacts/test_logs/basic-suite-master/post-098_ovirt_provider_ovn.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
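To spell out the policy Nir describes, warning on an isolated timeout and
real trouble only after consistent failures for a couple of minutes, here is
a minimal sketch. The class name and the 5-minute threshold are assumptions,
not vdsm's actual monitor implementation.

    # Sketch of the "one timeout is only a warning" policy described above.
    # Names and thresholds are illustrative, not vdsm's monitor code.
    import logging
    import time

    log = logging.getLogger("storage.Monitor")

    INVALID_AFTER = 5 * 60  # seconds of uninterrupted failures (assumption)

    class DomainMonitor:
        def __init__(self, sd_uuid):
            self.sd_uuid = sd_uuid
            self.first_failure = None  # start of the current failure streak

        def on_check_result(self, path, error=None):
            if error is None:
                # A single successful read resets the failure window.
                self.first_failure = None
                return

            now = time.monotonic()
            if self.first_failure is None:
                self.first_failure = now

            # An isolated timeout is only worth a warning, not a traceback.
            log.warning("Error checking path %s: %s", path, error)

            # Only consistent failures over a couple of minutes invalidate
            # the domain (and may let engine deactivate it).
            if now - self.first_failure >= INVALID_AFTER:
                log.error("Domain %s became INVALID", self.sd_uuid)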
> On 11/22/19 4:54 PM, Martin Perina wrote:
> >
> > Marcin, could you please take a look?
> >
> > --
> > Martin Perina
> > Manager, Software Engineering
> > Red Hat Czech s.r.o.
>
> On Friday, 22 November 2019 9:56:56 CET Miguel Duarte de Mora Barroso wrote:
> >
> > [3] - https://gerrit.ovirt.org/#/c/104897/
>
> On Thu, Nov 21, 2019 at 10:54 PM Nir Soffer <[email protected]> wrote:
> >
> > Who installs this rpm in OST?
>
> On Fri, Nov 22, 2019 at 8:40 AM Dominik Holler <[email protected]> wrote:
> >
> > I do not understand the question.
>
> On Thu, Nov 21, 2019 at 11:24 PM Vojtech Juranek <[email protected]> wrote:
> >
> > [...]
> >
> > See [2] for full error.
> >
> > Can someone please take a look?
> > Thanks
> > Vojta
> >
> > [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/
> > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/artifact/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/WEOM5ONIOVPHGHWD7XI3RCT443UWZJFN/
