On Tue, Apr 24, 2018 at 10:27 PM, Ravi Shankar Nori <[email protected]> wrote:
>
> On Tue, Apr 24, 2018 at 10:46 AM, Ravi Shankar Nori <[email protected]> wrote:
>>
>> On Tue, Apr 24, 2018 at 10:29 AM, Dan Kenigsberg <[email protected]> wrote:
>>>
>>> On Tue, Apr 24, 2018 at 5:09 PM, Ravi Shankar Nori <[email protected]> wrote:
>>> >
>>> > On Tue, Apr 24, 2018 at 9:47 AM, Dan Kenigsberg <[email protected]> wrote:
>>> >>
>>> >> On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori <[email protected]> wrote:
>>> >> >
>>> >> > On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina <[email protected]> wrote:
>>> >> >>
>>> >> >> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori <[email protected]> wrote:
>>> >> >>>
>>> >> >>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg <[email protected]> wrote:
>>> >> >>>>
>>> >> >>>> Ravi's patch is in, but a similar problem remains, and the test
>>> >> >>>> cannot be put back into its place.
>>> >> >>>>
>>> >> >>>> It seems that while Vdsm was taken down, a couple of getCapsAsync
>>> >> >>>> requests queued up. At one point, the host resumed its connection,
>>> >> >>>> before the requests have been cleared of the queue. After the host
>>> >> >>>> is up, the following tests resume, and at a pseudorandom point in
>>> >> >>>> time, an old getCapsAsync request times out and kills our connection.
>>> >> >>>>
>>> >> >>>> I believe that as long as ANY request is on flight, the monitoring
>>> >> >>>> lock should not be released, and the host should not be declared as
>>> >> >>>> up.
>>> >> Would you relate to this analysis ^^^ ?
>>> >
>>> > The HostMonitoring lock issue has been fixed by
>>> > https://gerrit.ovirt.org/#/c/90189/
>>>
>>> Is there still a chance that a host moves to Up while former
>>> getCapsAsync request are still in-flight?
>>
>> Should not happen. Is there a way to execute/reproduce the failing test on
>> Dev env?
>>
>>> >> >>>
>>> >> >>> Hi Dan,
>>> >> >>>
>>> >> >>> Can I have the link to the job on jenkins so I can look at the logs
>>> >> >>
>>> >> >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/
>>> >> >
>>> >> > From the logs the only VDS lock that is being released twice is
>>> >> > VDS_FENCE lock. Opened a BZ [1] for it. Will post a fix
>>> >> >
>>> >> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300
>>> >>
>>> >> Can this possibly cause a surprise termination of host connection?
>>> >
>>> > Not sure, from the logs VDS_FENCE is the only other VDS lock that is
>>> > being released
>
> Would be helpful if I can get the exact flow that is failing and also the
> steps if any needed to reproduce the issue

By now the logs of
http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/
have been garbage-collected, so I cannot point you to the location in the
logs. Maybe Alona has a local copy. According to her analysis, the issue
manifests itself when setupNetworks follows a vdsm restart.

Have you tried running OST with prepare_migration_attachments_ipv6
reintroduced? It should always pass.

Regards,
Dan.
