top posting is evil.

On Fri, Dec 29, 2017 at 1:00 PM, Marcin Mirecki <[email protected]> wrote:
>
> On Thu, Dec 28, 2017 at 11:48 PM, Yaniv Kaul <[email protected]> wrote:
>>
>>
>>
>> On Fri, Dec 29, 2017 at 12:26 AM, Barak Korren <[email protected]> wrote:
>>>
>>> On 29 December 2017 at 00:22, Barak Korren <[email protected]> wrote:
>>> > On 28 December 2017 at 20:02, Dan Kenigsberg <[email protected]> wrote:
>>> >> Yet
>>> >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4559/
>>> >> (which is the gating job for https://gerrit.ovirt.org/#/c/85797/2 )
>>> >> still fails.
>>> >> Could you look into why, Marcin?
>>> >> The failure seems unrelated to ovn, as it is about a *host* losing
>>> >> connectivity. But it reproduces too much, so we need to get to the
>>> >> bottom of it.
>>> >>
>>> >
>>> > Re sending the change through the gate yielded a different error:
>>> > http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4563/
>>> >
>>> > If this is still unrelated, we need to think seriously what is raising
>>> > this large amount of unrelated failures. We cannot do any accurate
>>> > reporting when failures are sporadic.
>>> >
>>>
>>> And here is yet another host connectivity issue failing a test for a
>>> change that should have no effect whatsoever (it's a tox patch for
>>> vdsm):
>>>
>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4565/
>>
>>
>> I've added a fair number of changes this week. I doubt they are related,
>> but the one that stands out
>> is the addition of a fence-agent to one of the hosts.
>> https://gerrit.ovirt.org/#/c/85817/ disables this specific test, just in
>> case.
>>
>> I don't think it causes an issue, but it's the only one looking at the git
>> log I can suspect.
>
> Trying to rebuild Barak's build resulted in another fail:
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4571/
> (with the same problem as Dan's build)
>
> Engine log contains a few of "IOException: Broken pipe"
> which seem to correspond to a vdsm restart: "[vds] Exiting (vdsmd:170)"
> yet looking at my local successful run, I see the same issues in the log.
> I don't see any other obvious reasons for the problem so far.

This actually points back to ykaul's fencing patch. And indeed,
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4571/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-005_network_by_label.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
has

2017-12-29 05:26:07,712-05 DEBUG [org.ovirt.engine.core.uutils.ssh.SSHClient] (EE-ManagedThreadFactory-engine-Thread-417) [1a4f9963] Executed: '/usr/bin/vdsm-tool service-restart vdsmd'

which means that Engine decided that it wants to kill vdsm. There are
multiple communication errors prior to the soft fencing, but maybe
waiting a bit longer would have kept the host alive.
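
For anyone who wants to check other failed runs for the same symptom, here is a rough sketch of how one might scan a downloaded engine.log for engine-issued vdsmd restarts. The regular expression is an assumption based only on the single log line quoted above, and the function name find_vdsm_restarts is made up for illustration; other engine versions may log this differently.

```python
import re

# Assumed line shape, inferred from the one engine.log line quoted in this
# thread: timestamp, UTC offset, level, [logger], (thread), [id], message.
RESTART_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\S*\s+"
    r".*SSHClient\].*Executed: '(?P<cmd>[^']*service-restart vdsmd[^']*)'"
)


def find_vdsm_restarts(lines):
    """Return (timestamp, command) pairs for vdsmd restarts run over SSH."""
    hits = []
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            hits.append((match.group("ts"), match.group("cmd")))
    return hits


# The first entry is the line quoted above; the second is a placeholder,
# not real log output.
sample = [
    "2017-12-29 05:26:07,712-05 DEBUG "
    "[org.ovirt.engine.core.uutils.ssh.SSHClient] "
    "(EE-ManagedThreadFactory-engine-Thread-417) [1a4f9963] "
    "Executed: '/usr/bin/vdsm-tool service-restart vdsmd'",
    "some unrelated line",
]

for ts, cmd in find_vdsm_restarts(sample):
    print(ts, cmd)
```

To run it against a real log, pass an open file instead of the sample list, e.g. find_vdsm_restarts(open("engine.log")). Comparing the reported timestamps with the time the test lost the host would show whether the soft fencing lines up with the failure.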
