On Fri, Dec 29, 2017 at 2:21 PM, Dan Kenigsberg <[email protected]> wrote:
> top posting is evil.
>
> On Fri, Dec 29, 2017 at 1:00 PM, Marcin Mirecki <[email protected]> wrote:
> >
> > On Thu, Dec 28, 2017 at 11:48 PM, Yaniv Kaul <[email protected]> wrote:
> >>
> >> On Fri, Dec 29, 2017 at 12:26 AM, Barak Korren <[email protected]> wrote:
> >>>
> >>> On 29 December 2017 at 00:22, Barak Korren <[email protected]> wrote:
> >>> > On 28 December 2017 at 20:02, Dan Kenigsberg <[email protected]> wrote:
> >>> >> Yet
> >>> >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4559/
> >>> >> (which is the gating job for https://gerrit.ovirt.org/#/c/85797/2 )
> >>> >> still fails.
> >>> >> Could you look into why, Marcin?
> >>> >> The failure seems unrelated to ovn, as it is about a *host* losing
> >>> >> connectivity. But it reproduces too often, so we need to get to the
> >>> >> bottom of it.
> >>> >>
> >>> >
> >>> > Resending the change through the gate yielded a different error:
> >>> > http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4563/
> >>> >
> >>> > If this is still unrelated, we need to think seriously about what is
> >>> > causing this large number of unrelated failures. We cannot do any
> >>> > accurate reporting when failures are sporadic.
> >>> >
> >>>
> >>> And here is yet another host connectivity issue failing a test for a
> >>> change that should have no effect whatsoever (it's a tox patch for
> >>> vdsm):
> >>>
> >>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4565/
> >>
> >>
> >> I've added a fair number of changes this week. I doubt they are related,
> >> but the one that stands out is the addition of a fence-agent to one of
> >> the hosts. https://gerrit.ovirt.org/#/c/85817/ disables this specific
> >> test, just in case.
> >>
> >> I don't think it causes an issue, but looking at the git log, it's the
> >> only one I can suspect.
> >
> > Trying to rebuild Barak's build resulted in another failure:
> > http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4571/
> > (with the same problem as Dan's build)
> >
> > The engine log contains a few "IOException: Broken pipe" errors,
> > which seem to correspond to a vdsm restart ("[vds] Exiting (vdsmd:170)"),
> > yet in my local successful run I see the same messages in the log.
> > I don't see any other obvious cause for the problem so far.
>
>
> This actually points back to ykaul's fencing patch. And indeed,
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4571/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-005_network_by_label.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
> has
>
> 2017-12-29 05:26:07,712-05 DEBUG [org.ovirt.engine.core.uutils.ssh.SSHClient] (EE-ManagedThreadFactory-engine-Thread-417) [1a4f9963] Executed: '/usr/bin/vdsm-tool service-restart vdsmd'
>
> which means that Engine decided that it wants to kill vdsm. There are
> multiple communication errors prior to the soft fencing, but maybe
> waiting a bit longer would have kept the host alive.
>

Note that there's a test called vdsm recovery, in which we actually stop and
start VDSM; perhaps the restart comes from there?
Anyway, I've disabled the test that adds fencing. I don't think it is the
cause, but let's see.
Y.
_______________________________________________
Devel mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/devel
