The master revert patches [1], [2] are merged; the 4.2 revert patches [3], [4] are waiting to be merged.
We will repost the patches to master tomorrow and will continue to investigate the mysterious host-deploy issue.

Btw, upgrade-from-prev-release on master [5] currently fails with:

18:59:31 + cp 'ovirt-system-tests/upgrade-from-prevrelease-suite-master/*.repo' exported-artifacts
18:59:31 cp: cannot stat 'ovirt-system-tests/upgrade-from-prevrelease-suite-master/*.repo': No such file or directory
18:59:31 POST BUILD TASK : FAILURE

So how can we test the upgrade from 4.2 to master?

Martin

[1] https://gerrit.ovirt.org/91741
[2] https://gerrit.ovirt.org/91742
[3] https://gerrit.ovirt.org/91744
[4] https://gerrit.ovirt.org/91745
[5] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/2758/console

On Tue, May 29, 2018 at 3:42 PM, Barak Korren <[email protected]> wrote:

>
> On 29 May 2018 at 16:30, Martin Perina <[email protected]> wrote:
>
>>
>> On Tue, May 29, 2018 at 3:12 PM, Dafna Ron <[email protected]> wrote:
>>
>>> Martin, do you have any updates? Please note that ovirt-engine has been
>>> broken for a few days, so perhaps we should stop merging or revert the
>>> original change?
>>>
>>
>> Still looking at it, here are partial results:
>>
>> 1. New host installation: never reproduced, a 4.2 host is always
>>    installed fine on a 4.2 engine
>> 2. Upgrade: never reproduced, the upgrade of both a 4.1 engine and host
>>    to 4.2 was always successful
>> 3. Reinstallation: it happened to me once that the host remained stuck
>>    during reinstallation and the whole reinstallation failed due to a
>>    timeout
>>    - that may be the issue seen in CI, but so far I don't have a reliable
>>      reproducer, so I can't debug why the host-deploy process on the
>>      host gets stuck
>>
>
> Did you try using OST locally? It reproduces consistently with the OST
> upgrade suite. You can also use the manual job and pass a URL to any engine
> build beyond the marked patch. But there you'll have the same issue as with
> the CQ job where you won't have logs...
>
> Note, the process that happens there is AFAIK:
> 1. The oVirt 4.1 release is installed.
> 2. engine-setup runs
> 3. repos are changed to the master repo
> 4. engine is upgraded
> 5. bootstrap (including AddHost, which fails) is carried out
>
>>
>>>
>>> On Tue, May 29, 2018 at 1:26 PM, Piotr Kliczewski <[email protected]> wrote:
>>>
>>>> +Martin
>>>>
>>>> He is working on it.
>>>>
>>>> Thanks,
>>>> Piotr
>>>>
>>>> On Tue, May 29, 2018 at 2:22 PM, Dafna Ron <[email protected]> wrote:
>>>>
>>>>> Hi Piotr,
>>>>>
>>>>> Any update on this?
>>>>>
>>>>> Thanks.
>>>>> Dafna
>>>>>
>>>>> On Mon, May 28, 2018 at 10:59 AM, Piotr Kliczewski <[email protected]> wrote:
>>>>>
>>>>>> On Mon, May 28, 2018 at 11:41 AM, Barak Korren <[email protected]> wrote:
>>>>>> >
>>>>>> > On 28 May 2018 at 12:38, Piotr Kliczewski <[email protected]> wrote:
>>>>>> >>
>>>>>> >> On Mon, May 28, 2018 at 10:57 AM, Barak Korren <[email protected]> wrote:
>>>>>> >> > Note: we're now seeing a very similar issue in the 4.2 branch as well
>>>>>> >> > that seems to have been introduced by the following patch:
>>>>>> >>
>>>>>> >> Can you point to a specific job so we could take a look at the logs?
>>>>>> >
>>>>>> > Whoops, sorry, here:
>>>>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/2034/
>>>>>> >
>>>>>>
>>>>>> Looks like the same issue:
>>>>>>
>>>>>> 2018-05-28 03:41:03,606-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] SSH error running command root@lago-upgrade-from-prevrelease-suite-4-2-host-0:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>>>> 2018-05-28 03:41:03,606-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [1244c90f] Error during deploy dialog
>>>>>> 2018-05-28 03:41:03,611-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] Timeout during host lago-upgrade-from-prevrelease-suite-4-2-host-0 install: SSH session timeout host 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>>>>
>>>>>> >>
>>>>>> >> >
>>>>>> >> > https://gerrit.ovirt.org/c/91638/2 - core: Enable only strong ciphers for 4.2 hosts
>>>>>> >> >
>>>>>> >> > On 28 May 2018 at 10:26, Barak Korren <[email protected]> wrote:
>>>>>> >> >>
>>>>>> >> >> On 28 May 2018 at 10:19, Martin Perina <[email protected]> wrote:
>>>>>> >> >>>
>>>>>> >> >>> On Mon, May 28, 2018 at 9:00 AM, Piotr Kliczewski <[email protected]> wrote:
>>>>>> >> >>>>
>>>>>> >> >>>> Simone,
>>>>>> >> >>>>
>>>>>> >> >>>> What do you think about this failure?
>>>>>> >> >>>>
>>>>>> >> >>>> Thanks,
>>>>>> >> >>>> Piotr
>>>>>> >> >>>>
>>>>>> >> >>>> On Mon, May 28, 2018 at 7:12 AM, Barak Korren <[email protected]> wrote:
>>>>>> >> >>>>>
>>>>>> >> >>>>> On 27 May 2018 at 14:59, Piotr Kliczewski <[email protected]> wrote:
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> Martin,
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> I can only see:
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> 2018-05-25 13:57:44,255-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] SSH error running command root@lago-upgrade-from-release-suite-master-host-0:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>>> >> >>>>>> 2018-05-25 13:57:44,259-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] Timeout during host lago-upgrade-from-release-suite-master-host-0 install: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> There are no additional logs. SSH to the host timed out.
>>>>>> >> >>>>>> Are we sure that it is an issue caused by Ravi's change?
>>>>>> >> >>>>>
>>>>>> >> >>>>> We have some quite strong circumstantial evidence:
>>>>>> >> >>>>> - The issue has affected all engine patches since that patch in a similar fashion.
>>>>>> >> >>>>> - The prior engine patch [1] passed successfully [2].
>>>>>> >> >>>>> - Other subsequent OST runs without engine patches passed successfully as well [3].
>>>>>> >> >>>>>
>>>>>> >> >>>>> [1]: https://gerrit.ovirt.org/c/91595/2
>>>>>> >> >>>>> [2]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7777/
>>>>>> >> >>>>> [3]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7778/
>>>>>> >> >>>>>
>>>>>> >> >>>>> Please note: the issue is affecting a test that is run by an upgrade
>>>>>> >> >>>>> suite on the post-upgrade system. It has no effect on the basic suite.
>>>>>> >> >>>>> So it probably has to do with some behaviour that is specific to
>>>>>> >> >>>>> upgraded systems.
>>>>>> >> >>>
>>>>>> >> >>> I will try to reproduce later today in a dev env, but I agree with
>>>>>> >> >>> Piotr's investigation: the engine was not able to connect to the host
>>>>>> >> >>> using SSH and that's why no host-deploy logs were fetched.
>>>>>> >> >>
>>>>>> >> >> Lago fetches the logs from the host too (and it can take them from the
>>>>>> >> >> VM image directly if the host is not responsive over SSH), can we get
>>>>>> >> >> at the host-deploy logs that way?
>>>>>> >> >>
>>>>>> >> >>>>>
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> Thanks,
>>>>>> >> >>>>>> Piotr
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> On Sun, May 27, 2018 at 11:21 AM, Martin Perina <[email protected]> wrote:
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> Adding also Piotr to the thread
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> On Sun, 27 May 2018, 08:46 Barak Korren, <[email protected]> wrote:
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> Test failed: [ AddHost (in upgrade-from-release-suite) ]
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> Link to suspected patches:
>>>>>> >> >>>>>>>> https://gerrit.ovirt.org/#/c/91445/5 - Disable TLS versions < 1.2 for hosts with cluster level>=4.1
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> Link to Job:
>>>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> Link to all logs:
>>>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> Error snippet from log:
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> From nosetests log:
>>>>>> >> >>>>>>>> <error>
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> AssertionError: False != True after 1200 seconds
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> </error>
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> Not finding a host deploy log in /var/log/ovirt-engine for some reason.
>>>>>> >> >>>>>>>> This seems to have caused consistent failures in all other engine
>>>>>> >> >>>>>>>> patches that followed it.
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> --
>>>>>> >> >>>>>>>> Barak Korren
>>>>>> >> >>>>>>>> RHV DevOps team , RHCE, RHCi
>>>>>> >> >>>>>>>> Red Hat EMEA
>>>>>> >> >>>>>>>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>>>>> >> >>>
>>>>>> >> >>> --
>>>>>> >> >>> Martin Perina
>>>>>> >> >>> Associate Manager, Software Engineering
>>>>>> >> >>> Red Hat Czech s.r.o.
>>>>>> >> >
>>>>>> >> > _______________________________________________
>>>>>> >> > Devel mailing list -- [email protected]
>>>>>> >> > To unsubscribe send an email to [email protected]
>>>>>> >> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>>> >> > oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>>>>> >> > List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/QIZ5L4FKII7X5FHQ4OXBBR2SLUIK5C74/
>>>>>> >> >
>>>>>> _______________________________________________
>>>>>> Devel mailing list -- [email protected]
>>>>>> To unsubscribe send an email to [email protected]
>>>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>>>>> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/RDK42TYJKMX3M2DNUFKZO7CGNNOYWMJI/
>>>>>>

--
Martin Perina
Associate Manager, Software Engineering
Red Hat Czech s.r.o.
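
A note on the post-build failure quoted at the top of this message: the literal '*.repo' in the cp error means the shell found nothing to expand the pattern to (the suite directory is missing under that name, or it holds no .repo files), so cp received the pattern unchanged. The sketch below shows one way an export step could tolerate that case. It is only an illustration: export_repo_files is a hypothetical helper name, not something from OST or the Jenkins job, and treating "no match" as non-fatal is an assumption about what the job should do.

```python
import glob
import os
import shutil


def export_repo_files(suite_dir, dest_dir):
    """Copy every *.repo file from suite_dir into dest_dir.

    Hypothetical helper for illustration only. Returns the list of
    destination paths; an empty list means nothing matched.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    # glob.glob returns an empty list when the directory is missing or no
    # file matches, so there is no unexpanded pattern to trip over.
    for src in sorted(glob.glob(os.path.join(suite_dir, "*.repo"))):
        copied.append(shutil.copy(src, dest_dir))
    return copied
```

A caller could log a warning when the returned list is empty instead of failing the whole post-build task.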
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/5GBPC2T2MA5DEQKPBPVDPV6QZT22UKAH/
