On 30 May 2018 at 10:24, Martin Perina <[email protected]> wrote:

>
> On Wed, May 30, 2018 at 8:13 AM, Barak Korren <[email protected]> wrote:
>
>> On 29 May 2018 at 22:29, Martin Perina <[email protected]> wrote:
>>
>>> Master revert patches [1], [2] merged, 4.2 revert patches [3], [4]
>>> waiting to be merged.
>>>
>>> We will repost patches to master tomorrow and will continue to
>>> investigate the mysterious host-deploy issue.
>>>
>>> Btw, upgrade-from-prev-release on master [5] currently fails with:
>>>
>>> 18:59:31 + cp 'ovirt-system-tests/upgrade-from-prevrelease-suite-master/*.repo' exported-artifacts
>>> 18:59:31 cp: cannot stat 'ovirt-system-tests/upgrade-from-prevrelease-suite-master/*.repo': No such file or directory
>>> 18:59:31 POST BUILD TASK : FAILURE
>>>
>>> So how can we test upgrade from 4.2 to master?
>>>
>>
>> This is not the real issue; the real issue is:
>>
>> *00:00:19.190* /tmp/jenkins6944523151752956846.sh: line 4:
>> ovirt-system-tests/upgrade-from-prevrelease-suite-master/extra_sources: No
>> such file or directory
>>
>> This is happening because there is no
>> 'upgrade-from-prevrelease-suite-master';
>> the suite to be used is 'upgrade-from-release-suite-master'.
>>
>
> Yes, but looking at [6], we are testing upgrade from 4.1 to master, is
> that true? If so, how can this work? We support upgrades only between
> directly consecutive versions, so it should not be possible to upgrade from
> 4.1 to master directly...
>
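Martin's point that upgrades are supported only between directly consecutive versions can be stated as a small check. The sketch below is a minimal illustration under an assumed version ordering; `VERSION_ORDER` and `is_direct_upgrade` are hypothetical names, not part of ovirt-system-tests.

```python
from typing import List

# Hypothetical ordering of the versions discussed in this thread; an
# assumption for illustration, not a list taken from ovirt-system-tests.
VERSION_ORDER: List[str] = ["4.1", "4.2", "master"]


def is_direct_upgrade(source: str, target: str) -> bool:
    """Return True only when `target` immediately follows `source`."""
    if source not in VERSION_ORDER or target not in VERSION_ORDER:
        return False  # unknown versions are treated as unsupported
    return VERSION_ORDER.index(target) - VERSION_ORDER.index(source) == 1


if __name__ == "__main__":
    for src, dst in [("4.1", "4.2"), ("4.2", "master"), ("4.1", "master")]:
        verdict = "supported" if is_direct_upgrade(src, dst) else "not supported"
        print(f"{src} -> {dst}: {verdict}")
```

Under that assumed ordering, 4.2 -> master is accepted while 4.1 -> master is rejected, which is why a suite upgrading from 4.1 straight to master (as [6] suggests) looks suspicious.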
Well, I wonder where the patch to change that is; it should have been created when 4.2 went GA...

>
> So is this table in [7] valid?
>
> *Target oVirt version which will be tested.*
>
>   ENGINE_VERSION   prev release   release
>   master           4.2            master
>   ---              4.1            4.2
>   4.1              ---            4.1
>

It looks messed up... I guess we'll need to 'git blame'...

>
>>
>>> Martin
>>>
>>> [1] https://gerrit.ovirt.org/91741
>>> [2] https://gerrit.ovirt.org/91742
>>> [3] https://gerrit.ovirt.org/91744
>>> [4] https://gerrit.ovirt.org/91745
>>> [5] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/2758/console
>>>
>>
>
> [6] https://github.com/oVirt/ovirt-system-tests/blob/master/upgrade-from-release-suite-master/pre-reposync-config.repo
> [7] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/build?delay=0sec
>
>>
>>> On Tue, May 29, 2018 at 3:42 PM, Barak Korren <[email protected]> wrote:
>>>
>>>> On 29 May 2018 at 16:30, Martin Perina <[email protected]> wrote:
>>>>
>>>>> On Tue, May 29, 2018 at 3:12 PM, Dafna Ron <[email protected]> wrote:
>>>>>
>>>>>> Martin, do you have any updates? Please note that ovirt-engine has
>>>>>> been broken for a few days, so perhaps we should stop merging or revert
>>>>>> the original change?
>>>>>>
>>>>>
>>>>> Still looking at it; here are partial results:
>>>>>
>>>>> 1. New host installation: never reproduced; a 4.2 host is always
>>>>>    installed fine on a 4.2 engine.
>>>>> 2. Upgrade: never reproduced; upgrading both a 4.1 engine and host to
>>>>>    4.2 was always successful.
>>>>> 3. Reinstallation: once, the host got stuck during reinstallation and
>>>>>    the whole reinstallation failed due to a timeout.
>>>>>    - That may be the issue seen in CI, but so far I don't have a
>>>>>      reliable reproducer that would let me debug why the host-deploy
>>>>>      process on the host gets stuck.
>>>>>
>>>>
>>>> Did you try using OST locally? It reproduces consistently with the OST
>>>> upgrade suite. You can also use the manual job and pass a URL to any engine
>>>> build beyond the marked patch. But there you'll have the same issue as with
>>>> the CQ job, where you won't have logs...
>>>>
>>>> Note, the process that happens there is, AFAIK:
>>>> 1. The oVirt 4.1 release is installed.
>>>> 2. engine-setup runs.
>>>> 3. Repos are changed to the master repo.
>>>> 4. The engine is upgraded.
>>>> 5. Bootstrap is carried out (including AddHost, which fails).
>>>>
>>>>>
>>>>>>
>>>>>> On Tue, May 29, 2018 at 1:26 PM, Piotr Kliczewski <[email protected]> wrote:
>>>>>>
>>>>>>> +Martin
>>>>>>>
>>>>>>> He is working on it.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Piotr
>>>>>>>
>>>>>>> On Tue, May 29, 2018 at 2:22 PM, Dafna Ron <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Piotr,
>>>>>>>>
>>>>>>>> Any update on this?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> Dafna
>>>>>>>>
>>>>>>>> On Mon, May 28, 2018 at 10:59 AM, Piotr Kliczewski <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On Mon, May 28, 2018 at 11:41 AM, Barak Korren <[email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> > On 28 May 2018 at 12:38, Piotr Kliczewski <[email protected]> wrote:
>>>>>>>>> >>
>>>>>>>>> >> On Mon, May 28, 2018 at 10:57 AM, Barak Korren <[email protected]> wrote:
>>>>>>>>> >> > Note: we're now seeing a very similar issue in the 4.2 branch as well
>>>>>>>>> >> > that seems to have been introduced by the following patch:
>>>>>>>>> >>
>>>>>>>>> >> Can you point to a specific job so we could take a look at the logs?
>>>>>>>>> >
>>>>>>>>> > Whoops, sorry, here:
>>>>>>>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/2034/
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>> Looks like the same issue:
>>>>>>>>>
>>>>>>>>> 2018-05-28 03:41:03,606-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] SSH error running command root@lago-upgrade-from-prevrelease-suite-4-2-host-0:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>>>>>>> 2018-05-28 03:41:03,606-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [1244c90f] Error during deploy dialog
>>>>>>>>> 2018-05-28 03:41:03,611-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] Timeout during host lago-upgrade-from-prevrelease-suite-4-2-host-0 install: SSH session timeout host 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>>>>>>>
>>>>>>>>> >>
>>>>>>>>> >> > https://gerrit.ovirt.org/c/91638/2 - core: Enable only strong ciphers
>>>>>>>>> >> > for 4.2 hosts
>>>>>>>>> >> >
>>>>>>>>> >> > On 28 May 2018 at 10:26, Barak Korren <[email protected]> wrote:
>>>>>>>>> >> >>
>>>>>>>>> >> >> On 28 May 2018 at 10:19, Martin Perina <[email protected]> wrote:
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> On Mon, May 28, 2018 at 9:00 AM, Piotr Kliczewski <[email protected]> wrote:
>>>>>>>>> >> >>>>
>>>>>>>>> >> >>>> Simone,
>>>>>>>>> >> >>>>
>>>>>>>>> >> >>>> What do you think about this failure?
>>>>>>>>> >> >>>>
>>>>>>>>> >> >>>> Thanks,
>>>>>>>>> >> >>>> Piotr
>>>>>>>>> >> >>>>
>>>>>>>>> >> >>>> On Mon, May 28, 2018 at 7:12 AM, Barak Korren <[email protected]> wrote:
>>>>>>>>> >> >>>>>
>>>>>>>>> >> >>>>> On 27 May 2018 at 14:59, Piotr Kliczewski <[email protected]> wrote:
>>>>>>>>> >> >>>>>>
>>>>>>>>> >> >>>>>> Martin,
>>>>>>>>> >> >>>>>>
>>>>>>>>> >> >>>>>> I can only see:
>>>>>>>>> >> >>>>>>
>>>>>>>>> >> >>>>>> 2018-05-25 13:57:44,255-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] SSH error running command root@lago-upgrade-from-release-suite-master-host-0:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>>>>>> >> >>>>>> 2018-05-25 13:57:44,259-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] Timeout during host lago-upgrade-from-release-suite-master-host-0 install: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>>>>>> >> >>>>>>
>>>>>>>>> >> >>>>>> There are no additional logs. SSH to the host timed out. Are we sure
>>>>>>>>> >> >>>>>> that it is an issue caused by Ravi's change?
>>>>>>>>> >> >>>>>
>>>>>>>>> >> >>>>> We have some quite strong circumstantial evidence:
>>>>>>>>> >> >>>>> - The issue has affected all engine patches since that patch in a
>>>>>>>>> >> >>>>>   similar fashion.
>>>>>>>>> >> >>>>> - The prior engine patch [1] passed successfully [2].
>>>>>>>>> >> >>>>> - Other subsequent OST runs without engine patches passed
>>>>>>>>> >> >>>>>   successfully as well [3].
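Barak's argument is essentially a bisection over change-queue results: the suspect is the engine change where a streak of failures starts, provided runs without engine patches keep passing. A minimal sketch of that reasoning, assuming a hypothetical chronological list of runs (`Run`, `first_breaking_engine_change` and the change labels are illustrative, not the change queue's actual implementation):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """One hypothetical CI run, listed in chronological order."""
    change: str          # label of the change under test (hypothetical)
    engine_change: bool  # True if the run included an engine patch
    passed: bool


def first_breaking_engine_change(runs: List[Run]) -> Optional[str]:
    """Return the engine change that starts the current streak of failing
    engine runs, or None if the evidence does not point at one.

    Any failure in a run *without* engine patches makes the evidence
    ambiguous, so the function returns None in that case.
    """
    suspect: Optional[str] = None
    for run in runs:
        if not run.engine_change:
            if not run.passed:
                return None
            continue
        if run.passed:
            suspect = None        # a later engine change passed: streak broken
        elif suspect is None:
            suspect = run.change  # first failure of the current streak
    return suspect
```

For example, with `change-A` (engine, passed), `change-B` (engine, failed), a run without engine patches (passed) and `change-C` (engine, failed), the function returns `change-B`, mirroring the roles of the prior patch [1], the suspected patch and the runs in [3].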
>>>>>>>>> >> >>>>>
>>>>>>>>> >> >>>>> [1]: https://gerrit.ovirt.org/c/91595/2
>>>>>>>>> >> >>>>> [2]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7777/
>>>>>>>>> >> >>>>> [3]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7778/
>>>>>>>>> >> >>>>>
>>>>>>>>> >> >>>>> Please note: the issue affects a test that is run by an upgrade
>>>>>>>>> >> >>>>> suite on the post-upgrade system. It has no effect on the basic
>>>>>>>>> >> >>>>> suite, so it probably has to do with some behaviour that is
>>>>>>>>> >> >>>>> specific to upgraded systems.
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> I will try to reproduce it later today in a dev env, but I agree
>>>>>>>>> >> >>> with Piotr's investigation: the engine was not able to connect to
>>>>>>>>> >> >>> the host using SSH, and that's why no host-deploy logs were fetched.
>>>>>>>>> >> >>
>>>>>>>>> >> >> Lago fetches the logs from the host too (and it can take them from
>>>>>>>>> >> >> the VM image directly if the host is not responsive over SSH). Can
>>>>>>>>> >> >> we get at the host-deploy logs that way?
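The command in the engine logs quoted above runs a small wrapper over SSH: it creates a private temporary directory, unpacks a tar archive into it (the command gives `tar` no `-f`, so it reads from its default input, presumably the bundle the engine streams over the SSH session), runs `ovirt-host-deploy` from there, and removes the directory on exit. The Python sketch below mirrors that pattern locally to show where a hung tool would surface as a timeout; it is an illustration only, not how the engine actually transfers or runs the bundle, and `run_bundle` is a hypothetical name.

```python
import io
import os
import shutil
import subprocess
import tarfile
import tempfile
from typing import List


def run_bundle(tar_bytes: bytes, entrypoint: str, args: List[str],
               timeout: float) -> str:
    """Unpack a tar bundle into a private temp dir, run one file from it,
    and always remove the directory afterwards. Returns the tool's stdout."""
    # mkdtemp creates a directory accessible only by the current user,
    # similar in spirit to `umask 0077` + `mktemp -d -t ovirt-XXXXXXXXXX`.
    workdir = tempfile.mkdtemp(prefix="ovirt-")
    try:
        with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
            # Like `tar -C "${MYTMP}" -x`; use the safer "data" filter when
            # this Python version provides it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(workdir, filter="data")
            else:
                tar.extractall(workdir)
        # Like `"${MYTMP}"/ovirt-host-deploy ...`; a hung tool surfaces as
        # subprocess.TimeoutExpired, loosely analogous to the engine's
        # "SSH session timeout".
        result = subprocess.run(
            [os.path.join(workdir, entrypoint), *args],
            capture_output=True, text=True, check=True, timeout=timeout,
        )
        return result.stdout
    finally:
        # Like the `trap "... rm -fr ..." 0` cleanup on exit.
        shutil.rmtree(workdir, ignore_errors=True)
```

Because cleanup runs in `finally`, the private directory is removed whether the tool succeeds, fails, or times out, which matches the intent of the shell `trap ... 0`.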
>>>>>>>>> >> >>>>>>
>>>>>>>>> >> >>>>>> Thanks,
>>>>>>>>> >> >>>>>> Piotr
>>>>>>>>> >> >>>>>>
>>>>>>>>> >> >>>>>> On Sun, May 27, 2018 at 11:21 AM, Martin Perina <[email protected]> wrote:
>>>>>>>>> >> >>>>>>>
>>>>>>>>> >> >>>>>>> Adding also Piotr to the thread
>>>>>>>>> >> >>>>>>>
>>>>>>>>> >> >>>>>>> On Sun, 27 May 2018, 08:46 Barak Korren, <[email protected]> wrote:
>>>>>>>>> >> >>>>>>>>
>>>>>>>>> >> >>>>>>>> Test failed: [ AddHost (in upgrade-from-release-suite) ]
>>>>>>>>> >> >>>>>>>>
>>>>>>>>> >> >>>>>>>> Link to suspected patches:
>>>>>>>>> >> >>>>>>>> https://gerrit.ovirt.org/#/c/91445/5 - Disable TLS versions < 1.2
>>>>>>>>> >> >>>>>>>> for hosts with cluster level>=4.1
>>>>>>>>> >> >>>>>>>>
>>>>>>>>> >> >>>>>>>> Link to Job:
>>>>>>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/
>>>>>>>>> >> >>>>>>>>
>>>>>>>>> >> >>>>>>>> Link to all logs:
>>>>>>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/
>>>>>>>>> >> >>>>>>>>
>>>>>>>>> >> >>>>>>>> Error snippet from log:
>>>>>>>>> >> >>>>>>>>
>>>>>>>>> >> >>>>>>>> From nosetests log:
>>>>>>>>> >> >>>>>>>> <error>
>>>>>>>>> >> >>>>>>>>
>>>>>>>>> >> >>>>>>>> AssertionError: False != True after 1200 seconds
>>>>>>>>> >> >>>>>>>>
>>>>>>>>> >> >>>>>>>> </error>
>>>>>>>>> >> >>>>>>>>
>>>>>>>>> >> >>>>>>>> Not finding a host deploy log in /var/log/ovirt-engine for some
>>>>>>>>> >> >>>>>>>> reason.
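The nosetests message reads like the output of a poll-until-true assertion that gave up after 1200 seconds. A minimal sketch of such a helper, assuming that behaviour; `assert_true_within` is a hypothetical name, not the actual ovirt-system-tests or Lago helper.

```python
import time
from typing import Callable


def assert_true_within(predicate: Callable[[], bool], timeout: float,
                       interval: float = 3.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll `predicate` until it returns True, or raise AssertionError once
    `timeout` seconds have passed. `clock` and `sleep` are injectable so the
    helper can be tested without actually waiting."""
    deadline = clock() + timeout
    while not predicate():
        if clock() >= deadline:
            raise AssertionError(f"False != True after {timeout:g} seconds")
        sleep(interval)
```

With a predicate such as "the host reached the Up state", a host whose deploy is stuck on an SSH timeout would never satisfy it, and the helper would fail with the message seen in the report.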
>>>>>>>>> >> >>>>>>>> This seems to have caused consistent failures in all other engine
>>>>>>>>> >> >>>>>>>> patches that followed it.
>>>>>>>>> >> >>>>>>>>
>>>>>>>>> >> >>>>>>>> --
>>>>>>>>> >> >>>>>>>> Barak Korren
>>>>>>>>> >> >>>>>>>> RHV DevOps team , RHCE, RHCi
>>>>>>>>> >> >>>>>>>> Red Hat EMEA
>>>>>>>>> >> >>>>>>>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> --
>>>>>>>>> >> >>> Martin Perina
>>>>>>>>> >> >>> Associate Manager, Software Engineering
>>>>>>>>> >> >>> Red Hat Czech s.r.o.
>>>>>>>>> >> >
>>>>>>>>> >> > _______________________________________________
>>>>>>>>> >> > Devel mailing list -- [email protected]
>>>>>>>>> >> > To unsubscribe send an email to [email protected]
>>>>>>>>> >> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>>>>>> >> > oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>>>>>>>> >> > List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/QIZ5L4FKII7X5FHQ4OXBBR2SLUIK5C74/
>>>>>>>>> _______________________________________________
>>>>>>>>> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/RDK42TYJKMX3M2DNUFKZO7CGNNOYWMJI/
>

--
Barak Korren
RHV DevOps team , RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
_______________________________________________
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/CXYQUUFYNKMLTVKKAAUGLXOAPUAVINBY/
