On 29 May 2018 at 22:29, Martin Perina <[email protected]> wrote:

> Master revert patches [1], [2] merged, 4.2 revert patches [3], [4] waiting
> to be merged.
>
> We will repost patches to master tomorrow and will continue to investigate
> the mysterious host-deploy issue.
>
> Btw, upgrade-from-prev-release on master [5] currently fails with:
>
> 18:59:31 + cp 'ovirt-system-tests/upgrade-from-prevrelease-suite-master/*.repo' exported-artifacts
> 18:59:31 cp: cannot stat 'ovirt-system-tests/upgrade-from-prevrelease-suite-master/*.repo': No such file or directory
> 18:59:31 POST BUILD TASK : FAILURE
>
> So how can we test upgrade from 4.2 to master?
>

This is not the real issue. The real issue is:

00:00:19.190 /tmp/jenkins6944523151752956846.sh: line 4: ovirt-system-tests/upgrade-from-prevrelease-suite-master/extra_sources: No such file or directory

This happens because there is no 'upgrade-from-prevrelease-suite-master'.
The suite to use is 'upgrade-from-release-suite-master'.
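
A check along the following lines, run before the job starts copying files,
could flag a wrong suite name early. This is only a sketch: `find_suite` is a
hypothetical helper and not part of OST, and the rule that suite directories
contain "-suite-" in their name is an assumption based on the names seen in
this thread.

```python
import difflib
import sys
from pathlib import Path


def find_suite(ost_root, requested):
    """Return (exists, suggestions) for a suite directory under ost_root.

    Hypothetical helper, not part of OST.
    """
    root = Path(ost_root)
    # Assumption: every suite lives in a top-level directory whose name
    # contains "-suite-".
    available = sorted(
        p.name for p in root.iterdir() if p.is_dir() and "-suite-" in p.name
    )
    if requested in available:
        return True, []
    # Offer the closest existing names so a typo is easy to spot.
    return False, difflib.get_close_matches(requested, available, n=3, cutoff=0.6)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: check_suite.py <ost-checkout-dir> <suite-name>")
    exists, hints = find_suite(sys.argv[1], sys.argv[2])
    if not exists:
        sys.exit(f"no such suite {sys.argv[2]!r}; did you mean one of {hints}?")
    print(f"suite {sys.argv[2]!r} found")
```

Called with 'upgrade-from-prevrelease-suite-master' against a checkout that
only has 'upgrade-from-release-suite-master', it would report the missing
directory and list the existing name as a suggestion.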


>
> Martin
>
>
> [1] https://gerrit.ovirt.org/91741
> [2] https://gerrit.ovirt.org/91742
> [3] https://gerrit.ovirt.org/91744
> [4] https://gerrit.ovirt.org/91745
> [5] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/2758/console
>
>
> On Tue, May 29, 2018 at 3:42 PM, Barak Korren <[email protected]> wrote:
>
>>
>>
>> On 29 May 2018 at 16:30, Martin Perina <[email protected]> wrote:
>>
>>>
>>>
>>> On Tue, May 29, 2018 at 3:12 PM, Dafna Ron <[email protected]> wrote:
>>>
>>>> Martin, do you have any updates? Please note that ovirt-engine has been
>>>> broken for a few days, so perhaps we should stop merging or revert the
>>>> original change?
>>>>
>>>
>>> Still looking at it, here are partial results:
>>>
>>> 1. New host installation: never reproduced, 4.2 host is always installed
>>> fine on 4.2 engine
>>> 2. Upgrade: never reproduced, upgrade of both 4.1 engine and host to
>>> 4.2 was always successful
>>> 3. Reinstallation: it happened to me once that the host remained stuck
>>> during reinstallation and the whole reinstallation failed due to a timeout
>>>     - that may be the issue seen in CI, but so far I don't have a
>>> reliable reproducer to debug why the host-deploy process on the host
>>> gets stuck
>>>
>>
>> Did you try using OST locally? It reproduces consistently with the OST
>> upgrade suite. You can also use the manual job and pass a URL to any engine
>> build beyond the marked patch. But there you'll have the same issue as with
>> the CQ job, where you won't have logs...
>>
>> Note, the process that happens there is AFAIK:
>> 1. The oVirt 4.1 release is installed.
>> 2. engine-setup runs
>> 3. repos are changed to the master repo
>> 4. engine is upgraded
>> 5. bootstrap (including the AddHost that fails) is carried out
>>
>>
>>>
>>>
>>>>
>>>> On Tue, May 29, 2018 at 1:26 PM, Piotr Kliczewski <[email protected]>
>>>> wrote:
>>>>
>>>>> +Martin
>>>>>
>>>>> He is working on it.
>>>>>
>>>>> Thanks,
>>>>> Piotr
>>>>>
>>>>> On Tue, May 29, 2018 at 2:22 PM, Dafna Ron <[email protected]> wrote:
>>>>>
>>>>>> Hi Piotr,
>>>>>>
>>>>>> Any update on this?
>>>>>>
>>>>>> Thanks.
>>>>>> Dafna
>>>>>>
>>>>>>
>>>>>> On Mon, May 28, 2018 at 10:59 AM, Piotr Kliczewski <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> On Mon, May 28, 2018 at 11:41 AM, Barak Korren <[email protected]>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> >
>>>>>>> > On 28 May 2018 at 12:38, Piotr Kliczewski <
>>>>>>> [email protected]>
>>>>>>> > wrote:
>>>>>>> >>
>>>>>>> >> On Mon, May 28, 2018 at 10:57 AM, Barak Korren <
>>>>>>> [email protected]> wrote:
>>>>>>> >> > Note: we're now seeing a very similar issue in the 4.2 branch as
>>>>>>> >> > well that seems to have been introduced by the following patch:
>>>>>>> >>
>>>>>>> >> Can you point to a specific job so we can take a look at the logs?
>>>>>>> >
>>>>>>> >
>>>>>>> > Whoops, sorry, here:
>>>>>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/2034/
>>>>>>> >
>>>>>>>
>>>>>>> Looks like the same issue:
>>>>>>>
>>>>>>> 2018-05-28 03:41:03,606-04 ERROR
>>>>>>> [org.ovirt.engine.core.uutils.ssh.SSHDialog]
>>>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] SSH error
>>>>>>> running
>>>>>>> command root@lago-upgrade-from-prevrelease-suite-4-2-host-0:'umask
>>>>>>> 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t
>>>>>>> ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null
>>>>>>> 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar
>>>>>>> --warning=no-timestamp -C "${MYTMP}" -x &&
>>>>>>> "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
>>>>>>> DIALOG/customization=bool:True': TimeLimitExceededException: SSH
>>>>>>> session timeout host
>>>>>>> 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>>>>> 2018-05-28 03:41:03,606-04 ERROR
>>>>>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy)
>>>>>>> [1244c90f] Error during deploy dialog
>>>>>>> 2018-05-28 03:41:03,611-04 ERROR
>>>>>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>>>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] Timeout during
>>>>>>> host lago-upgrade-from-prevrelease-suite-4-2-host-0 install: SSH
>>>>>>> session timeout host
>>>>>>> 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>>>>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> >
>>>>>>> >> > https://gerrit.ovirt.org/c/91638/2 - core: Enable only strong ciphers for 4.2 hosts
>>>>>>> >> >
>>>>>>> >> > On 28 May 2018 at 10:26, Barak Korren <[email protected]>
>>>>>>> wrote:
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >> On 28 May 2018 at 10:19, Martin Perina <[email protected]>
>>>>>>> wrote:
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> On Mon, May 28, 2018 at 9:00 AM, Piotr Kliczewski
>>>>>>> >> >>> <[email protected]>
>>>>>>> >> >>> wrote:
>>>>>>> >> >>>>
>>>>>>> >> >>>> Simone,
>>>>>>> >> >>>>
>>>>>>> >> >>>> What do you think about this failure?
>>>>>>> >> >>>>
>>>>>>> >> >>>> Thanks,
>>>>>>> >> >>>> Piotr
>>>>>>> >> >>>>
>>>>>>> >> >>>> On Mon, May 28, 2018 at 7:12 AM, Barak Korren <
>>>>>>> [email protected]>
>>>>>>> >> >>>> wrote:
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> On 27 May 2018 at 14:59, Piotr Kliczewski <
>>>>>>> [email protected]>
>>>>>>> >> >>>>> wrote:
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Martin,
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> I can only see:
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> 2018-05-25 13:57:44,255-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] SSH error running command root@lago-upgrade-from-release-suite-master-host-0:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>>>> >> >>>>>> 2018-05-25 13:57:44,259-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] Timeout during host lago-upgrade-from-release-suite-master-host-0 install: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> There are no additional logs; the SSH session to the host
>>>>>>> >> >>>>>> timed out. Are we sure that this issue is caused by Ravi's
>>>>>>> >> >>>>>> change?
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> We have some quite strong circumstantial evidence:
>>>>>>> >> >>>>> - The issue has affected all engine patches since that patch
>>>>>>> >> >>>>> in a similar fashion.
>>>>>>> >> >>>>> - Prior engine patch [1] passed successfully [2]
>>>>>>> >> >>>>> - Other subsequent OST runs without engine patches passed
>>>>>>> >> >>>>> successfully as well [3].
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> [1]: https://gerrit.ovirt.org/c/91595/2
>>>>>>> >> >>>>> [2]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7777/
>>>>>>> >> >>>>> [3]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7778/
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> Please note - the issue is affecting a test that is run by an
>>>>>>> >> >>>>> upgrade suite on the post-upgrade system. It has no effect on
>>>>>>> >> >>>>> the basic suite. So it probably has to do with some behaviour
>>>>>>> >> >>>>> that is specific to upgraded systems.
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> I will try to reproduce later today in dev env, but I agree
>>>>>>> >> >>> with Piotr's investigation, engine was not able to connect to
>>>>>>> >> >>> the host using SSH and that's why no host-deploy logs were
>>>>>>> >> >>> fetched.
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >> Lago fetches the logs from the host too (and it can take them
>>>>>>> >> >> from the VM image directly if the host is not responsive over
>>>>>>> >> >> SSH). Can we get at the host-deploy logs that way?
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Thanks,
>>>>>>> >> >>>>>> Piotr
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> On Sun, May 27, 2018 at 11:21 AM, Martin Perina
>>>>>>> >> >>>>>> <[email protected]>
>>>>>>> >> >>>>>> wrote:
>>>>>>> >> >>>>>>>
>>>>>>> >> >>>>>>> Adding also Piotr to the thread
>>>>>>> >> >>>>>>>
>>>>>>> >> >>>>>>>
>>>>>>> >> >>>>>>> On Sun, 27 May 2018, 08:46 Barak Korren, <
>>>>>>> [email protected]>
>>>>>>> >> >>>>>>> wrote:
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> Test failed: [ AddHost (in upgrade-from-release-suite) ]
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> Link to suspected patches:
>>>>>>> >> >>>>>>>> https://gerrit.ovirt.org/#/c/91445/5 - Disable TLS versions < 1.2
>>>>>>> >> >>>>>>>> for hosts with cluster level>=4.1
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> Link to Job:
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> Link to all logs:
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> Error snippet from log:
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> From nosetests log:
>>>>>>> >> >>>>>>>> <error>
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> AssertionError: False != True after 1200 seconds
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> </error>
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> Not finding a host deploy log in /var/log/ovirt-engine for
>>>>>>> >> >>>>>>>> some reason.
>>>>>>> >> >>>>>>>> This seems to have caused consistent failures in all other
>>>>>>> >> >>>>>>>> engine patches that followed it.
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> --
>>>>>>> >> >>>>>>>> Barak Korren
>>>>>>> >> >>>>>>>> RHV DevOps team , RHCE, RHCi
>>>>>>> >> >>>>>>>> Red Hat EMEA
>>>>>>> >> >>>>>>>> redhat.com | TRIED. TESTED. TRUSTED. |
>>>>>>> redhat.com/trusted
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> --
>>>>>>> >> >>>>> Barak Korren
>>>>>>> >> >>>>> RHV DevOps team , RHCE, RHCi
>>>>>>> >> >>>>> Red Hat EMEA
>>>>>>> >> >>>>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>>>>>> >> >>>>
>>>>>>> >> >>>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> --
>>>>>>> >> >>> Martin Perina
>>>>>>> >> >>> Associate Manager, Software Engineering
>>>>>>> >> >>> Red Hat Czech s.r.o.
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >> --
>>>>>>> >> >> Barak Korren
>>>>>>> >> >> RHV DevOps team , RHCE, RHCi
>>>>>>> >> >> Red Hat EMEA
>>>>>>> >> >> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > --
>>>>>>> >> > Barak Korren
>>>>>>> >> > RHV DevOps team , RHCE, RHCi
>>>>>>> >> > Red Hat EMEA
>>>>>>> >> > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>>>>>> >> >
>>>>>>> >> > _______________________________________________
>>>>>>> >> > Devel mailing list -- [email protected]
>>>>>>> >> > To unsubscribe send an email to [email protected]
>>>>>>> >> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>>>> >> > oVirt Code of Conduct:
>>>>>>> >> > https://www.ovirt.org/community/about/community-guidelines/
>>>>>>> >> > List Archives:
>>>>>>> >> >
>>>>>>> >> > https://lists.ovirt.org/archives/list/[email protected]/message/QIZ5L4FKII7X5FHQ4OXBBR2SLUIK5C74/
>>>>>>> >> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > Barak Korren
>>>>>>> > RHV DevOps team , RHCE, RHCi
>>>>>>> > Red Hat EMEA
>>>>>>> > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Martin Perina
>>> Associate Manager, Software Engineering
>>> Red Hat Czech s.r.o.
>>>
>>
>>
>>
>> --
>> Barak Korren
>> RHV DevOps team , RHCE, RHCi
>> Red Hat EMEA
>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>
>
>
>
> --
> Martin Perina
> Associate Manager, Software Engineering
> Red Hat Czech s.r.o.
>



-- 
Barak Korren
RHV DevOps team , RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/QUIG5FBTLUJTGSTLJKH4C464ZDZOPJW4/
