On Tue, May 29, 2018 at 3:12 PM, Dafna Ron <[email protected]> wrote:
> Martin, do you have any updates? Please note that ovirt-engine has been
> broken for a few days, so perhaps we should stop merging or revert the
> original change?
>
Still looking at it; here are partial results:
1. New host installation: never reproduced; a 4.2 host is always installed
fine on a 4.2 engine
2. Upgrade: never reproduced; upgrading both a 4.1 engine and host to 4.2
was always successful
3. Reinstallation: once, the host got stuck during reinstallation and the
whole reinstallation failed due to a timeout
   - that may be the issue which can be seen in CI, but so far I don't
have a reliable reproducer to debug why the host-deploy process on the
host gets stuck
>
> On Tue, May 29, 2018 at 1:26 PM, Piotr Kliczewski <[email protected]>
> wrote:
>
>> +Martin
>>
>> He is working on it.
>>
>> Thanks,
>> Piotr
>>
>> On Tue, May 29, 2018 at 2:22 PM, Dafna Ron <[email protected]> wrote:
>>
>>> Hi Piotr,
>>>
>>> Any update on this?
>>>
>>> Thanks.
>>> Dafna
>>>
>>>
>>> On Mon, May 28, 2018 at 10:59 AM, Piotr Kliczewski <
>>> [email protected]> wrote:
>>>
>>>> On Mon, May 28, 2018 at 11:41 AM, Barak Korren <[email protected]>
>>>> wrote:
>>>> >
>>>> >
>>>> > On 28 May 2018 at 12:38, Piotr Kliczewski <[email protected]>
>>>> > wrote:
>>>> >>
>>>> >> On Mon, May 28, 2018 at 10:57 AM, Barak Korren <[email protected]>
>>>> >> wrote:
>>>> >> > Note: we're now seeing a very similar issue in the 4.2 branch as well
>>>> >> > that seems to have been introduced by the following patch:
>>>> >>
>>>> >> Can you point to a specific job so we could take a look at the logs?
>>>> >
>>>> >
>>>> > Whoops, sorry, here:
>>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/2034/
>>>> >
>>>>
>>>> Looks like the same issue:
>>>>
>>>> 2018-05-28 03:41:03,606-04 ERROR
>>>> [org.ovirt.engine.core.uutils.ssh.SSHDialog]
>>>> (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] SSH error running
>>>> command root@lago-upgrade-from-prevrelease-suite-4-2-host-0:'umask
>>>> 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t
>>>> ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null
>>>> 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar
>>>> --warning=no-timestamp -C "${MYTMP}" -x &&
>>>> "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
>>>> DIALOG/customization=bool:True': TimeLimitExceededException: SSH
>>>> session timeout host
>>>> 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>> 2018-05-28 03:41:03,606-04 ERROR
>>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy)
>>>> [1244c90f] Error during deploy dialog
>>>> 2018-05-28 03:41:03,611-04 ERROR
>>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>>>> (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] Timeout during
>>>> host lago-upgrade-from-prevrelease-suite-4-2-host-0 install: SSH
>>>> session timeout host
>>>> 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>>
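The command in the log above bootstraps host-deploy in one SSH session: it creates a private temporary directory (`umask 0077` plus `mktemp -d`), registers an exit trap that deletes it, unpacks a tarball streamed over stdin into it, and only then (`&&`) runs `ovirt-host-deploy` from there. The following is a minimal Python sketch of the same pattern, useful for reproducing a "stuck" entry point locally. `run_bundle` is a hypothetical helper, not engine code, and the explicit `timeout` is an assumption standing in for the engine-side SSH session limit that produced `TimeLimitExceededException`:

```python
from __future__ import annotations

import io
import os
import shutil
import subprocess
import tarfile
import tempfile


def run_bundle(tar_bytes: bytes, entry: str, args: list[str], timeout: float) -> str:
    """Unpack tar_bytes into a private temp dir, run `entry` from it and clean up.

    Raises subprocess.TimeoutExpired (after killing the child) if `entry` does
    not finish within `timeout` seconds, roughly like the engine giving up on a
    stuck host-deploy session.
    """
    old_umask = os.umask(0o077)  # like `umask 0077`; process-wide, not thread-safe
    try:
        tmpdir = tempfile.mkdtemp(prefix="ovirt-")  # like `mktemp -d -t ovirt-XXXXXXXXXX`
        try:
            with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
                # like `tar -C "${MYTMP}" -x`; use the safe "data" filter when available
                kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
                tar.extractall(tmpdir, **kwargs)
            # like `&& "${MYTMP}"/ovirt-host-deploy ...`: reached only if extraction succeeded
            result = subprocess.run(
                [os.path.join(tmpdir, entry), *args],
                capture_output=True, text=True, timeout=timeout, check=True,
            )
            return result.stdout
        finally:
            shutil.rmtree(tmpdir, ignore_errors=True)  # like the `trap ... 0` cleanup
    finally:
        os.umask(old_umask)
```

The cleanup sits in `finally`, so the directory is removed on success, on a non-zero exit, and on a timeout, which is what the shell `trap ... 0` guarantees. Note that this assumes the temporary directory is not mounted `noexec`.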
>>>> >>
>>>> >>
>>>> >> >
>>>> >> > https://gerrit.ovirt.org/c/91638/2 - core: Enable only strong ciphers
>>>> >> > for 4.2 hosts
>>>> >> >
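The two suspected patches tighten engine-to-host transport security: one disables TLS versions older than 1.2 for hosts with cluster level >= 4.1, the other enables only strong ciphers for 4.2 hosts. Neither patch's code is quoted in this thread, and the engine is written in Java, so the following Python `ssl` sketch only illustrates what such a level-dependent policy means for a client context. The thresholds come from the patch titles; `client_context_for` and the `HIGH:!aNULL:!MD5` cipher string are assumptions, not the patches' actual settings:

```python
import ssl


def client_context_for(cluster_level: tuple) -> ssl.SSLContext:
    """Build a TLS client context whose strictness depends on the cluster level.

    Hypothetical policy modelled on the patch titles:
      - cluster level >= 4.1: refuse anything older than TLS 1.2
      - cluster level >= 4.2: additionally restrict to a strong cipher list
    Lower levels keep the library defaults.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if cluster_level >= (4, 1):
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cluster_level >= (4, 2):
        # set_ciphers() only affects TLS 1.2 and older suites; TLS 1.3 suites
        # are configured separately by OpenSSL.
        ctx.set_ciphers("HIGH:!aNULL:!MD5")
    return ctx
```

Comparing cluster levels as tuples, for example `(4, 2)`, avoids the string-comparison pitfalls of `"4.10" < "4.2"`.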
>>>> >> > On 28 May 2018 at 10:26, Barak Korren <[email protected]> wrote:
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> On 28 May 2018 at 10:19, Martin Perina <[email protected]>
>>>> >> >> wrote:
>>>> >> >>>
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> On Mon, May 28, 2018 at 9:00 AM, Piotr Kliczewski
>>>> >> >>> <[email protected]>
>>>> >> >>> wrote:
>>>> >> >>>>
>>>> >> >>>> Simone,
>>>> >> >>>>
>>>> >> >>>> What do you think about this failure?
>>>> >> >>>>
>>>> >> >>>> Thanks,
>>>> >> >>>> Piotr
>>>> >> >>>>
>>>> >> >>>> On Mon, May 28, 2018 at 7:12 AM, Barak Korren <[email protected]>
>>>> >> >>>> wrote:
>>>> >> >>>>>
>>>> >> >>>>>
>>>> >> >>>>>
>>>> >> >>>>> On 27 May 2018 at 14:59, Piotr Kliczewski <[email protected]>
>>>> >> >>>>> wrote:
>>>> >> >>>>>>
>>>> >> >>>>>> Martin,
>>>> >> >>>>>>
>>>> >> >>>>>> I can only see:
>>>> >> >>>>>>
>>>> >> >>>>>> 2018-05-25 13:57:44,255-04 ERROR
>>>> >> >>>>>> [org.ovirt.engine.core.uutils.ssh.SSHDialog]
>>>> >> >>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] SSH error running
>>>> >> >>>>>> command root@lago-upgrade-from-release-suite-master-host-0:'umask 0077;
>>>> >> >>>>>> MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)";
>>>> >> >>>>>> trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr
>>>> >> >>>>>> \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp
>>>> >> >>>>>> -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy
>>>> >> >>>>>> DIALOG/dialect=str:machine DIALOG/customization=bool:True':
>>>> >> >>>>>> TimeLimitExceededException: SSH session timeout host
>>>> >> >>>>>> 'root@lago-upgrade-from-release-suite-master-host-0'
>>>> >> >>>>>> 2018-05-25 13:57:44,259-04 ERROR
>>>> >> >>>>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>>>> >> >>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] Timeout during
>>>> >> >>>>>> host lago-upgrade-from-release-suite-master-host-0 install: SSH
>>>> >> >>>>>> session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>> >> >>>>>>
>>>> >> >>>>>> There are no additional logs; the SSH session to the host timed out.
>>>> >> >>>>>> Are we sure that it is an issue caused by Ravi's change?
>>>> >> >>>>>
>>>> >> >>>>>
>>>> >> >>>>> We have some quite strong circumstantial evidence:
>>>> >> >>>>> - The issue has affected all engine patches since that patch in a
>>>> >> >>>>>   similar fashion.
>>>> >> >>>>> - The prior engine patch [1] passed successfully [2].
>>>> >> >>>>> - Other subsequent OST runs without engine patches passed
>>>> >> >>>>>   successfully as well [3].
>>>> >> >>>>>
>>>> >> >>>>> [1]: https://gerrit.ovirt.org/c/91595/2
>>>> >> >>>>> [2]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7777/
>>>> >> >>>>> [3]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7778/
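Barak's argument is a manual bisection over the change queue: engine runs passed up to the suspected patch, every engine run failed from it on, and runs without engine patches kept passing. That pattern can be stated as a small check. This is a sketch assuming a hypothetical `Run` record per OST run, listed in queue order; the field names and sample IDs are made up and are not taken from the CI jobs:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    change: str          # change under test (hypothetical ID)
    engine_patch: bool   # does the change touch ovirt-engine?
    passed: bool         # did the OST run pass?


def suspect_change(runs: List[Run]) -> Optional[str]:
    """Return the first failing engine change if the history fits the pattern
    'every engine run fails from it on, non-engine runs keep passing'.

    Engine runs before it passed by construction (it is the first failure).
    Returns None when there is no failure or the history does not fit.
    """
    start = next((i for i, r in enumerate(runs) if r.engine_patch and not r.passed), None)
    if start is None:
        return None
    after = runs[start:]
    if any(r.passed for r in after if r.engine_patch):
        return None  # a later engine run passed: the breakage is not persistent
    if any(not r.passed for r in after if not r.engine_patch):
        return None  # non-engine runs also fail: likely infra, not the engine
    return runs[start].change
```

Like Barak's reasoning, this only points at a suspect; it cannot prove causation, which is why reproducing the failure is still needed.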
>>>> >> >>>>>
>>>> >> >>>>>
>>>> >> >>>>> Please note: the issue affects a test that is run by an upgrade
>>>> >> >>>>> suite on the post-upgrade system. It has no effect on the basic
>>>> >> >>>>> suite, so it probably has to do with some behaviour that is
>>>> >> >>>>> specific to upgraded systems.
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> I will try to reproduce it later today in a dev env, but I agree
>>>> >> >>> with Piotr's investigation: the engine was not able to connect to
>>>> >> >>> the host using SSH, and that's why no host-deploy logs were fetched.
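A quick way to separate "the SSH port is unreachable" from "the session opened, but host-deploy hung" is to connect to port 22 and read the server's identification line, which SSH servers send first (RFC 4253 allows other lines before it, so this sketch only inspects the first one). `ssh_banner` is a hypothetical helper using only the standard library; a banner proves the daemon answers, not that authentication or the remote command works:

```python
import socket
from typing import Optional


def ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> Optional[str]:
    """Return the server's first line if it looks like an SSH identification
    string (starts with 'SSH-'), or None on connect failure or timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with sock.makefile("rb") as stream:
                line = stream.readline(256)
    except OSError:  # refused, unreachable, DNS failure or timeout
        return None
    if not line.startswith(b"SSH-"):
        return None
    return line.decode("ascii", "replace").rstrip("\r\n")
```

If this returns a banner while the engine still reports `TimeLimitExceededException`, the timeout is more likely inside the session (for example, host-deploy waiting on something) than in establishing the connection.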
>>>> >> >>
>>>> >> >>
>>>> >> >> Lago fetches the logs from the host too (and it can take them from
>>>> >> >> the VM image directly if the host is not responsive over SSH). Can
>>>> >> >> we get at the host-deploy logs that way?
>>>> >> >>
>>>> >> >>
>>>> >> >>>>>
>>>> >> >>>>>
>>>> >> >>>>>
>>>> >> >>>>>>
>>>> >> >>>>>>
>>>> >> >>>>>> Thanks,
>>>> >> >>>>>> Piotr
>>>> >> >>>>>>
>>>> >> >>>>>> On Sun, May 27, 2018 at 11:21 AM, Martin Perina
>>>> >> >>>>>> <[email protected]>
>>>> >> >>>>>> wrote:
>>>> >> >>>>>>>
>>>> >> >>>>>>> Also adding Piotr to the thread
>>>> >> >>>>>>>
>>>> >> >>>>>>>
>>>> >> >>>>>>> On Sun, 27 May 2018, 08:46 Barak Korren, <[email protected]>
>>>> >> >>>>>>> wrote:
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> Test failed: [ AddHost (in upgrade-from-release-suite) ]
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> Link to suspected patches:
>>>> >> >>>>>>>> https://gerrit.ovirt.org/#/c/91445/5 - Disable TLS versions < 1.2
>>>> >> >>>>>>>> for hosts with cluster level >= 4.1
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> Link to Job:
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> Link to all logs:
>>>> >> >>>>>>>>
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> Error snippet from log:
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> From nosetests log:
>>>> >> >>>>>>>> <error>
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> AssertionError: False != True after 1200 seconds
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> </error>
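`False != True after 1200 seconds` is the kind of message a poll-until-true test helper reports when a condition (here, the added host becoming usable) never turns true before its deadline. The following is a minimal sketch of such a helper. `assert_true_within` is a hypothetical reimplementation, not the OST test library's code, and the injectable `clock`/`sleep` parameters are an assumption added so the behaviour can be checked without waiting 20 minutes:

```python
import time
from typing import Callable


def assert_true_within(
    condition: Callable[[], bool],
    timeout: float,
    interval: float = 3.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll condition() every `interval` seconds until it returns True.

    Raises AssertionError with the last result once `timeout` seconds have
    elapsed without success.
    """
    deadline = clock() + timeout
    while True:
        result = condition()
        if result is True:
            return
        if clock() >= deadline:
            raise AssertionError(f"{result!r} != True after {timeout:g} seconds")
        sleep(interval)
```

A monotonic clock is used so that wall-clock adjustments on the test VM cannot shorten or extend the wait.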
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> No host-deploy log can be found in /var/log/ovirt-engine for some
>>>> >> >>>>>>>> reason.
>>>> >> >>>>>>>> This seems to have caused consistent failures in all other engine
>>>> >> >>>>>>>> patches that followed it.
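When no host-deploy log exists, the engine log is the main evidence, and comparing runs is easier once the timeout records are extracted. The following is a sketch that pulls (timestamp, correlation id, host) out of the `VdsDeployBase` "Timeout during host ... install" lines quoted in this thread. The regular expression is based only on those excerpts and assumes each record sits on a single line, as in `engine.log` (the email wraps them); `deploy_timeouts` is a hypothetical helper:

```python
import re
from typing import Iterable, List, Tuple

# Based on the VdsDeployBase lines quoted in this thread (one record per line).
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\S+ \S+) ERROR \[[\w.]+\] \([^)]*\) \[(?P<corr>\w+)\] "
    r"Timeout during host (?P<host>\S+) install"
)


def deploy_timeouts(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, correlation id, host) for each host-deploy timeout."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            hits.append((match.group("ts"), match.group("corr"), match.group("host")))
    return hits
```

The correlation id (for example `[1244c90f]`) ties the timeout back to the matching `SSHDialog` error for the same flow.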
>>>> >> >>>>>>>>
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> --
>>>> >> >>>>>>>> Barak Korren
>>>> >> >>>>>>>> RHV DevOps team , RHCE, RHCi
>>>> >> >>>>>>>> Red Hat EMEA
>>>> >> >>>>>>>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>>> >> >>>>>>
>>>> >> >>>>>>
>>>> >> >>>>>
>>>> >> >>>>>
>>>> >> >>>>>
>>>> >> >>>>
>>>> >> >>>>
>>>> >> >>>
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> --
>>>> >> >>> Martin Perina
>>>> >> >>> Associate Manager, Software Engineering
>>>> >> >>> Red Hat Czech s.r.o.
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > _______________________________________________
>>>> >> > Devel mailing list -- [email protected]
>>>> >> > To unsubscribe send an email to [email protected]
>>>> >> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>> >> > oVirt Code of Conduct:
>>>> >> > https://www.ovirt.org/community/about/community-guidelines/
>>>> >> > List Archives:
>>>> >> > https://lists.ovirt.org/archives/list/[email protected]/message/QIZ5L4FKII7X5FHQ4OXBBR2SLUIK5C74/
>>>> >> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> _______________________________________________
>>>> Devel mailing list -- [email protected]
>>>> To unsubscribe send an email to [email protected]
>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>>> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/RDK42TYJKMX3M2DNUFKZO7CGNNOYWMJI/
>>>>
>>>
>>>
>>
>
--
Martin Perina
Associate Manager, Software Engineering
Red Hat Czech s.r.o.
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/[email protected]/message/GPXNGNAH7FQCQRQJGKJXZU5OOGVCCEYO/