On 29 May 2018 at 16:30, Martin Perina <[email protected]> wrote:

>
>
> On Tue, May 29, 2018 at 3:12 PM, Dafna Ron <[email protected]> wrote:
>
>> Martin, do you have any updates? Please note that ovirt-engine has been
>> broken for a few days, so perhaps we should stop merging or revert the
>> original change?
>>
>
> Still looking at it, here are partial results:
>
> 1. New host installation: never reproduced, a 4.2 host is always installed
> fine on a 4.2 engine
> 2. Upgrade: never reproduced, upgrade of both the 4.1 engine and host to 4.2
> was always successful
> 3. Reinstallation: once it happened to me that the host remained stuck
> during reinstallation and the whole reinstallation failed due to a timeout
>     - that may be the issue seen in CI, but so far I don't have a
> reliable reproducer to debug why the host-deploy process on the
> host gets stuck
>

Did you try using OST locally? It reproduces consistently with the OST
upgrade suite. You can also use the manual job and pass a URL to any engine
build beyond the marked patch, but there you'll have the same issue as with
the CQ job: you won't have logs...

Note, the process that happens there is, AFAIK:
1. The oVirt 4.1 release is installed
2. engine-setup runs
3. Repos are changed to the master repo
4. The engine is upgraded
5. Bootstrap is carried out (including AddHost, which fails)
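
If it helps anyone going through the engine log after step 5, here is a
minimal sketch (my own, not part of OST) that pulls the SSH session timeout
errors out of engine.log text and groups them by correlation ID. It assumes
the one-entry-per-line layout of the excerpts quoted further down (timestamp,
level, [logger], (thread), [correlation ID], message); find_ssh_timeouts and
SAMPLE are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict

# Assumed layout of one engine.log entry, based on the excerpts in this thread:
# timestamp LEVEL [logger] (thread) [correlation id] message
ENTRY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s+"
    r"(?P<msg>.*)$"
)


def find_ssh_timeouts(log_text):
    """Group ERROR entries mentioning an SSH session timeout by correlation ID.

    Returns {correlation_id: [(timestamp, logger, message), ...]}.
    """
    hits = defaultdict(list)
    for line in log_text.splitlines():
        match = ENTRY_RE.match(line)
        if match is None:
            # Continuation line (e.g. a stack trace) or an unexpected layout.
            continue
        if match.group("level") != "ERROR":
            continue
        if "SSH session timeout" not in match.group("msg"):
            continue
        hits[match.group("corr")].append(
            (match.group("ts"), match.group("logger"), match.group("msg"))
        )
    return dict(hits)


# Two entries taken from the 4.2 log excerpt quoted in this thread.
SAMPLE = (
    "2018-05-28 03:41:03,606-04 ERROR "
    "[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) "
    "[1244c90f] Error during deploy dialog\n"
    "2018-05-28 03:41:03,611-04 ERROR "
    "[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] "
    "(EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] Timeout during "
    "host lago-upgrade-from-prevrelease-suite-4-2-host-0 install: SSH "
    "session timeout host "
    "'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'\n"
)

if __name__ == "__main__":
    for corr_id, entries in find_ssh_timeouts(SAMPLE).items():
        for ts, logger, msg in entries:
            print(corr_id, ts, logger, msg, sep=" | ")
```

For a real run you would pass the contents of engine.log from the job
artifacts instead of SAMPLE.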


>
>
>>
>> On Tue, May 29, 2018 at 1:26 PM, Piotr Kliczewski <[email protected]>
>> wrote:
>>
>>> +Martin
>>>
>>> He is working on it.
>>>
>>> Thanks,
>>> Piotr
>>>
>>> On Tue, May 29, 2018 at 2:22 PM, Dafna Ron <[email protected]> wrote:
>>>
>>>> Hi Piotr,
>>>>
>>>> Any update on this?
>>>>
>>>> Thanks.
>>>> Dafna
>>>>
>>>>
>>>> On Mon, May 28, 2018 at 10:59 AM, Piotr Kliczewski <
>>>> [email protected]> wrote:
>>>>
>>>>> On Mon, May 28, 2018 at 11:41 AM, Barak Korren <[email protected]>
>>>>> wrote:
>>>>> >
>>>>> >
>>>>> > On 28 May 2018 at 12:38, Piotr Kliczewski <[email protected]> wrote:
>>>>> >>
>>>>> >> On Mon, May 28, 2018 at 10:57 AM, Barak Korren <[email protected]> wrote:
>>>>> >> > Note: we're now seeing a very similar issue in the 4.2 branch as well
>>>>> >> > that seems to have been introduced by the following patch:
>>>>> >>
>>>>> >> Can you point to specific job so we could take a look at the logs?
>>>>> >
>>>>> >
>>>>> > Whoops, sorry, here:
>>>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/2034/
>>>>> >
>>>>>
>>>>> Looks like the same issue:
>>>>>
>>>>> 2018-05-28 03:41:03,606-04 ERROR
>>>>> [org.ovirt.engine.core.uutils.ssh.SSHDialog]
>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] SSH error running
>>>>> command root@lago-upgrade-from-prevrelease-suite-4-2-host-0:'umask
>>>>> 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t
>>>>> ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null
>>>>> 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar
>>>>> --warning=no-timestamp -C "${MYTMP}" -x &&
>>>>> "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
>>>>> DIALOG/customization=bool:True': TimeLimitExceededException: SSH
>>>>> session timeout host
>>>>> 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>>> 2018-05-28 03:41:03,606-04 ERROR
>>>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy)
>>>>> [1244c90f] Error during deploy dialog
>>>>> 2018-05-28 03:41:03,611-04 ERROR
>>>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] Timeout during
>>>>> host lago-upgrade-from-prevrelease-suite-4-2-host-0 install: SSH
>>>>> session timeout host
>>>>> 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>>>
>>>>> >>
>>>>> >>
>>>>> >> >
>>>>> >> > https://gerrit.ovirt.org/c/91638/2 - core: Enable only strong ciphers
>>>>> >> > for 4.2 hosts
>>>>> >> >
>>>>> >> > On 28 May 2018 at 10:26, Barak Korren <[email protected]> wrote:
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> On 28 May 2018 at 10:19, Martin Perina <[email protected]> wrote:
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> On Mon, May 28, 2018 at 9:00 AM, Piotr Kliczewski
>>>>> >> >>> <[email protected]>
>>>>> >> >>> wrote:
>>>>> >> >>>>
>>>>> >> >>>> Simone,
>>>>> >> >>>>
>>>>> >> >>>> What do you think about this failure?
>>>>> >> >>>>
>>>>> >> >>>> Thanks,
>>>>> >> >>>> Piotr
>>>>> >> >>>>
>>>>> >> >>>> On Mon, May 28, 2018 at 7:12 AM, Barak Korren <[email protected]>
>>>>> >> >>>> wrote:
>>>>> >> >>>>>
>>>>> >> >>>>>
>>>>> >> >>>>>
>>>>> >> >>>>> On 27 May 2018 at 14:59, Piotr Kliczewski <[email protected]>
>>>>> >> >>>>> wrote:
>>>>> >> >>>>>>
>>>>> >> >>>>>> Martin,
>>>>> >> >>>>>>
>>>>> >> >>>>>> I only can see:
>>>>> >> >>>>>>
>>>>> >> >>>>>> 2018-05-25 13:57:44,255-04 ERROR
>>>>> >> >>>>>> [org.ovirt.engine.core.uutils.ssh.SSHDialog]
>>>>> >> >>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] SSH error running
>>>>> >> >>>>>> command root@lago-upgrade-from-release-suite-master-host-0:'umask 0077;
>>>>> >> >>>>>> MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)";
>>>>> >> >>>>>> trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" >
>>>>> >> >>>>>> /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x &&
>>>>> >> >>>>>> "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
>>>>> >> >>>>>> DIALOG/customization=bool:True': TimeLimitExceededException: SSH session
>>>>> >> >>>>>> timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>> >> >>>>>> 2018-05-25 13:57:44,259-04 ERROR
>>>>> >> >>>>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>>>>> >> >>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] Timeout during host
>>>>> >> >>>>>> lago-upgrade-from-release-suite-master-host-0 install: SSH session timeout
>>>>> >> >>>>>> host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>> >> >>>>>>
>>>>> >> >>>>>> There are no additional logs. The SSH session to the host timed out.
>>>>> >> >>>>>> Are we sure that this is an issue caused by Ravi's change?
>>>>> >> >>>>>
>>>>> >> >>>>>
>>>>> >> >>>>> We have some quite strong circumstantial evidence:
>>>>> >> >>>>> - The issue has affected all engine patches since that patch in a
>>>>> >> >>>>> similar fashion.
>>>>> >> >>>>> - The prior engine patch [1] passed successfully [2].
>>>>> >> >>>>> - Other subsequent OST runs without engine patches passed
>>>>> >> >>>>> successfully as well [3].
>>>>> >> >>>>>
>>>>> >> >>>>> [1]: https://gerrit.ovirt.org/c/91595/2
>>>>> >> >>>>> [2]:
>>>>> >> >>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7777/
>>>>> >> >>>>> [3]:
>>>>> >> >>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7778/
>>>>> >> >>>>>
>>>>> >> >>>>>
>>>>> >> >>>>> Please note - the issue is affecting a test that is run by an
>>>>> >> >>>>> upgrade suite on the post-upgrade system. It has no effect on the
>>>>> >> >>>>> basic suite. So it probably has to do with some behaviour that is
>>>>> >> >>>>> specific to upgraded systems.
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> I will try to reproduce later today in a dev env, but I agree with
>>>>> >> >>> Piotr's investigation: the engine was not able to connect to the host
>>>>> >> >>> using SSH, and that's why no host-deploy logs were fetched.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> Lago fetches the logs from the host too (and it can take them from the
>>>>> >> >> VM image directly if the host is not responsive over SSH). Can we get
>>>>> >> >> at the host-deploy logs that way?
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>>>>
>>>>> >> >>>>>
>>>>> >> >>>>>
>>>>> >> >>>>>>
>>>>> >> >>>>>>
>>>>> >> >>>>>> Thanks,
>>>>> >> >>>>>> Piotr
>>>>> >> >>>>>>
>>>>> >> >>>>>> On Sun, May 27, 2018 at 11:21 AM, Martin Perina
>>>>> >> >>>>>> <[email protected]>
>>>>> >> >>>>>> wrote:
>>>>> >> >>>>>>>
>>>>> >> >>>>>>> Adding also Piotr to the thread
>>>>> >> >>>>>>>
>>>>> >> >>>>>>>
>>>>> >> >>>>>>> On Sun, 27 May 2018, 08:46 Barak Korren, <[email protected]>
>>>>> >> >>>>>>> wrote:
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>> Test failed: [ AddHost (in upgrade-from-release-suite) ]
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>> Link to suspected patches:
>>>>> >> >>>>>>>> https://gerrit.ovirt.org/#/c/91445/5 - Disable TLS versions < 1.2
>>>>> >> >>>>>>>> for hosts with cluster level>=4.1
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>> Link to Job:
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>> Link to all logs:
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>> Error snippet from log:
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>> From nosetests log:
>>>>> >> >>>>>>>> <error>
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>> AssertionError: False != True after 1200 seconds
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>> </error>
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>> Not finding a host deploy log in /var/log/ovirt-engine for some
>>>>> >> >>>>>>>> reason.
>>>>> >> >>>>>>>> This seems to have caused consistent failures in all other engine
>>>>> >> >>>>>>>> patches that followed it.
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>>
>>>>> >> >>>>>>>> --
>>>>> >> >>>>>>>> Barak Korren
>>>>> >> >>>>>>>> RHV DevOps team , RHCE, RHCi
>>>>> >> >>>>>>>> Red Hat EMEA
>>>>> >> >>>>>>>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>>>> >> >>>>>>
>>>>> >> >>>>>>
>>>>> >> >>>>>
>>>>> >> >>>>>
>>>>> >> >>>>>
>>>>> >> >>>>
>>>>> >> >>>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> --
>>>>> >> >>> Martin Perina
>>>>> >> >>> Associate Manager, Software Engineering
>>>>> >> >>> Red Hat Czech s.r.o.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > _______________________________________________
>>>>> >> > Devel mailing list -- [email protected]
>>>>> >> > To unsubscribe send an email to [email protected]
>>>>> >> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>> >> > oVirt Code of Conduct:
>>>>> >> > https://www.ovirt.org/community/about/community-guidelines/
>>>>> >> > List Archives:
>>>>> >> >
>>>>> >> > https://lists.ovirt.org/archives/list/[email protected]/message/QIZ5L4FKII7X5FHQ4OXBBR2SLUIK5C74/
>>>>> >> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> _______________________________________________
>>>>> Devel mailing list -- [email protected]
>>>>> To unsubscribe send an email to [email protected]
>>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>>>> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/RDK42TYJKMX3M2DNUFKZO7CGNNOYWMJI/
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Martin Perina
> Associate Manager, Software Engineering
> Red Hat Czech s.r.o.
>



-- 
Barak Korren
RHV DevOps team , RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/YRIEA5YQHBO27XXJC2D3JFC4Z5N3TB6S/
