On Sun, Dec 2, 2018 at 8:51 PM Nir Soffer <nsof...@redhat.com> wrote:

> On Sun, Dec 2, 2018 at 8:33 PM Gal Ben Haim <gbenh...@redhat.com> wrote:
>
>>
>> In order to not block other patches on CQ, I've sent [1], which will double
>> the amount of space on the iSCSI SD (with the patch it will have 40GB).
>>
> And in addition we need to prioritize the fix for this bug + backport to 4.2.
I'd suggest that after the bug is fixed, we revert this change [1] and, in
case everything is back on track for a few executions, apply it again and
keep it.

>
>> As a side note, we already use the same configuration (as in [1]) on the
>> master suite, which may explain why we don't see the issue there.
>>
>
> Why did we use different configurations?
>
> Can we extract the configuration to an external file that will be shared by
> both the master and 4.x suites?
>
>
>>
>> [1] https://gerrit.ovirt.org/#/c/95922/
>>
>> On Sun, Dec 2, 2018 at 5:41 PM Gal Ben Haim <gbenh...@redhat.com> wrote:
>>
>>> Below you can find 2 jobs, one that succeeded and one that failed on
>>> the iSCSI issue.
>>> Both were triggered by unrelated patches.
>>>
>>> Success -
>>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3546/
>>> Failure -
>>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3544/
>>>
>>>
>>> On Sun, Dec 2, 2018 at 2:37 PM Gal Ben Haim <gbenh...@redhat.com> wrote:
>>>
>>>> Raz, thanks for the investigation.
>>>> I'll send a patch to increase the LUN size.
>>>>
>>>> On Sun, Dec 2, 2018 at 1:27 PM Nir Soffer <nsof...@redhat.com> wrote:
>>>>
>>>>> On Sun, Dec 2, 2018, 10:44 Raz Tamir <rata...@redhat.com> wrote:
>>>>>
>>>>>> After some analysis, I think the bug we are seeing here is
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1588061
>>>>>> This applies to suspend/resume and also to a snapshot with memory.
>>>>>> Following the steps and considering that the iSCSI storage domain is
>>>>>> only 20GB, this should be the reason for reaching ~4GB free space.
>>>>>>
>>>>>
>>>>>
>>>>> OST configuration should change so that it will not fail because of such
>>>>> bugs.
>>>>>
>>>>
>>>> I disagree. The purpose of OST is to catch bugs, not to cover them up.
>>>>
>>>>>
>>>>> iSCSI storage can be created using sparse files, not consuming any
>>>>> resources until you write to the LVs, so having a 100G storage domain
>>>>> costs nothing.
>>>>>
>>>>
>>>> OST uses sparse files.
>>>>
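[Illustration of the sparse-file point above, not part of the original
messages. This is a minimal Python sketch, assuming a Linux filesystem that
supports sparse files. The file name lun0.img and the helper names are
hypothetical, and this is not a description of how OST provisions its LUNs.]

```python
import os
import tempfile


def make_sparse_file(path, size_bytes):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as f:
        # truncate() extends the file; on sparse-capable filesystems the
        # extended range is a hole that occupies no data blocks.
        f.truncate(size_bytes)


def usage(path):
    """Return (apparent size, bytes actually allocated on disk)."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Linux.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        backing = os.path.join(tmp, "lun0.img")  # hypothetical backing file
        make_sparse_file(backing, 100 * 1024**3)  # 100 GiB apparent size
        apparent, allocated = usage(backing)
        print(f"apparent={apparent} bytes, allocated={allocated} bytes")
```

On such a filesystem the allocated figure should stay far below the apparent
size until data is actually written, which is the sense in which a larger
storage domain "costs nothing" up front.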
>>>>>
>>>>> Nir
>>>>>
>>>>>
>>>>>> On Fri, Nov 30, 2018 at 10:01 PM Raz Tamir <rata...@redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Nov 30, 2018, 21:57 Ryan Barry <rba...@redhat.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Nov 30, 2018 at 2:31 PM Raz Tamir <rata...@redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Nov 30, 2018, 19:33 Dafna Ron <d...@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> This mail is to provide the current status of CQ and allow people
>>>>>>>>>> to review status before and after the weekend.
>>>>>>>>>> Please refer to below colour map for further information on the
>>>>>>>>>> meaning of the colours.
>>>>>>>>>>
>>>>>>>>>> *CQ-4.2*: RED (#1)
>>>>>>>>>>
>>>>>>>>>> I checked the last date on which ovirt-engine and vdsm passed and
>>>>>>>>>> moved packages to tested, as they are the bigger projects, and it
>>>>>>>>>> was 27-11-2018.
>>>>>>>>>>
>>>>>>>>>> We have been having sporadic failures for most of the projects on
>>>>>>>>>> test check_snapshot_with_memory.
>>>>>>>>>> We have deduced that this is caused by a code regression in
>>>>>>>>>> storage based on the following:
>>>>>>>>>> 1. Evgheni and Gal helped debug this issue to rule out lago and
>>>>>>>>>> infra issues as the cause of failure, and both determined the issue
>>>>>>>>>> is a code regression - most likely in storage.
>>>>>>>>>> 2. The failure only happens on the 4.2 branch.
>>>>>>>>>> 3. The failure itself is that a VM cannot run due to low disk space
>>>>>>>>>> in the storage domain, and we cannot see any failures which would
>>>>>>>>>> leave any leftovers in the storage domain.
>>>>>>>>>>
>>>>>>>>> Can you please share the link to the execution?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Here's an example of one run:
>>>>>>>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3550/
>>>>>>>>
>>>>>>>> The iSCSI storage domain starts emitting warnings about low storage
>>>>>>>> space immediately after removing the VmPool, but it's possible that the
>>>>>>>> storage domain is filling before that from some earlier call which is
>>>>>>>> still running, possibly the VM import.
>>>>>>>>
>>>>>>> Thanks Ryan, I'll try to help with debugging this issue
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Dan and Ryan are actively involved in trying to find the
>>>>>>>>>> regression but the consensus is that this is a storage-related
>>>>>>>>>> regression and *we are having a problem getting the storage team
>>>>>>>>>> to join us in debugging the issue.*
>>>>>>>>>>
>>>>>>>>>> I prepared a patch to skip the test in case we cannot get
>>>>>>>>>> cooperation from the storage team and resolve this regression in
>>>>>>>>>> the next few days:
>>>>>>>>>> https://gerrit.ovirt.org/#/c/95889/
>>>>>>>>>>
>>>>>>>>>> *CQ-Master:* YELLOW (#1)
>>>>>>>>>>
>>>>>>>>>> We have failures which CQ is still bisecting, and until it's done
>>>>>>>>>> we cannot point to any specific failing projects.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Happy week!
>>>>>>>>>> Dafna
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -------------------------------------------------------------------------------------------------------------------
>>>>>>>>>> COLOUR MAP
>>>>>>>>>>
>>>>>>>>>> Green = job has been passing successfully
>>>>>>>>>>
>>>>>>>>>> ** green for more than 3 days may suggest we need a review of our
>>>>>>>>>> test coverage
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    1. 1-3 days       GREEN (#1)
>>>>>>>>>>    2. 4-7 days       GREEN (#2)
>>>>>>>>>>    3. Over 7 days    GREEN (#3)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yellow = intermittent failures for different projects but no
>>>>>>>>>> lasting or current regressions
>>>>>>>>>>
>>>>>>>>>> ** intermittent would be a healthy project as we expect a number
>>>>>>>>>> of failures during the week
>>>>>>>>>>
>>>>>>>>>> ** I will not report any of the solved failures or regressions.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    1. Solved job failures     YELLOW (#1)
>>>>>>>>>>    2. Solved regressions      YELLOW (#2)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Red = job has been failing
>>>>>>>>>>
>>>>>>>>>> ** Active Failures. The colour will change based on the amount of
>>>>>>>>>> time the project/s has been broken. Only active regressions would be
>>>>>>>>>> reported.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    1. 1-3 days       RED (#1)
>>>>>>>>>>    2. 4-7 days       RED (#2)
>>>>>>>>>>    3. Over 7 days    RED (#3)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Ryan Barry
>>>>>>>>
>>>>>>>> Associate Manager - RHV Virt/SLA
>>>>>>>>
>>>>>>>> rba...@redhat.com    M: +16518159306     IM: rbarry
>>>>>>>> <https://red.ht/sig>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>> Raz Tamir
>>>>>> Manager, RHV QE
>>>>>> _______________________________________________
>>>>>> Devel mailing list -- devel@ovirt.org
>>>>>> To unsubscribe send an email to devel-le...@ovirt.org
>>>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>>> oVirt Code of Conduct:
>>>>>> https://www.ovirt.org/community/about/community-guidelines/
>>>>>> List Archives:
>>>>>> https://lists.ovirt.org/archives/list/devel@ovirt.org/message/6EFAA4LR743GLDGGNVCK2PEOHL7USLB7/
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *GAL bEN HAIM*
>>>> RHV DEVOPS
>>>>
>>>
>>>
>>> --
>>> *GAL bEN HAIM*
>>> RHV DEVOPS
>>>
>>
>>
>> --
>> *GAL bEN HAIM*
>> RHV DEVOPS
>>
>


-- 


Raz Tamir
Manager, RHV QE
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/FFM52TIEFUNCL2VJO6SALIS6L5JRJMXL/
