On Sun, Dec 2, 2018 at 8:51 PM Nir Soffer <nsof...@redhat.com> wrote:
> On Sun, Dec 2, 2018 at 8:33 PM Gal Ben Haim <gbenh...@redhat.com> wrote:
>
>>
>> In order to not block other patches on CQ, I've sent [1] which will double
>> the amount of space on the ISCSI SD (with the patch it will have 40GB).
>>
>
In addition, we need to prioritize the fix for this bug and backport it to 4.2.
I'd suggest that once the bug is fixed, we revert this change [1]; if
everything is back on track for a few executions, we apply it again and keep it.

>
>> As a side note, we use the same configuration on the master suite, which
>> may explain why we don't see the issue there.
>>
>
> Why did we use different configurations?
>
> Can we extract the configuration to an external file that will be shared by
> both master and 4.x suites?
>
>
>>
>> [1] https://gerrit.ovirt.org/#/c/95922/
>>
>> On Sun, Dec 2, 2018 at 5:41 PM Gal Ben Haim <gbenh...@redhat.com> wrote:
>>
>>> Below you can find 2 jobs, one that succeeded and the other failed on
>>> the iscsi issue.
>>> Both were triggered by unrelated patches.
>>>
>>> Success -
>>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3546/
>>> Failure -
>>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3544/
>>>
>>>
>>> On Sun, Dec 2, 2018 at 2:37 PM Gal Ben Haim <gbenh...@redhat.com> wrote:
>>>
>>>> Raz, thanks for the investigation.
>>>> I'll send a patch for increasing the luns size.
>>>>
>>>> On Sun, Dec 2, 2018 at 1:27 PM Nir Soffer <nsof...@redhat.com> wrote:
>>>>
>>>>> On Sun, Dec 2, 2018, 10:44 Raz Tamir <rata...@redhat.com wrote:
>>>>>
>>>>>> After some analysis, I think the bug we are seeing here is
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1588061
>>>>>> This applies for suspend/resume and also for a snapshot with memory.
>>>>>> Following the steps and considering that the iscsi storage domain is
>>>>>> only 20GB, this should be the reason for reaching ~4GB free space
>>>>>>
>>>>>
>>>>> OST configuration should change so it will not fail because of such
>>>>> bugs.
>>>>>
>>>>
>>>> I disagree. The purpose of OST is to catch bugs, not to cover them.
>>>>
>>>>>
>>>>> Iscsi storage can be created using sparse files, not consuming any
>>>>> resources until you write to the lvs, so having a 100g storage domain
>>>>> costs nothing.
>>>>>
>>>>
>>>> OST uses sparse files.
>>>>
>>>>>
>>>>> Nir
>>>>>
>>>>>
>>>>>> On Fri, Nov 30, 2018 at 10:01 PM Raz Tamir <rata...@redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Fri, Nov 30, 2018, 21:57 Ryan Barry <rba...@redhat.com wrote:
>>>>>>>
>>>>>>>> On Fri, Nov 30, 2018 at 2:31 PM Raz Tamir <rata...@redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Fri, Nov 30, 2018, 19:33 Dafna Ron <d...@redhat.com wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> This mail is to provide the current status of CQ and allow people
>>>>>>>>>> to review status before and after the weekend.
>>>>>>>>>> Please refer to the colour map below for further information on the
>>>>>>>>>> meaning of the colours.
>>>>>>>>>>
>>>>>>>>>> *CQ-4.2*: RED (#1)
>>>>>>>>>>
>>>>>>>>>> I checked the last date on which ovirt-engine and vdsm passed and
>>>>>>>>>> moved packages to tested, as they are the bigger projects, and it
>>>>>>>>>> was 27-11-2018.
>>>>>>>>>>
>>>>>>>>>> We have been having sporadic failures for most of the projects on
>>>>>>>>>> test check_snapshot_with_memory.
>>>>>>>>>> We have deduced that this is caused by a code regression in
>>>>>>>>>> storage based on the following things:
>>>>>>>>>> 1. Evgheni and Gal helped debug this issue to rule out lago and
>>>>>>>>>> infra issues as the cause of failure, and both determined the issue
>>>>>>>>>> is a code regression - most likely in storage.
>>>>>>>>>> 2. The failure only happens on the 4.2 branch.
>>>>>>>>>> 3. The failure itself is that a VM cannot run due to low disk space
>>>>>>>>>> in the storage domain, and we cannot see any failures which would
>>>>>>>>>> leave any leftovers in the storage domain.
>>>>>>>>>>
>>>>>>>>> Can you please share the link to the execution?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Here's an example of one run:
>>>>>>>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3550/
>>>>>>>>
>>>>>>>> The iSCSI storage domain starts emitting warnings about low storage
>>>>>>>> space immediately after removing the VmPool, but it's possible that the
>>>>>>>> storage domain is filling before that from some other call prior to
>>>>>>>> that which is still running, possibly the VM import.
>>>>>>>>
>>>>>>> Thanks Ryan, I'll try to help with debugging this issue
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Dan and Ryan are actively involved in trying to find the
>>>>>>>>>> regression but the consensus is that this is a storage related
>>>>>>>>>> regression and *we are having a problem getting the storage team
>>>>>>>>>> to join us in debugging the issue.*
>>>>>>>>>>
>>>>>>>>>> I prepared a patch to skip the test in case we cannot get
>>>>>>>>>> cooperation from the storage team and resolve this regression in
>>>>>>>>>> the next few days:
>>>>>>>>>> https://gerrit.ovirt.org/#/c/95889/
>>>>>>>>>>
>>>>>>>>>> *CQ-Master:* YELLOW (#1)
>>>>>>>>>>
>>>>>>>>>> We have failures which CQ is still bisecting and until it's done
>>>>>>>>>> we cannot point to any specific failing projects.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Happy week!
>>>>>>>>>> Dafna
>>>>>>>>>>
>>>>>>>>>> -------------------------------------------------------------------
>>>>>>>>>> COLOUR MAP
>>>>>>>>>>
>>>>>>>>>> Green = job has been passing successfully
>>>>>>>>>>
>>>>>>>>>> ** green for more than 3 days may suggest we need a review of our
>>>>>>>>>> test coverage
>>>>>>>>>>
>>>>>>>>>> 1. 1-3 days GREEN (#1)
>>>>>>>>>> 2. 4-7 days GREEN (#2)
>>>>>>>>>> 3. Over 7 days GREEN (#3)
>>>>>>>>>>
>>>>>>>>>> Yellow = intermittent failures for different projects but no
>>>>>>>>>> lasting or current regressions
>>>>>>>>>>
>>>>>>>>>> ** intermittent would be a healthy project as we expect a number
>>>>>>>>>> of failures during the week
>>>>>>>>>>
>>>>>>>>>> ** I will not report any of the solved failures or regressions.
>>>>>>>>>>
>>>>>>>>>> 1. Solved job failures YELLOW (#1)
>>>>>>>>>> 2. Solved regressions YELLOW (#2)
>>>>>>>>>>
>>>>>>>>>> Red = job has been failing
>>>>>>>>>>
>>>>>>>>>> ** Active Failures. The colour will change based on the amount of
>>>>>>>>>> time the project/s has been broken. Only active regressions would be
>>>>>>>>>> reported.
>>>>>>>>>>
>>>>>>>>>> 1. 1-3 days RED (#1)
>>>>>>>>>> 2. 4-7 days RED (#2)
>>>>>>>>>> 3. Over 7 days RED (#3)
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Ryan Barry
>>>>>>>>
>>>>>>>> Associate Manager - RHV Virt/SLA
>>>>>>>>
>>>>>>>> rba...@redhat.com M: +16518159306 IM: rbarry
>>>>>>>> <https://red.ht/sig>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Raz Tamir
>>>>>> Manager, RHV QE
>>>>>> _______________________________________________
>>>>>> Devel mailing list -- devel@ovirt.org
>>>>>> To unsubscribe send an email to devel-le...@ovirt.org
>>>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>>> oVirt Code of Conduct:
>>>>>> https://www.ovirt.org/community/about/community-guidelines/
>>>>>> List Archives:
>>>>>> https://lists.ovirt.org/archives/list/devel@ovirt.org/message/6EFAA4LR743GLDGGNVCK2PEOHL7USLB7/
>>>>>>
>>>>
>>>> --
>>>> *GAL bEN HAIM*
>>>> RHV DEVOPS
>>>>
>

--
Raz Tamir
Manager, RHV QE
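
For readers unfamiliar with the sparse-file point made earlier in the thread (storage backed by sparse files consumes no space until it is written to), here is a minimal Python sketch. It is only an illustration and not the code OST uses to create its iSCSI backing storage; the function name `sparse_file_usage` and the 100 MiB size are assumptions chosen for the example, and `st_blocks` is only available on Unix-like systems.

```python
import os
import tempfile


def sparse_file_usage(size_bytes):
    """Create a sparse temp file and return (apparent, allocated) sizes in bytes.

    Hypothetical helper for illustration only; not part of OST or lago.
    """
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        # Extending the file with truncate() sets its length without
        # writing data, so the filesystem can leave the range unallocated.
        f.truncate(size_bytes)
    try:
        st = os.stat(path)
        # st_blocks is the number of 512-byte blocks actually allocated.
        return st.st_size, st.st_blocks * 512
    finally:
        os.remove(path)


apparent, allocated = sparse_file_usage(100 * 1024**2)
print(f"apparent size: {apparent} bytes, allocated on disk: {allocated} bytes")
```

On a filesystem that supports sparse files, the allocated size should be far smaller than the apparent size, which is why a larger storage domain backed this way costs little until data is written.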