On Tue, Jul 10, 2018 at 9:35 AM, Poornima Gurusiddaiah <[email protected]> wrote:
> On Tue, Jul 10, 2018, 9:30 AM Amar Tumballi <[email protected]> wrote:
>>
>> On Mon, Jul 9, 2018 at 8:10 PM, Nithya Balachandran <[email protected]> wrote:
>>>
>>> We discussed reducing the number of volumes in the maintainers'
>>> meeting. Should we still go ahead and do that?
>>>
>> It would still be a good exercise, IMO. Reducing it to 50-60 volumes from
>> the 120 we have now.
>>
> AFAIK, the test case only creates 20 volumes with 6 bricks each, hence 120
> bricks served from one brick process. This results in 1000+ threads and
> 14g VIRT / 4-5g RES.

Thanks for the pointers, Poornima. The 4-5g RES is a concern for sure, as are
the 1000+ threads. Mohit had some ideas about reducing them; we should
consider those as a possible next 'resource management' task.

> Regards,
> Poornima
>
>>> On 9 July 2018 at 15:45, Xavi Hernandez <[email protected]> wrote:
>>>
>>>> On Mon, Jul 9, 2018 at 11:14 AM Karthik Subrahmanya <[email protected]> wrote:
>>>>
>>>>> Hi Deepshikha,
>>>>>
>>>>> Are you looking into this failure? I can still see it happening for
>>>>> all the regression runs.
>>>>
>>>> I've executed the failing script on my laptop and all tests finish
>>>> relatively fast. What seems to take time is the final cleanup: I can see
>>>> 'semanage' taking some CPU during destruction of the volumes. The test
>>>> required 350 seconds to finish successfully.
>>>>
>>>> Not sure what caused the cleanup time to increase, but I've created a
>>>> bug [1] to track this and a patch [2] to give more time to this test. This
>>>> should allow all blocked regressions to complete successfully.
>>>>
>>>> Xavi
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1599250
>>>> [2] https://review.gluster.org/20482
>>>>
>>>>> Thanks & Regards,
>>>>> Karthik
>>>>>
>>>>> On Sun, Jul 8, 2018 at 7:18 AM Atin Mukherjee <[email protected]> wrote:
>>>>>
>>>>>> https://build.gluster.org/job/regression-test-with-multiplex/794/display/redirect
>>>>>> has the same test failing. Is the reason for the failure different,
>>>>>> given this is on Jenkins?
>>>>>>
>>>>>> On Sat, 7 Jul 2018 at 19:12, Deepshikha Khandelwal <[email protected]> wrote:
>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> The issue [1] has been resolved. The softserve instances will now
>>>>>>> have 2GB RAM, i.e. the same as the Jenkins builders' sizing
>>>>>>> configuration.
>>>>>>>
>>>>>>> [1] https://github.com/gluster/softserve/issues/40
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Deepshikha Khandelwal
>>>>>>>
>>>>>>> On Fri, Jul 6, 2018 at 6:14 PM, Karthik Subrahmanya <[email protected]> wrote:
>>>>>>>>
>>>>>>>> On Fri 6 Jul, 2018, 5:18 PM Deepshikha Khandelwal <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi Poornima/Karthik,
>>>>>>>>>
>>>>>>>>> We've looked into the memory error that showed up on this softserve
>>>>>>>>> instance. These machine instances have 1GB RAM, which is not the
>>>>>>>>> case with the Jenkins builders; they have 2GB RAM.
>>>>>>>>>
>>>>>>>>> We've created the issue [1] and will solve it soon.
>>>>>>>>
>>>>>>>> Great. Thanks for the update.
>>>>>>>>
>>>>>>>>> Sorry for the inconvenience.
>>>>>>>>>
>>>>>>>>> [1] https://github.com/gluster/softserve/issues/40
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Deepshikha Khandelwal
>>>>>>>>>
>>>>>>>>> On Fri, Jul 6, 2018 at 3:44 PM, Karthik Subrahmanya <[email protected]> wrote:
>>>>>>>>>> Thanks Poornima for the analysis.
>>>>>>>>>> Can someone work on fixing this please?
>>>>>>>>>>
>>>>>>>>>> ~Karthik
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 6, 2018 at 3:17 PM Poornima Gurusiddaiah <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> The same test case is failing for my patch as well [1]. I requested
>>>>>>>>>>> a regression system and tried to reproduce it.
>>>>>>>>>>> From my analysis, the brick process (multiplexed) is consuming a lot
>>>>>>>>>>> of memory and is being OOM killed. The regression machine has 1GB RAM
>>>>>>>>>>> and the process is consuming more than that. 1GB for 120 bricks is
>>>>>>>>>>> acceptable considering there are 1000 threads in that brick process.
>>>>>>>>>>> Ways to fix:
>>>>>>>>>>> - Increase the regression system's RAM size, OR
>>>>>>>>>>> - Decrease the number of volumes in the test case.
>>>>>>>>>>>
>>>>>>>>>>> But what is strange is why the test sometimes passes for some
>>>>>>>>>>> patches. There could be some bug in memory consumption.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Poornima
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 6, 2018 at 2:11 PM, Karthik Subrahmanya <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> $subject is failing on centos regression for most of the patches
>>>>>>>>>>>> with a timeout error.
>>>>>>>>>>>>
>>>>>>>>>>>> 07:32:34 ================================================================================
>>>>>>>>>>>> 07:32:34 [07:33:05] Running tests in file ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>>>>>>>>>>> 07:32:34 Timeout set is 300, default 200
>>>>>>>>>>>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out after 300 seconds
>>>>>>>>>>>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t: bad status 124
>>>>>>>>>>>> 07:37:34
>>>>>>>>>>>> 07:37:34 *********************************
>>>>>>>>>>>> 07:37:34 *       REGRESSION FAILED       *
>>>>>>>>>>>> 07:37:34 * Retrying failed tests in case *
>>>>>>>>>>>> 07:37:34 * we got some spurious failures *
>>>>>>>>>>>> 07:37:34 *********************************
>>>>>>>>>>>> 07:37:34
>>>>>>>>>>>> 07:42:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out after 300 seconds
>>>>>>>>>>>> 07:42:34 End of test ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>>>>>>>>>>> 07:42:34
>>>>>>>>>>>> 07:42:34 ================================================================================
>>>>>>>>>>>>
>>>>>>>>>>>> Can anyone take a look?
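A side note on the "bad status 124" in the log above: 124 is the conventional exit status of coreutils timeout(1) when the command it supervises overruns its limit, which is how the harness distinguishes a timed-out test from an ordinary failure. A quick generic-shell demonstration (not the Gluster harness itself):

```shell
# `timeout` kills the supervised command once the limit expires and itself
# exits with status 124 -- the same status the regression log reports for a
# timed-out .t file.
timeout 1 sleep 5
echo "exit status: $?"   # prints: exit status: 124
```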
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karthik
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Gluster-devel mailing list
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>>>
>>>>>> --
>>>>>> - Atin (atinm)
>>
>> --
>> Amar Tumballi (amarts)

--
Amar Tumballi (amarts)
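For anyone wanting to try the "give more time to this test" fix locally: the harness output up-thread ("Timeout set is 300, default 200") indicates the .t framework reads a per-test timeout override from the test file. A minimal sketch of what such a change could look like; the variable name SCRIPT_TIMEOUT and the value 400 are assumptions inferred from that log line, and patch [2] above is the authoritative change:

```
#!/bin/bash
# Sketch only: bump the per-test timeout at the top of a .t regression test.
# SCRIPT_TIMEOUT=400 is an assumed override, not a quote of patch [2].
SCRIPT_TIMEOUT=400

. $(dirname $0)/../../include.rc
. $(dirname $0)/../../volume.rc

cleanup;
```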
_______________________________________________
Gluster-infra mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-infra
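Pulling the thread's figures into one place: a back-of-the-envelope check of the numbers quoted above (20 volumes x 6 bricks, 1000+ threads, 4-5g RES in one multiplexed brick process). The per-brick figures are derived here for illustration; they are not stated anywhere in the thread:

```python
# Figures quoted in the thread (Poornima's measurements on the 1GB machine):
volumes = 20
bricks_per_volume = 6
threads = 1000            # "1000+ threads" reported in the brick process
res_gib = 4.5             # midpoint of the reported 4-5g RES

bricks = volumes * bricks_per_volume
print(f"bricks in one multiplexed process: {bricks}")        # 120
print(f"threads per brick: {threads / bricks:.1f}")          # ~8.3
print(f"RES per brick: {res_gib * 1024 / bricks:.0f} MiB")   # ~38 MiB
```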
