On Tue, Jul 10, 2018 at 9:35 AM, Poornima Gurusiddaiah <[email protected]> wrote:
> On Tue, Jul 10, 2018, 9:30 AM Amar Tumballi <[email protected]> wrote:
>>
>> On Mon, Jul 9, 2018 at 8:10 PM, Nithya Balachandran <[email protected]> wrote:
>>>
>>> We discussed reducing the number of volumes in the maintainers'
>>> meeting. Should we still go ahead and do that?
>>>
>> It would still be a good exercise, IMO. Reducing it to 50-60 volumes from
>> the 120 we have now.
>>
> AFAIK, the test case only creates 20 volumes with 6 bricks each, hence 120
> bricks served from one brick process. This results in 1000+ threads and
> 14g VIRT / 4-5g RES.

Thanks for the pointers, Poornima. The 4-5g RES is a concern for sure, as are
the 1000+ threads. Mohit had some ideas about reducing them; we should
consider those as a possible next 'resource management' task.

> Regards,
> Poornima
>
>>> On 9 July 2018 at 15:45, Xavi Hernandez <[email protected]> wrote:
>>>
>>>> On Mon, Jul 9, 2018 at 11:14 AM Karthik Subrahmanya <[email protected]> wrote:
>>>>
>>>>> Hi Deepshikha,
>>>>>
>>>>> Are you looking into this failure? I can still see it happening for
>>>>> all the regression runs.
>>>>
>>>> I've executed the failing script on my laptop and all tests finish
>>>> relatively fast. What seems to take time is the final cleanup: I can see
>>>> 'semanage' taking some CPU during destruction of the volumes. The test
>>>> required 350 seconds to finish successfully.
>>>>
>>>> Not sure what caused the cleanup time to increase, but I've created a
>>>> bug [1] to track this and a patch [2] to give more time to this test. This
>>>> should allow all blocked regressions to complete successfully.
>>>>
>>>> Xavi
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1599250
>>>> [2] https://review.gluster.org/20482
>>>>
>>>>> Thanks & Regards,
>>>>> Karthik
>>>>>
>>>>> On Sun, Jul 8, 2018 at 7:18 AM Atin Mukherjee <[email protected]> wrote:
>>>>>
>>>>>> https://build.gluster.org/job/regression-test-with-multiplex/794/display/redirect
>>>>>> has the same test failing. Is the reason for the failure different,
>>>>>> given this is on Jenkins?
>>>>>>
>>>>>> On Sat, 7 Jul 2018 at 19:12, Deepshikha Khandelwal <[email protected]> wrote:
>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> The issue [1] has been resolved. The softserve instances will now
>>>>>>> have 2GB RAM, i.e. the same as the Jenkins builders' sizing
>>>>>>> configuration.
>>>>>>>
>>>>>>> [1] https://github.com/gluster/softserve/issues/40
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Deepshikha Khandelwal
>>>>>>>
>>>>>>> On Fri, Jul 6, 2018 at 6:14 PM, Karthik Subrahmanya <[email protected]> wrote:
>>>>>>>>
>>>>>>>> On Fri 6 Jul, 2018, 5:18 PM Deepshikha Khandelwal <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi Poornima/Karthik,
>>>>>>>>>
>>>>>>>>> We've looked into the memory error that showed up on this softserve
>>>>>>>>> instance. These machine instances have 1GB RAM, which is not the
>>>>>>>>> case with the Jenkins builders; they have 2GB RAM.
>>>>>>>>>
>>>>>>>>> We've created the issue [1] and will solve it soon.
>>>>>>>>
>>>>>>>> Great. Thanks for the update.
>>>>>>>>
>>>>>>>>> Sorry for the inconvenience.
>>>>>>>>>
>>>>>>>>> [1] https://github.com/gluster/softserve/issues/40
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Deepshikha Khandelwal
>>>>>>>>>
>>>>>>>>> On Fri, Jul 6, 2018 at 3:44 PM, Karthik Subrahmanya <[email protected]> wrote:
>>>>>>>>>> Thanks Poornima for the analysis.
>>>>>>>>>> Can someone work on fixing this please?
>>>>>>>>>>
>>>>>>>>>> ~Karthik
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 6, 2018 at 3:17 PM Poornima Gurusiddaiah <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> The same test case is failing for my patch as well [1]. I requested
>>>>>>>>>>> a regression system and tried to reproduce it.
>>>>>>>>>>> From my analysis, the brick process (multiplexed) is consuming a lot
>>>>>>>>>>> of memory and is being OOM killed. The regression machine has 1GB RAM
>>>>>>>>>>> and the process is consuming more than that. 1GB for 120 bricks is
>>>>>>>>>>> acceptable considering there are 1000 threads in that brick process.
>>>>>>>>>>> Ways to fix:
>>>>>>>>>>> - Increase the regression system's RAM size, OR
>>>>>>>>>>> - Decrease the number of volumes in the test case.
>>>>>>>>>>>
>>>>>>>>>>> But what is strange is why the test sometimes passes for some
>>>>>>>>>>> patches. There could be some bug in memory consumption.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Poornima
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 6, 2018 at 2:11 PM, Karthik Subrahmanya <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> $subject is failing on centos regression for most of the patches
>>>>>>>>>>>> with a timeout error.
>>>>>>>>>>>>
>>>>>>>>>>>> 07:32:34 ================================================================================
>>>>>>>>>>>> 07:32:34 [07:33:05] Running tests in file ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>>>>>>>>>>> 07:32:34 Timeout set is 300, default 200
>>>>>>>>>>>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out after 300 seconds
>>>>>>>>>>>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t: bad status 124
>>>>>>>>>>>> 07:37:34
>>>>>>>>>>>> 07:37:34 *********************************
>>>>>>>>>>>> 07:37:34 *       REGRESSION FAILED       *
>>>>>>>>>>>> 07:37:34 * Retrying failed tests in case *
>>>>>>>>>>>> 07:37:34 * we got some spurious failures *
>>>>>>>>>>>> 07:37:34 *********************************
>>>>>>>>>>>> 07:37:34
>>>>>>>>>>>> 07:42:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out after 300 seconds
>>>>>>>>>>>> 07:42:34 End of test ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>>>>>>>>>>> 07:42:34
>>>>>>>>>>>> 07:42:34 ================================================================================
>>>>>>>>>>>>
>>>>>>>>>>>> Can anyone take a look?
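A side note on the "bad status 124" in the log above: 124 is the conventional exit status of coreutils timeout(1) when the command it supervises overruns its limit, which is how the harness distinguishes a timed-out test from an ordinary failure. A quick generic-shell demonstration (not the Gluster harness itself):

```shell
# `timeout` kills the supervised command once the limit expires and itself
# exits with status 124 -- the same status the regression log reports for a
# timed-out .t file.
timeout 1 sleep 5
echo "exit status: $?"   # prints: exit status: 124
```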
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karthik
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Gluster-devel mailing list
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>>>
>>>>>> --
>>>>>> - Atin (atinm)
>>
>> --
>> Amar Tumballi (amarts)

--
Amar Tumballi (amarts)
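For anyone wanting to try the "give more time to this test" fix locally: the harness output up-thread ("Timeout set is 300, default 200") indicates the .t framework reads a per-test timeout override from the test file. A minimal sketch of what such a change could look like; the variable name SCRIPT_TIMEOUT and the value 400 are assumptions inferred from that log line, and patch [2] above is the authoritative change:

```
#!/bin/bash
# Sketch only: bump the per-test timeout at the top of a .t regression test.
# SCRIPT_TIMEOUT=400 is an assumed override, not a quote of patch [2].
SCRIPT_TIMEOUT=400

. $(dirname $0)/../../include.rc
. $(dirname $0)/../../volume.rc

cleanup;
```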
_______________________________________________
Gluster-infra mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-infra
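Pulling the thread's figures into one place: a back-of-the-envelope check of the numbers quoted above (20 volumes x 6 bricks, 1000+ threads, 4-5g RES in one multiplexed brick process). The per-brick figures are derived here for illustration; they are not stated anywhere in the thread:

```python
# Figures quoted in the thread (Poornima's measurements on the 1GB machine):
volumes = 20
bricks_per_volume = 6
threads = 1000            # "1000+ threads" reported in the brick process
res_gib = 4.5             # midpoint of the reported 4-5g RES

bricks = volumes * bricks_per_volume
print(f"bricks in one multiplexed process: {bricks}")        # 120
print(f"threads per brick: {threads / bricks:.1f}")          # ~8.3
print(f"RES per brick: {res_gib * 1024 / bricks:.0f} MiB")   # ~38 MiB
```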
