Not amusing that it's been closed because it was reported against a particular version of Gluster. Could somebody re-open it, please?
On Wed, Aug 15, 2018 at 10:38 PM Nigel Babu <[email protected]> wrote:

> On Wed, Aug 15, 2018 at 2:41 PM Michael Scherer <[email protected]>
> wrote:
>
>> Hi folks,
>>
>> So the Gluster Jenkins disk was full today (because outages do not
>> respect public holidays in India (Independence Day) and France
>> (Assumption)). Here is the post mortem for your reading pleasure.
>>
>> Date: 15/08/2018
>>
>> Service affected:
>> Jenkins for Gluster (jenkins-el7.rht.gluster.org)
>>
>> Impact:
>>
>> No Jenkins job could be triggered.
>>
>> Root cause:
>>
>> The disk was full, mainly because we got new jobs and more patches, so
>> regular growth.
>>
>> Resolution:
>>
>> Increased the disk by 30G, and investigating if cleanup could be
>> improved. This did require a reboot.
>>
>>
>> Involved people:
>> - misc
>> - nigel
>>
>> Lessons learned
>> - What went well:
>>   - we had a documented process for that, and it was good enough to be
>>     used by a tired admin.
>>
>> - What went bad:
>>   - we weren't proactive enough to see that before it caused an outage
>>   - 15 of August is a holiday for both France and India. Technically,
>>     none of the infra team should have been up.
>>
>> - When we were lucky
>>   - It was a day off in India, so few people were affected, except
>>     folks who continue to work on days off
>>   - Misc decided to go to work while being in Brno, to take days off
>>     later
>>
>>
>> Timeline (in UTC)
>>
>> - 05:58 Amar posts a mail to say "smoke job fail" on gluster-infra:
>>   https://lists.gluster.org/pipermail/gluster-infra/2018-August/004795.html
>>
>> - 06:23 Nigel pings Misc on Telegram to deal with it, since Nigel is
>>   away from his laptop for the Independence Day celebration.
>>
>> - 06:24 Misc does not hear the ding since he is asleep
>>
>> - 06:55 Sankarshan opens a bug on it,
>>   https://bugzilla.redhat.com/show_bug.cgi?id=1616160
>>
>> - 06:56 Misc does not see the email since he is still asleep
>>
>> - 07:13 Misc wakes up, sees a blinking light on the phone and ponders
>>   closing his eyes again. He looks at it, and starts to swear.
>>
>> - 07:14 Investigation reveals that the Jenkins partition is full
>>   (100%). A quick investigation does not yield any particular issue.
>>   The Jenkins jobs are taking space and that's it.
>>
>> - 07:19 After discussion with Nigel, it is decided to increase the size
>>   of the partition. Misc takes a look at it and tries to increase it,
>>   without any luck. The server is rebooted in case that's what was
>>   needed. Still not enough.
>>
>> - 07:25 Misc goes for a quick shower to wake himself up. The warm
>>   embrace of water makes him remember that documentation on that
>>   process does exist:
>>   https://gluster-infra-docs.readthedocs.io/procedures/resize_vm_partition.html
>>
>> - 07:30 Following the documentation, we discover that the hypervisor
>>   is now out of space for future increases. Looking at that will be
>>   done after the post mortem.
>>
>> - 07:37 Jenkins is being restarted, with more space, and seems to work
>>   ok.
>>
>> - 07:38 Misc rushes to his hotel breakfast, which closes at 10.
>>
>> - 09:09 Post mortem is finished and being sent
>>
>>
>> Action items:
>> - (misc) see what can be done for myrmicinae (the hypervisor where
>>   Jenkins is running) since there is no more space.
>>
>> Potential improvements to make:
>> - we still need to have monitoring in place
>> - we need to move munin into the internal LAN to look at the graph
>>   for Jenkins
>> - documentation regarding resizing could be clearer, notably on the
>>   volume resizing part
>>
>
> This is highlighting that we need to solve
> https://bugzilla.redhat.com/show_bug.cgi?id=1564372 on priority. The lack
> of monitoring is affecting day-to-day work.
>
> --
> nigelb
>
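
On the monitoring point in the post mortem: as a rough sketch of the kind of check that could flag a filling partition before it causes an outage, something along these lines could run periodically on the Jenkins host. This is only an illustration. The function names and the 80%/90% thresholds are assumptions made up for the example, not anything the infra team has deployed; the only real API used is Python's standard-library `shutil.disk_usage`.

```python
import shutil
import sys


def disk_usage_percent(path="/"):
    """Return used space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify(percent, warn=80.0, crit=90.0):
    """Map a usage percentage to a status string.

    The warn/crit thresholds are illustrative assumptions, not values
    taken from the post mortem.
    """
    if percent >= crit:
        return "CRITICAL"
    if percent >= warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Path to check; defaults to the root filesystem.
    path = sys.argv[1] if len(sys.argv) > 1 else "/"
    pct = disk_usage_percent(path)
    print(f"{classify(pct)}: {path} is {pct:.1f}% full")
```

Run against the partition that holds the Jenkins data, this prints a one-line status that a cron job or an existing alerting hook could pick up. It does not replace proper monitoring such as munin; it only shows how small the first step can be.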
_______________________________________________
Gluster-infra mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-infra
