On 19 February 2018 at 18:19, Mohit Agrawal <moagr...@redhat.com> wrote:
> Hi,
>
> I think I know the reason the tarball size is bigger: it could happen if
> the tar file contains more than one core.
> I triggered a build (https://review.gluster.org/19574, to validate all
> test cases with brick mux enabled) after setting exit_one_failure="no" in
> run-tests.sh, so the build executed all test cases. With the earlier
> version of the patch, I was getting multiple cores.
>
> Now it generates only one core; it seems the other code paths are fixed,
> so the issue should be resolved now.

Good to hear. Thanks Mohit.

> Regards
> Mohit Agrawal
>
> On Mon, Feb 19, 2018 at 6:07 PM, Sankarshan Mukhopadhyay
> <sankarshan.mukhopadh...@gmail.com> wrote:
>
>> On Mon, Feb 19, 2018 at 5:58 PM, Nithya Balachandran
>> <nbala...@redhat.com> wrote:
>> >
>> > On 19 February 2018 at 13:12, Atin Mukherjee <amukh...@redhat.com>
>> > wrote:
>> >>
>> >> On Mon, Feb 19, 2018 at 8:53 AM, Nigel Babu <nig...@redhat.com> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> As you all most likely know, we store a tarball of the binaries and
>> >>> the core whenever a regression run produces a core. Occasionally we
>> >>> introduce a bug in Gluster, and this tarball can take up a lot of
>> >>> space. This happened recently with the brick multiplex tests: the
>> >>> build-install tarball took up 25G, causing the machine to run out of
>> >>> space and fail continuously.
>> >>
>> >> AFAIK, we don't have a .t file in the upstream regression suite that
>> >> creates hundreds of volumes. At that scale, with brick multiplexing
>> >> enabled, I can understand that the core would be heavily loaded and
>> >> might consume this much space. FWIW, can we first figure out which
>> >> test caused this crash, and then check whether running gcore after
>> >> certain steps in that test leaves us with a core file of similar
>> >> size? IOW, have we actually seen core files this large before? If
>> >> not, whatever changed to make us start seeing them is something to
>> >> be investigated.
>> >
>> > We also need to check whether it is only the core file that is causing
>> > the increase in size, or whether something else is taking up a lot of
>> > space.
>> >
>> >>> I've made some changes this morning. Right after we create the
>> >>> tarball, we'll delete all files in /archive that are greater than 1G.
>> >>> Please be aware that this means all large files, including the newly
>> >>> created tarball, will be deleted. You will have to work with the
>> >>> traceback on the Jenkins job.
>> >>
>> >> We'd really need to first investigate the average size of the core
>> >> file we get when a system is running with brick multiplexing and
>> >> ongoing I/O. Without that, immediately deleting core files larger
>> >> than 1G will cause trouble for developers debugging genuine crashes,
>> >> as the traceback alone may not be sufficient.
>>
>> I'd like to echo what Nithya writes - instead of treating this incident
>> as an outlier, we might want to do further analysis. If this had
>> happened on a production system, there would be blood.
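
A minimal sketch of the measurement Atin suggests - snapshotting a running
brick process at a known point in a test to see how large its core actually
gets. The process selection and output path below are illustrative
assumptions, not part of the regression harness:

    # With brick multiplexing enabled, a single glusterfsd process hosts
    # many bricks, so its core can be very large. Pick one such process.
    pid=$(pgrep -f glusterfsd | head -n 1)

    # gcore (shipped with gdb) dumps the process memory to a core file
    # without killing the process; it writes /tmp/brick-core.<pid>.
    gcore -o /tmp/brick-core "$pid"

    # Compare this size against what the regression jobs archive.
    ls -lh "/tmp/brick-core.${pid}"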
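For reference, the cleanup step Nigel describes maps onto a single GNU find
invocation; a sketch assuming the archived tarballs live directly under
/archive on the build machines (the exact path and how the step is hooked
into the Jenkins job are assumptions here):

    # Remove anything in /archive larger than 1G, including the freshly
    # created tarball, mirroring the behaviour described in the thread.
    find /archive -type f -size +1G -delete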