Dear Jason Williams,

thank you for your hint. Please find below the output of our Maui when run with the "-d" command-line option (Maui ran for about 5 minutes before it crashed):
# /usr/local/maui/sbin/maui -d
*** glibc detected *** /usr/local/maui/sbin/maui: malloc(): memory corruption: 0x00000000099243e0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3300672fae]
/lib64/libc.so.6(__libc_malloc+0x6e)[0x3300674cde]
/usr/local/torque/lib/libtorque.so.2(decode_DIS_replyCmd+0x266)[0x2ab278cb18e6]
/usr/local/torque/lib/libtorque.so.2(PBSD_rdrpy+0x80)[0x2ab278cb56d0]
/usr/local/torque/lib/libtorque.so.2(PBSD_status_get+0x26)[0x2ab278cb6786]
/usr/local/maui/sbin/maui[0x4d9e59]
/usr/local/maui/sbin/maui[0x48b8e4]
/usr/local/maui/sbin/maui[0x48b84f]
/usr/local/maui/sbin/maui[0x4ce81c]
/usr/local/maui/sbin/maui[0x4ce39e]
/usr/local/maui/sbin/maui[0x4419eb]
/usr/local/maui/sbin/maui[0x403608]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x330061d994]
/usr/local/maui/sbin/maui[0x402cd9]
======= Memory map: ========
00400000-0054f000 r-xp 00000000 08:03 50266128 /usr/local/maui/sbin/maui
0074f000-00754000 rw-p 0014f000 08:03 50266128 /usr/local/maui/sbin/maui
00754000-02344000 rw-p 00754000 00:00 0
0984b000-188f1000 rw-p 0984b000 00:00 0 [heap]
3300200000-330021c000 r-xp 00000000 08:03 18186265 /lib64/ld-2.5.so
330041b000-330041c000 r--p 0001b000 08:03 18186265 /lib64/ld-2.5.so
330041c000-330041d000 rw-p 0001c000 08:03 18186265 /lib64/ld-2.5.so
3300600000-330074e000 r-xp 00000000 08:03 18186304 /lib64/libc-2.5.so
330074e000-330094d000 ---p 0014e000 08:03 18186304 /lib64/libc-2.5.so
330094d000-3300951000 r--p 0014d000 08:03 18186304 /lib64/libc-2.5.so
3300951000-3300952000 rw-p 00151000 08:03 18186304 /lib64/libc-2.5.so
3300952000-3300957000 rw-p 3300952000 00:00 0
3300a00000-3300a02000 r-xp 00000000 08:03 18186457 /lib64/libdl-2.5.so
3300a02000-3300c02000 ---p 00002000 08:03 18186457 /lib64/libdl-2.5.so
3300c02000-3300c03000 r--p 00002000 08:03 18186457 /lib64/libdl-2.5.so
3300c03000-3300c04000 rw-p 00003000 08:03 18186457 /lib64/libdl-2.5.so
3300e00000-3300e82000 r-xp 00000000 08:03 18186543 /lib64/libm-2.5.so
3300e82000-3301081000 ---p 00082000 08:03 18186543 /lib64/libm-2.5.so
3301081000-3301082000 r--p 00081000 08:03 18186543 /lib64/libm-2.5.so
3301082000-3301083000 rw-p 00082000 08:03 18186543 /lib64/libm-2.5.so
3303a00000-3303a0d000 r-xp 00000000 08:03 18186545 /lib64/libgcc_s-4.1.2-20080825.so.1
3303a0d000-3303c0d000 ---p 0000d000 08:03 18186545 /lib64/libgcc_s-4.1.2-20080825.so.1
3303c0d000-3303c0e000 rw-p 0000d000 08:03 18186545 /lib64/libgcc_s-4.1.2-20080825.so.1
3304a00000-3304a15000 r-xp 00000000 08:03 18186491 /lib64/libselinux.so.1
3304a15000-3304c15000 ---p 00015000 08:03 18186491 /lib64/libselinux.so.1
3304c15000-3304c17000 rw-p 00015000 08:03 18186491 /lib64/libselinux.so.1
3304c17000-3304c18000 rw-p 3304c17000 00:00 0
3304e00000-3304e3b000 r-xp 00000000 08:03 18186479 /lib64/libsepol.so.1
3304e3b000-330503b000 ---p 0003b000 08:03 18186479 /lib64/libsepol.so.1
330503b000-330503c000 rw-p 0003b000 08:03 18186479 /lib64/libsepol.so.1
330503c000-3305046000 rw-p 330503c000 00:00 0
3305e00000-3305e02000 r-xp 00000000 08:03 18186469 /lib64/libkeyutils-1.3.so
3305e02000-3306001000 ---p 00002000 08:03 18186469 /lib64/libkeyutils-1.3.so
3306001000-3306002000 rw-p 00001000 08:03 18186469 /lib64/libkeyutils-1.3.so
3306200000-3306211000 r-xp 00000000 08:03 18186474 /lib64/libresolv-2.5.so
3306211000-3306411000 ---p 00011000 08:03 18186474 /lib64/libresolv-2.5.so
3306411000-3306412000 r--p 00Aborted

Thank you for your efforts.

Stephan

--
---------------------------------------------------------
| | Dr. rer. nat. Stephan Raub
| | Dipl. Chem.
| | High-Performance-Computing
| | Zentrum für Informations- und Medientechnologie
| | Heinrich-Heine-Universität Düsseldorf
| | Universitätsstr. 1 / Raum 25.41.O2.25-2
| | 40225 Düsseldorf / Germany
| |
| | Tel: +49-211-811-3911
| | Fax: +49-211-811-2539
---------------------------------------------------------

Important Note: This e-mail may contain trade secrets or privileged,
undisclosed or otherwise confidential information. If you have received
this e-mail in error, you are hereby notified that any review, copying or
distribution of it is strictly prohibited. Please inform us immediately
and destroy the original transmittal. Thank you for your cooperation.

> -----Original Message-----
> From: [email protected] [mailto:mauiusers-[email protected]] On Behalf Of Jason Williams
> Sent: Tuesday, 8 November 2011 23:50
> To: [email protected]
> Subject: Re: [Mauiusers] Possible Memory Corruption in maui
>
> Dr Stephan Raub,
>
> Maui does have some very odd "memory management" in it that has a
> tendency to cause these types of crashes when run in high-volume
> situations without some tweaks and/or concessions. I've tracked down,
> and I think fixed, one in the latest svn trunk, but 3.3.1 should
> already have that fix in it.
>
> Can you try (or have you tried) running maui from the command line with
> the -d option and catching the memory corruption message and backtrace
> that come out of it? Your original email has the strace, but it cuts off
> some of the backtrace. I might be able to see where in the code it's
> having problems if I can get the full backtrace.
>
> --
> Jason Williams
> Systems Engineer
> Homewood High Performance Cluster
> Johns Hopkins University
>
> On 11/8/2011 12:09 PM, Dr. Stephan Raub wrote:
> > Dear Mr. van der Vlies
> >
> > Currently we have 6095 jobs queued and 93 jobs running. Among these,
> > we have some large job arrays (1000 and 4000 items per array).
> >
> > Best regards.
> >
> >> -----Original Message-----
> >> From: Bas van der Vlies [mailto:[email protected]]
> >> Sent: Tuesday, 8 November 2011 17:10
> >> To: Dr. Stephan Raub
> >> Subject: Re: [Mauiusers] Possible Memory Corruption in maui
> >>
> >> On 08-11-11 16:40, Dr. Stephan Raub wrote:
> >>> Dear fellow maui users,
> >>>
> >>> we are running Maui 3.3.1 with torque 2.3.7 under RHEL5.5
> >>> (2.6.8-194.26.1.el1) on a cluster with roughly 600 cores.
> >>>
> >>> We experienced a sudden death of the maui scheduler with no message
> >>> in the logs. We could not figure out a reason, so we attached an
> >>> "strace" to the maui process (as long as it was "still alive") and
> >>> we got:
> >>>
> >> Dear Dr. Stephan Raub,
> >>
> >> just a question: How many jobs are in the queue?
> >>
> >> regards
> >>
> >> --
> >> ********************************************************************
> >> * Bas van der Vlies e-mail: [email protected] *
> >> * SARA - Academic Computing Services Amsterdam, The Netherlands *
> >> ********************************************************************

_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
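
The frames inside the maui binary in the backtrace above appear only as raw addresses (for example /usr/local/maui/sbin/maui[0x4d9e59]), which makes it hard to see where in the code the crash path runs. The following is a minimal sketch, not something taken from this thread, of how those addresses could be collected for a symbol lookup. It assumes the maui binary is not stripped and that binutils' addr2line is installed; the helper names parse_frames and addr2line_command are hypothetical and are not part of Maui or Torque.

```python
import re

# Matches glibc backtrace frames such as
#   /usr/local/maui/sbin/maui[0x4d9e59]
#   /lib64/libc.so.6(__libc_malloc+0x6e)[0x3300674cde]
FRAME_RE = re.compile(
    r"^(?P<path>/[^\s(\[]+)"        # path of the binary or shared library
    r"(?:\((?P<symbol>[^)]*)\))?"   # optional "(symbol+offset)"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"  # absolute address in brackets
)


def parse_frames(backtrace_text):
    """Return (path, symbol_or_None, address) for each recognised frame line."""
    frames = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(
                (match.group("path"), match.group("symbol"), match.group("addr"))
            )
    return frames


def addr2line_command(binary, frames):
    """Build an addr2line argument list for the unsymbolised frames of `binary`."""
    addrs = [addr for path, symbol, addr in frames
             if path == binary and symbol is None]
    # -f prints function names, -e selects the executable to inspect.
    return ["addr2line", "-f", "-e", binary] + addrs


# A few frames copied from the backtrace in this thread.
SAMPLE = """\
/lib64/libc.so.6(__libc_malloc+0x6e)[0x3300674cde]
/usr/local/torque/lib/libtorque.so.2(PBSD_status_get+0x26)[0x2ab278cb6786]
/usr/local/maui/sbin/maui[0x4d9e59]
/usr/local/maui/sbin/maui[0x48b8e4]
"""

frames = parse_frames(SAMPLE)
cmd = addr2line_command("/usr/local/maui/sbin/maui", frames)
print(" ".join(cmd))
# prints: addr2line -f -e /usr/local/maui/sbin/maui 0x4d9e59 0x48b8e4
```

The script only prints the command; it would have to be run on the host that holds the exact binary that crashed. Because the memory map shows the executable mapped at 00400000, the addresses should be usable unchanged, whereas frames in shared libraries such as libtorque.so.2 would first need their load address subtracted.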
