Hi,
I have a cluster (~120 machines) with torque 2.1.0p0 and maui that had
been running very stably for over a year. Last week there was a major
power failure, so I decided it was a very good moment (everything was
down) to update the OS (Gentoo Linux). The configuration was also
extensively changed, including maui's. Since then maui crashes
constantly with messages like these:
*** glibc detected *** malloc(): memory corruption: 0x000000000273b180 ***
*** glibc detected *** free(): invalid next size (normal): 0x000000000273b160 ***
and so on. Sometimes it runs for several hours, sometimes for only 2-3
scheduling cycles. My first guess was that glibc was buggy, so I
verified that maui crashes the same way with glibc 2.3.5-r3 (the version
that was installed before the upgrade; maui now crashes with it too),
2.3.6-r3 and 2.3.6-r4 (the latest one available in Gentoo). I also
checked two versions of maui: 3.2.6p16 and the maui snapshot dated
23-May-2006 16:20. I am unable to get rid of this problem. It is
definitely maui-specific: maui is the only program that crashes this
way, and everything else works as well as it always has.
If anyone thinks it is worth getting rid of this "undocumented
feature", I can send all the diagnostic data (config files, files
created by maui, core dumps, etc.).
Marcin Mogielnicki
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers