Hi,

I have a cluster (~120 machines) with torque 2.1.0p0 and maui, which had been running very stably for over a year. Last week a major power failure happened, so I decided it was a good moment (everything was down anyway) to update the OS (Gentoo Linux). The configuration, including maui's, was also extensively changed. Since then maui crashes constantly with messages such as:

*** glibc detected *** malloc(): memory corruption: 0x000000000273b180 ***
*** glibc detected *** free(): invalid next size (normal): 0x000000000273b160 ***

and so on. Sometimes it runs for several hours, sometimes for only 2-3 scheduling cycles. My first guess was that glibc is buggy, so I verified that it crashes the same way with glibc 2.3.5-r3 (the version that was installed before the upgrade; now it crashes too), 2.3.6-r3 and 2.3.6-r4 (the latest one available in Gentoo). I also checked two versions of maui: 3.2.6p16 and the maui snapshot dated 23-May-2006 16:20. I am unable to get rid of this problem. It is definitely maui-specific: maui is the only program that crashes this way, and everything else works fine, just as before.

If anyone decides it is important to get rid of this "undocumented feature", I can send all diagnostic data (config files, files created by maui, core dumps, etc.).

        Marcin Mogielnicki
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
