The Red Hat versions you are using are very stable, and 512MB of RAM is plenty. Ext3 is a good idea, but it will not stop crashes caused by BAD PROGRAMMING.

Upgrading the kernel is also a good idea, but stick with 2.2.x because moving to 2.4.x will be a big pain. The newer 2.2.x kernels are great, and if they support all of your hardware the benefits of moving to 2.4.x are not that great. 2.4.14 or higher would use a lot less RAM and CPU power, but your init scripts would all break and you would be forced to do a complete reinstall. Avoid any 2.4.x lower than 2.4.14: they are very buggy, and the newer 2.2.x kernels are better. The one exception is if you need files bigger than 2GB.
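On the question in the quoted message about a utility to monitor the system: since top only helps while you can still log in, one option is to append a resource snapshot to a log file every few seconds and read the last entries after the reboot. The following is only a minimal sketch of that idea, not a tested tool. It assumes a Python 3 interpreter is available, that /proc/meminfo contains "MemFree:" and "SwapFree:" lines in kB, and that /proc/loadavg starts with the 1-minute load average. The function names and the log path /var/tmp/resource-watch.log are hypothetical.

```python
import os
import time


def parse_meminfo(text):
    """Return {field: kB} for 'Name:   1234 kB' lines; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def snapshot_line(meminfo_text, loadavg_text, now=None):
    """Format one log line from the contents of /proc/meminfo and /proc/loadavg."""
    mem = parse_meminfo(meminfo_text)
    load1 = loadavg_text.split()[0]  # first field is the 1-minute load average
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s load1=%s MemFree=%skB SwapFree=%skB" % (
        stamp, load1, mem.get("MemFree", "?"), mem.get("SwapFree", "?"))


def read_text(path):
    with open(path) as handle:
        return handle.read()


def monitor(log_path="/var/tmp/resource-watch.log", interval=10):
    """Append a snapshot every `interval` seconds until interrupted.

    The log path is a hypothetical choice. Each line is fsync'ed so that the
    last records have a better chance of surviving a hard power-off.
    """
    while True:
        line = snapshot_line(read_text("/proc/meminfo"), read_text("/proc/loadavg"))
        with open(log_path, "a") as log:
            log.write(line + "\n")
            log.flush()
            os.fsync(log.fileno())
        time.sleep(interval)
```

Calling monitor() from a root shell before the users start their ENVI/IDL jobs would leave a trail showing whether free memory and swap were running out just before the machine locked up.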
On Tue, 2002-11-12 at 16:07, J. Rafael Sánchez wrote:
Good day all,

I have user(s)/programmer(s) who are crashing one of my servers. Users have access to this RH 7.0 system over Xwin32 using XDMCP.

System description: 512MB RAM, Dual Pentium III 933, 1GB swap file, 1GHZ Ethernet Card.

The crashes are so bad that when I go to the machine I can't even log in: no console access whatsoever, to the point that the only option is to "push the on/off button". Of course, after that I have to do manual e2fsck(s) on all 6 of my 180GB hard drives.

I have been able to pinpoint that the system crashes because they are running a home-made program written in the IDL language on top of a GUI program called ENVI. We deal with imagery a lot (huge files and outputs). Some of these programs have to break up huge amounts of image data into pieces, do some sort of processing on them, and stitch them back together. It could have to do with the fact that the program(s) may not be using resources efficiently: memory, 32-bit file system limits (2GB file size limits), etc.

I'd like to help them and myself by finding out exactly what it is that they are doing or not doing. Is there a system utility or OS utility that I can use to monitor the system? I've used top. I've looked through the log files, but I cannot seem to find anything important to help me.

The last few lines of my /var/log/messages file from today's crash:

*** real name replaced by "thishost"
Nov 12 14:00:01 thishost CROND[28389]: (root) CMD ( /sbin/rmmod -as)
Nov 12 14:01:00 thishost CROND[28391]: (root) CMD (run-parts /etc/cron.hourly)
Nov 12 14:10:01 thishost CROND[28402]: (root) CMD ( /sbin/rmmod -as)
Nov 12 14:37:12 thishost syslogd 1.3-3: restart.

Output of ls of /etc/cron.hourly:

[root@thishost /etc]# ls -laF cron.hourly/
total 16
drwxr-xr-x 2 root root 4096 Apr 24 2002 ./
drwxr-xr-x 56 root root 4096 Nov 12 15:20 ../
-rwxr-xr-x 1 news news 65 Jul 24 2000 inn-cron-nntpsend*
-rwxr-xr-x 1 news news 68 Jul 24 2000 inn-cron-rnews*

Cat of inn-cron-nntpsend:

[root@thishost /etc]# cat cron.hourly/inn-cron-nntpsend
#!/bin/sh
/sbin/chkconfig innd && su - news -c /usr/bin/nntpsend

Cat of inn-cron-rnews*:

#!/bin/sh
/sbin/chkconfig innd && su - news -c '/usr/bin/rnews -U'

Would this be what's crashing my system? Any suggestion would be greatly appreciated.

Rafael.

+=+=+=+=+=+=+=+=+=+=+=+=+
j.rafael.sánchez
Systems Administrator
+=+=+=+=+=+=+=+=+=+=+=+=+
Itres Research Limited
www.itres.com
Phone: 403.250.9944
Fax: 403.250.9916
+=+=+=+=+=+=+=+=+=+=+=+=+
Roy Souther
www.SiliconTao.com
Changing the way people do business.
