I have a server (RH 7.2) that has gone past a load average of 20, and it hasn't crashed, so I don't think being busy is the problem by itself. It never has been for me, at least...
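
On the "what's too high a load average" question quoted further down: a rough way to put the number in context is to divide it by the number of CPUs. The snippet below is only a sketch of that rule of thumb. The function names load_per_cpu and cpu_count are made up for illustration, and it assumes a Linux box with /proc mounted.

```shell
#!/bin/sh
# Hypothetical helper: divide a load average by the number of CPUs.
# Usage: load_per_cpu LOAD NCPUS
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Hypothetical helper: count CPUs from /proc/cpuinfo (Linux-specific),
# falling back to 1 if the file is missing or has no "processor" lines.
cpu_count() {
    n=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null)
    [ "${n:-0}" -gt 0 ] || n=1
    echo "$n"
}

# The 1-minute load average is the first field of /proc/loadavg.
if [ -r /proc/loadavg ]; then
    read -r load1 _rest < /proc/loadavg
    echo "1-min load per CPU: $(load_per_cpu "$load1" "$(cpu_count)")"
fi
```

With the numbers from the top capture quoted below, load_per_cpu 1.44 2 works out to 0.72, so that dual-CPU box isn't even keeping both processors busy. A figure that stays well above 1 per CPU for long stretches means processes are queuing for CPU or disk, but as I said, that alone shouldn't take a machine down.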

w is a command that shows much the same info as the header at the top of top's output. free gives info about memory usage. You can run either at the command line and see the output.

If you're running 5.2 --> 7, and you're using Samba, one thing I certainly hope is that you've upgraded Samba at least... The restrictions on file sizes may have been uglier on older versions of Samba. I started heavily testing Samba at about 2.2.2. Even RH7 had an OLD version of Samba when it came out. You'll want at least 2.2.5 if you're running it as a PDC.

For adding the cron stuff, you'll need to type "crontab -e" and you'll be facing a vi editor. You can copy almost exactly what I typed into it (each line just needs the five schedule fields in front of the command), and once you save and exit vi, it will run as a cron job. You should probably specify a full path for the file you're dumping the output into though.

Did you set up the server (like I did on my first production box) with just a / and a swap partition? (yeah, yeah, laugh your heart out, I'm screwed, I know...) If not, upgrades should be no problem.

Kev.

----- Original Message -----
From: "J. Rafael Sánchez" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, November 13, 2002 12:23 PM
Subject: Re: (clug-talk) Programmer(s)/User(s) crashing my system.

>
> ----- Original Message -----
> From: "Kevin Anderson" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Tuesday, November 12, 2002 7:43 PM
> Subject: Re: (clug-talk) Programmer(s)/User(s) crashing my system.
>
>
> > If you haven't already, I'd STRONGLY advise a kernel upgrade. IIRC, RH 7.0
> > gave you the choice of either a 2.2 or a 2.4 Kernel. The early 2.4 series
> > had some REALLY bad Virtual Memory issues. Red Hat is based on the ac
> > branch, so it was better than most. Still, I'd look for something after
> > about 2.4.15 or so. Gentoo offers up to 2.4.20, so something around 15
> > isn't overly new.
> >
> > With the system you've got, what does top show for your Free Memory? and
> > Swap? Are you just running out?
>
>   1:18pm  up 22:50,  2 users,  load average: 1.44, 1.05, 0.97
> 83 processes: 81 sleeping, 2 running, 0 zombie, 0 stopped
> CPU0 states: 21.1% user,  8.0% system,  0.0% nice, 70.3% idle
> CPU1 states: 17.3% user,  8.1% system,  0.0% nice, 73.4% idle
> Mem:   517056K av,  515724K used,    1332K free,  9292K shrd,  205600K buff
> Swap: 1052216K av,   20556K used, 1031660K free                169764K cached
>
>   PID USER   PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
>  1136 jason   12   0 75336  71M  2684 R    51.5 14.2 156:34 idl
>  2220 root     3   0  1192 1192   944 R     1.5  0.2   0:00 top
>
> This is a capture of top. As you can tell "idl" is one of the main apps that
> run on this system. One license, one user. Load averages are anywhere from
> 1 - 3 cpu loads sometimes (by the way, how do you tell what's a too high
> load average? - other than system crash)
>
> This system would access local data storage as well as nfs shares from other
> systems.
>
> >
> > If you can BEAT on this system for a while, I'd set up a cronjob that looks
> > like this
> >
> > w >> test
> > free >> test
> >
> Let me see if I understand this pseudo code. Run a command that concatenates
> results to a text file.
> Would you tell me what that command is? I know how to use cron, I'm not sure
> if I know what command I need to perform this task.
>
> > and then have it run every minute. It'll be a huge hit on your system, and
> > it'll create a HUGE file fairly quick, but it'll show your system stats
> > within 1 minute of a crash.
> >
> > When you talk about a 2 Gig size limit, that shouldn't be a factor for
> > Linux, at least not if it's a 2.4 kernel. I can't remember what the max
> > file size was for the 2.2 kernel series. However, if you have legacy
> > Windows clients, Samba sometimes had a limitation of 2 Gigs per file. As I
> > remember it, that restriction existed when Linux mounted a Share on a legacy
> > Windows box across the network, and then tried to copy a file.
> > If Windows mounted the Linux box, and performed the copy, then the 2 Gig
> > issue didn't exist. Are you using Legacy Clients? Which machine mounts what?
> >
> Yes, I'm using legacy windows clients (95,98,w2k,nt4) with samba. However,
> users don't move the BIG files around to windows systems. They're allowed to
> move from one Linux system to another Linux system using samba though.
>
> Thanks Kev.
>
> > Something to start with...
> >
> > Kev.
> >
> >
> > ----- Original Message -----
> > From: "J. Rafael Sánchez" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Tuesday, November 12, 2002 4:07 PM
> > Subject: (clug-talk) Programmer(s)/User(s) crashing my system.
> >
> >
> > > Good day all,
> > > I have user(s)/programmer(s) who are crashing one of my servers.
> > >
> > > Users have access to this RH 7.0 system over Xwin32 using XDMCP.
> > >
> > > System description: 512MB RAM, Dual Pentium III 933, 1GB swap file, 1GHZ
> > > Ethernet Card.
> > >
> > > The crash(es) are so bad that when I go to the machine, I can't even log in,
> > > no console access whatsoever; to the point that the only option is to "push
> > > the on/off button". Of course after that I have to do manual e2fsck(s) on
> > > all my 6 180GB hard drives.
> > >
> > > I have been able to pinpoint that the system crashes because they are
> > > running a home-made program using IDL language over a gui interface/program
> > > called ENVI. We deal with imagery a lot (huge files and outputs). Some of
> > > these programs have to break up huge amounts of image-data into pieces, do
> > > some sort of processing on them and stitch them back together.
> > >
> > > It could have to do with the fact that the program(s) may not be using the
> > > resources efficiently, memory, 32bit file system limits (2GB file size
> > > limits), etc, etc.
> > >
> > > I'd like to help them and myself by finding out what exactly it is that they
> > > are doing or not doing.
> > > Is there a system utility or OS utility that I can
> > > use to monitor the system. I've used top. I've looked through the log files
> > > but I cannot seem to find anything important to help me.
> > >
> > > The last few lines of my /var/log/messages file of today's crash:
> > >
> > > *** real name replaced by "thishost"
> > >
> > > Nov 12 14:00:01 thishost CROND[28389]: (root) CMD ( /sbin/rmmod -as)
> > > Nov 12 14:01:00 thishost CROND[28391]: (root) CMD (run-parts /etc/cron.hourly)
> > > Nov 12 14:10:01 thishost CROND[28402]: (root) CMD ( /sbin/rmmod -as)
> > > Nov 12 14:37:12 thishost syslogd 1.3-3: restart.
> > >
> > > Output of ls of /etc/cron.hourly
> > > [root@thishost /etc]# ls -laF cron.hourly/
> > > total 16
> > > drwxr-xr-x    2 root     root         4096 Apr 24  2002 ./
> > > drwxr-xr-x   56 root     root         4096 Nov 12 15:20 ../
> > > -rwxr-xr-x    1 news     news           65 Jul 24  2000 inn-cron-nntpsend*
> > > -rwxr-xr-x    1 news     news           68 Jul 24  2000 inn-cron-rnews*
> > >
> > > Cat of inn-cron-nntpsend
> > > [root@thishost /etc]# cat cron.hourly/inn-cron-nntpsend
> > > #!/bin/sh
> > > /sbin/chkconfig innd && su - news -c /usr/bin/nntpsend
> > >
> > >
> > > Cat of inn-cron-rnews*
> > > #!/bin/sh
> > > /sbin/chkconfig innd && su - news -c '/usr/bin/rnews -U'
> > >
> > >
> > > Would this be what's crashing my system?
> > >
> > > Any suggestion would be greatly appreciated.
> > >
> > >
> > > Rafael.
> > >
> > >
> > > +=+=+=+=+=+=+=+=+=+=+=+=+
> > > j.rafael.sánchez
> > > Systems Administrator
> > > +=+=+=+=+=+=+=+=+=+=+=+=+
> > > Itres Research Limited
> > > www.itres.com
> > > Phone: 403.250.9944
> > > Fax: 403.250.9916
> > > +=+=+=+=+=+=+=+=+=+=+=+=+
