----- Original Message -----
From: "Kevin Anderson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, November 12, 2002 7:43 PM
Subject: Re: (clug-talk) Programmer(s)/User(s) crashing my system.


> If you haven't already, I'd STRONGLY advise a kernel upgrade.  IIRC, RH 7.0
> gave you the choice of either a 2.2 or a 2.4 Kernel.  The early 2.4 series
> had some REALLY bad Virtual Memory issues.  Red Hat is based on the ac
> branch, so it was better than most.  Still, I'd look for something after
> about 2.4.15 or so.  Gentoo offers up to 2.4.20, so something around 15
> isn't overly new.
>
> With the system you've got, what does top show for your Free Memory and
> Swap?  Are you just running out?

 1:18pm  up 22:50,  2 users,  load average: 1.44, 1.05, 0.97
83 processes: 81 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 21.1% user,  8.0% system,  0.0% nice, 70.3% idle
CPU1 states: 17.3% user,  8.1% system,  0.0% nice, 73.4% idle
Mem:   517056K av,  515724K used,    1332K free,    9292K shrd,  205600K buff
Swap: 1052216K av,   20556K used, 1031660K free                  169764K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1136 jason     12   0 75336  71M  2684 R    51.5 14.2 156:34 idl
 2220 root       3   0  1192 1192   944 R     1.5  0.2   0:00 top

This is a capture of top. As you can tell, "idl" is one of the main apps that
runs on this system. One license, one user. Load averages sometimes range
from 1 to 3. (By the way, how do you tell when a load average is too high,
other than by a system crash?)
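
On the load average question, a common rule of thumb (not a hard limit) is to compare the load against the number of CPUs: on a dual-CPU box like this one, a sustained load above 2 would suggest processes are queuing. On Linux the load figure also counts processes blocked on disk or NFS I/O, so a high load does not always mean the CPUs are the bottleneck. A minimal sketch, assuming /proc/loadavg and /proc/cpuinfo are readable; the helper names load_per_cpu and load_status are made up for illustration:

```shell
#!/bin/sh
# Compare the 1-minute load average with the number of CPUs.
# Rule of thumb only (an assumption, not a hard limit): a sustained
# load above the CPU count means processes are waiting their turn.

# load_per_cpu LOAD NCPU -> prints LOAD divided by NCPU
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# load_status LOAD NCPU -> prints "busy" if LOAD exceeds NCPU, else "ok"
load_status() {
    awk -v load="$1" -v ncpu="$2" \
        'BEGIN { if (load > ncpu) print "busy"; else print "ok" }'
}

if [ -r /proc/loadavg ] && [ -r /proc/cpuinfo ]; then
    load=$(cut -d' ' -f1 /proc/loadavg)
    ncpu=$(grep -c '^processor' /proc/cpuinfo)
    # Fall back to 1 if no "processor" lines were found.
    [ "$ncpu" -gt 0 ] || ncpu=1
    echo "load=$load cpus=$ncpu per_cpu=$(load_per_cpu "$load" "$ncpu") status=$(load_status "$load" "$ncpu")"
fi
```

With the figures from the top capture (1.44 on two CPUs) this would report the box as "ok", which fits the roughly 70% idle shown for each CPU.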

This system accesses local data storage as well as NFS shares from other
systems.

>
> If you can BEAT on this system for a while, I'd set up a cronjob that looks
> like this
>
> w >> test
> free >> test
>
Let me see if I understand this pseudocode: run a command that appends its
results to a text file. Would you tell me what that command is? I know how
to use cron, but I'm not sure I know what command I need to perform this task.
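
For what it's worth, the two quoted lines are already real commands rather than pseudocode: w prints uptime, load and logged-in users, free prints memory and swap usage, and >> appends the output to a file. A minimal sketch of a wrapper script that cron could run follows. The log path, the script path in the crontab comment, and the function name snapshot are assumptions for illustration only:

```shell
#!/bin/sh
# Append a timestamped snapshot of "w" and "free" to a log file.
# The default LOGFILE is a hypothetical path; choose one on a local
# disk that survives a reboot.
LOGFILE=${LOGFILE:-/var/tmp/sysstat-snapshots.log}

# snapshot FILE -> appends one timestamped record to FILE
snapshot() {
    {
        echo "===== $(date '+%Y-%m-%d %H:%M:%S') ====="
        # If either tool is missing or fails, note that in the log
        # instead of silently dropping the record.
        w    || echo "w failed"
        free || echo "free failed"
    } >> "$1" 2>&1
}

snapshot "$LOGFILE"

# Crontab entry to run it every minute (script path is hypothetical):
# * * * * * /usr/local/sbin/snapshot.sh
```

The timestamp header makes it easy to find the last record written before a crash, for example with tail on the log file after the reboot.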

> and then have it run every minute.  It'll be a huge hit on your system, and
> it'll create a HUGE file fairly quickly, but it'll show your system stats
> within 1 minute of a crash.
>
> When you talk about a 2 Gig size limit, that shouldn't be a factor for
> Linux, at least not if it's a 2.4 kernel.  I can't remember what the max
> file size was for the 2.2 kernel series.  However, if you have legacy
> Windows clients, Samba sometimes had a limitation of 2 Gigs per file.  As I
> remember it, that restriction existed when Linux mounted a Share on a legacy
> Windows box across the network, and then tried to copy a file.  If Windows
> mounted the Linux box, and performed the copy, then the 2 Gig issue didn't
> exist.  Are you using Legacy Clients?  Which machine mounts what?
>
Yes, I'm using legacy Windows clients (95, 98, W2K, NT4) with Samba. However,
users don't move the BIG files around to Windows systems. They're allowed to
move them from one Linux system to another Linux system using Samba, though.
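
To check whether the 2 GB limit is actually being reached, one option is to look for files sitting at or beyond that size on the data disks. A rough sketch using find, where the function name files_over and the default directory are placeholders rather than anything from this system:

```shell
#!/bin/sh
# List regular files larger than LIMIT_KB kilobytes under DIR.
# 2097151 KB is just under 2 GiB, so a file stuck at the 32-bit
# limit (2^31 - 1 bytes) is reported along with anything larger.

# files_over LIMIT_KB DIR -> prints matching file paths, one per line
files_over() {
    find "$2" -type f -size +"$1"k -print 2>/dev/null
}

# Hypothetical default: search the current directory unless a data
# directory is given as the first argument.
files_over 2097151 "${1:-.}"
```

A cluster of output files at exactly 2147483647 bytes would point at the file size limit; no hits at all would make memory exhaustion the more likely suspect.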

Thanks Kev.


> Something to start with...
>
> Kev.
>
>
> ----- Original Message -----
> From: "J. Rafael Sánchez" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Tuesday, November 12, 2002 4:07 PM
> Subject: (clug-talk) Programmer(s)/User(s) crashing my system.
>
>
> > Good day all,
> > I have user(s)/programmer(s) who are crashing one of my servers.
> >
> > Users have access to this RH 7.0 system over Xwin32 using XDMCP.
> >
> > System description: 512MB RAM, Dual Pentium III 933, 1GB swap file, 1GHZ
> > Ethernet Card.
> >
> > The crash(es) are so bad that when I go to the machine, I can't even log in,
> > no console access whatsoever; to the point that the only option is to "push
> > the on/off button". Of course after that I have to do manual e2fsck(s) on
> > all my 6 180GB hard drives.
> >
> > I have been able to pinpoint that the system crashes because they are
> > running a home-made program written in the IDL language under a GUI
> > interface/program called ENVI. We deal with imagery a lot (huge files and
> > outputs). Some of these programs have to break up huge amounts of image
> > data into pieces, do some sort of processing on them and stitch them back
> > together.
> >
> > It could have to do with the fact that the program(s) may not be using the
> > resources efficiently, memory, 32bit file system limits (2GB file size
> > limits), etc, etc.
> >
> > I'd like to help them and myself by finding out what exactly it is that they
> > are doing or not doing. Is there a system utility or OS utility that I can
> > use to monitor the system? I've used top. I've looked through the log files
> > but I cannot seem to find anything important to help me.
> >
> > The last few lines of my /var/log/messages file of today's crash:
> >
> > *** real name replaced by "thishost"
> >
> > Nov 12 14:00:01 thishost CROND[28389]: (root) CMD (   /sbin/rmmod -as)
> > Nov 12 14:01:00 thishost CROND[28391]: (root) CMD (run-parts /etc/cron.hourly)
> > Nov 12 14:10:01 thishost CROND[28402]: (root) CMD (   /sbin/rmmod -as)
> > Nov 12 14:37:12 thishost syslogd 1.3-3: restart.
> >
> > Output of ls of /etc/cron.hourly
> > [root@thishost /etc]# ls -laF cron.hourly/
> > total 16
> > drwxr-xr-x    2 root     root         4096 Apr 24  2002 ./
> > drwxr-xr-x   56 root     root         4096 Nov 12 15:20 ../
> > -rwxr-xr-x    1 news     news           65 Jul 24  2000 inn-cron-nntpsend*
> > -rwxr-xr-x    1 news     news           68 Jul 24  2000 inn-cron-rnews*
> >
> > Cat of inn-cron-nntpsend
> > [root@thishost /etc]# cat cron.hourly/inn-cron-nntpsend
> > #!/bin/sh
> > /sbin/chkconfig innd && su - news -c /usr/bin/nntpsend
> >
> >
> > Cat of inn-cron-rnews*
> > #!/bin/sh
> > /sbin/chkconfig innd && su - news -c '/usr/bin/rnews -U'
> >
> >
> > Would this be what's crashing my system?
> >
> > Any suggestion would be greatly appreciated.
> >
> >
> > Rafael.
> >
> >
> > +=+=+=+=+=+=+=+=+=+=+=+=+
> > j.rafael.sánchez
> > Systems Administrator
> > +=+=+=+=+=+=+=+=+=+=+=+=+
> > Itres Research Limited
> > www.itres.com
> > Phone: 403.250.9944
> > Fax:   403.250.9916
> > +=+=+=+=+=+=+=+=+=+=+=+=+
> >
> >
> >
