If you haven't already, I'd STRONGLY advise a kernel upgrade.  IIRC, RH 7.0
gave you the choice of either a 2.2 or a 2.4 kernel.  The early 2.4 series
had some REALLY bad virtual memory issues.  Red Hat's kernels are based on
the -ac branch, so they were better than most.  Still, I'd look for something
after about 2.4.15 or so.  Gentoo offers up to 2.4.20, so something around
2.4.15 isn't overly new.

With the system you've got, what does top show for free memory and swap?
Are you just running out?

If you can BEAT on this system for a while, I'd set up a cron job that looks
like this:

w >> test
free >> test

and then have it run every minute.  It'll be a huge hit on your system, and
it'll create a HUGE file fairly quickly, but it'll show your system stats
within 1 minute of a crash.
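
A rough sketch of the same idea as a small script, in case it helps.  The
LOGFILE name, the timestamp line and the sync call are my own additions for
illustration; w and free are the two commands from above.  Cron normally runs
jobs from the user's home directory, so a bare "test" ends up there; an
absolute path is easier to find after a crash.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped snapshot of load and memory to a log.
# LOGFILE is an assumed path; pick one on a filesystem with room to grow.
LOGFILE="${LOGFILE:-/tmp/sysstats.log}"

snapshot() {
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        w        # who is logged in, plus the load averages
        free     # memory and swap usage
    } >> "$LOGFILE" 2>&1
    # Optional: flush buffers so the last entry has a better chance of
    # surviving a hard crash. Drop this if sync itself stalls under heavy I/O.
    sync
}

snapshot
```

Assuming it's saved as /usr/local/bin/snapshot.sh (a made-up path) and made
executable, a crontab line like

    * * * * * /usr/local/bin/snapshot.sh

would run it once a minute.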

When you talk about a 2 Gig file size limit, that shouldn't be a factor for
Linux, at least not if it's a 2.4 kernel.  I can't remember what the max
file size was for the 2.2 kernel series.  However, if you have legacy
Windows clients, Samba sometimes had a limitation of 2 Gigs per file.  As I
remember it, that restriction existed when Linux mounted a share on a legacy
Windows box across the network and then tried to copy a file.  If Windows
mounted the Linux box and performed the copy, then the 2 Gig issue didn't
exist.  Are you using legacy clients?  Which machine mounts what?
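
If you want a quick way to see whether a given directory can take a file
past the 2 Gig mark, something like the sketch below might do.  TESTDIR,
SIZE_MB and LFS_RESULT are names I made up for the example.  It only
exercises the local filesystem and the dd binary on that box, so a failure
could also mean dd itself was built without large file support, and it says
nothing about Samba.

```shell
#!/bin/sh
# Hypothetical check: can a file in TESTDIR grow past 2 GiB?
# TESTDIR and SIZE_MB are assumed names; point TESTDIR at the volume in question.
TESTDIR="${TESTDIR:-/tmp}"
SIZE_MB="${SIZE_MB:-2049}"      # offset in MiB, just past the 2048 MiB (2 GiB) mark
TESTFILE="$TESTDIR/lfs-test.$$"

# Seek past the boundary and write a single 1 MiB block. On most filesystems
# this produces a sparse file, so very little real disk space is used.
if dd if=/dev/zero of="$TESTFILE" bs=1048576 seek="$SIZE_MB" count=1 2>/dev/null; then
    LFS_RESULT="ok"
else
    LFS_RESULT="failed"
fi

rm -f "$TESTFILE"               # clean up the test file either way
echo "write past ${SIZE_MB} MiB in $TESTDIR: $LFS_RESULT"
```

Run it once with TESTDIR set to a local disk and once with TESTDIR set to
the mounted share, and compare the two results.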

Something to start with...

Kev.


----- Original Message -----
From: "J. Rafael Sánchez" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, November 12, 2002 4:07 PM
Subject: (clug-talk) Programmer(s)/User(s) crashing my system.


> Good day all,
> I have user(s)/programmer(s) who are crashing one of my servers.
>
> Users have access to this RH 7.0 system over Xwin32 using XDMCP.
>
> System description: 512MB RAM, Dual Pentium III 933, 1GB swap file, 1GHZ
> Ethernet Card.
>
> The crash(es) are so bad that when I go to the machine, I can't even log
> in, no console access whatsoever; to the point that the only option is to
> "push the on/off button". Of course after that I have to do manual
> e2fsck(s) on all my 6 180GB hard drives.
>
> I have been able to pinpoint that the system crashes because they are
> running a home-made program using IDL language over a gui
> interface/program called ENVI. We deal with imagery a lot (huge files and
> outputs). Some of these programs have to break up huge amounts of
> image-data into pieces, do some sort of processing on them and stitch them
> back together.
>
> It could have to do with the fact that the program(s) may not be using the
> resources efficiently, memory, 32bit file system limits (2GB file size
> limits), etc, etc.
>
> I'd like to help them and myself by finding out what exactly it is that
> they are doing or not doing. Is there a system utility or OS utility that I
> can use to monitor the system? I've used top. I've looked through the log
> files but I cannot seem to find anything important to help me.
>
> The last few lines of my /var/log/messages file of today's crash:
>
> *** real name replaced by "thishost"
>
> Nov 12 14:00:01 thishost CROND[28389]: (root) CMD (   /sbin/rmmod -as)
> Nov 12 14:01:00 thishost CROND[28391]: (root) CMD (run-parts /etc/cron.hourly)
> Nov 12 14:10:01 thishost CROND[28402]: (root) CMD (   /sbin/rmmod -as)
> Nov 12 14:37:12 thishost syslogd 1.3-3: restart.
>
> Output of ls of /etc/cron.hourly
> [root@thishost /etc]# ls -laF cron.hourly/
> total 16
> drwxr-xr-x    2 root     root         4096 Apr 24  2002 ./
> drwxr-xr-x   56 root     root         4096 Nov 12 15:20 ../
> -rwxr-xr-x    1 news     news           65 Jul 24  2000 inn-cron-nntpsend*
> -rwxr-xr-x    1 news     news           68 Jul 24  2000 inn-cron-rnews*
>
> Cat of inn-cron-nntpsend
> [root@thishost /etc]# cat cron.hourly/inn-cron-nntpsend
> #!/bin/sh
> /sbin/chkconfig innd && su - news -c /usr/bin/nntpsend
>
>
> Cat of inn-cron-rnews*
> #!/bin/sh
> /sbin/chkconfig innd && su - news -c '/usr/bin/rnews -U'
>
>
> Would this be what's crashing my system?
>
> Any suggestion would be greatly appreciated.
>
>
> Rafael.
>
>
> +=+=+=+=+=+=+=+=+=+=+=+=+
> j.rafael.sánchez
> Systems Administrator
> +=+=+=+=+=+=+=+=+=+=+=+=+
> Itres Research Limited
> www.itres.com
> Phone: 403.250.9944
> Fax:   403.250.9916
> +=+=+=+=+=+=+=+=+=+=+=+=+
>
>
>
