Hello LOPSA,

We have a number of virtual machines on a CentOS 5.5 box.  The CentOS box
is a dual Xeon system, with 24 gig of memory and about 7.5 terabytes
of hard disk.  Most of the hard disk (/data) is configured as an XFS
filesystem.

In each VM, the OS is Ubuntu 8.04, running our software.  The /data
filesystem above is NFS'd to each VM.

Externally, there are a large number of Ubuntu 8.04 clients, connected
to their respective VM via an OpenVPN tunnel.  They are also connected
to the main server via OpenVPN, to their location in /data.

Each VM, and the main server, are monitored via Zabbix.

Randomly, about 1-4 times a week, one of the VMs will get locked up.
The only symptom I can see is that the process count starts climbing
about 1-5 minutes before the machine gets completely hosed.  It
happens in the middle of the night, and during the middle of the day,
so it doesn't appear to be load related.

When the VM dies, I have a Zabbix process which restarts the VM, so
the downtime is only about 1-2 minutes.

I tried putting in a small script, called by cron once a minute, which
would capture the output of "ps -ax" into a file, but that script
stops running when the symptoms start.
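
For anyone who wants to try the same kind of capture without relying on
cron, here is a minimal sketch of a long-running sampler.  It is not the
script described above.  It reads /proc directly instead of calling ps, so
it does not need to start a new process for each sample, and it syncs every
line to disk so the last samples survive a restart.  The function names,
the log path and the 5 second interval are all made up for illustration,
and it assumes a Python 3 interpreter is available on the VM, which a stock
Ubuntu 8.04 install does not provide.

```python
import os
import time
from collections import Counter


def snapshot(proc_root="/proc"):
    """Return a Counter mapping command name -> number of processes."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric directories are PIDs
            continue
        try:
            # /proc/<pid>/stat looks like: "1234 (bash) S ..."
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
            name = stat[stat.index("(") + 1:stat.rindex(")")]
        except (OSError, ValueError):
            # process exited mid-read, or the line was not parseable
            continue
        counts[name] += 1
    return counts


def format_line(counts, now=None, top=5):
    """One log line: timestamp, total process count, most common names."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    busiest = " ".join(
        "%s=%d" % (name, n) for name, n in counts.most_common(top)
    )
    return "%s total=%d %s" % (stamp, sum(counts.values()), busiest)


def run(log_path="/var/tmp/proc-count.log", interval=5):
    """Sample forever, forcing each line to disk before sleeping."""
    with open(log_path, "a") as log:
        while True:
            log.write(format_line(snapshot()) + "\n")
            log.flush()
            os.fsync(log.fileno())  # hypothetical path; survive a hard reset
            time.sleep(interval)
```

Calling run() from a process started at boot would leave a log whose last
few lines show which command names were multiplying just before the hang.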

Frankly, I'm stumped.  We've tried adding memory and additional CPUs
to the VM, but nothing seems to help.

Does anyone have any ideas or suggestions?  We would even entertain the
idea of someone coming in for a day or so to help figure things out.

Thanks in advance.


JBB
--
Enhancing your business through Technology

Bayer Technology Group         http://www.BayerTechnologyGroup.com
Jonathan Bayer, CEO            mailto:[email protected]
Work: (609) 632-1200           Mobile: (609) 658-9408
292 Evanston Dr.
East Windsor, NJ 08520

_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
