Alan Gilmour wrote:
> We tried installing mbmon and lmmon and healthd, but none seem to work.

How so? Do you have an error message to share with us?

> Anyone got any suggestions for other things we can try to detect why
> the server is failing? or other ways to check things like CPU temp and
> memory status?

What is the hardware vendor? Most of the major players have decent systems-management capability and add-in cards for this sort of thing (think RSA for IBM, DRAC for Dell, etc.).
If you are using RAID, verify that the disks are OK (both physical and logical).
Enable the full memory check at POST (not "quick").
Try diagnostics such as the memory and disk tools that come with UBCD.
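For the "other ways to check CPU temp and memory status" part, here is a minimal sketch that assumes the server runs Linux, which the post does not say. It reads the kernel's sysfs thermal zones (each `thermal_zone*/temp` holds millidegrees Celsius) and parses `/proc/meminfo`. The function names `read_thermal_zones` and `read_meminfo` are hypothetical helpers, not part of any tool mentioned above. Many servers expose CPU sensors only through hwmon or the vendor's management card, so an empty temperature result does not mean the hardware is fine.

```python
import glob
import os


def read_thermal_zones(sys_root="/sys/class/thermal"):
    """Return {"thermal_zoneN:type": degrees_celsius} for each readable zone.

    Assumes the Linux sysfs layout: thermal_zone*/temp is in millidegrees C,
    and thermal_zone*/type names the sensor (e.g. x86_pkg_temp, acpitz).
    """
    readings = {}
    for zone in sorted(glob.glob(os.path.join(sys_root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as f:
                millideg = int(f.read().strip())
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # zone missing, unreadable or not reporting; skip it
        readings[f"{os.path.basename(zone)}:{name}"] = millideg / 1000.0
    return readings


def read_meminfo(path="/proc/meminfo"):
    """Return /proc/meminfo as {field: number}; most values are in kB."""
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            fields = rest.split()
            if fields and fields[0].isdigit():
                info[key.strip()] = int(fields[0])
    return info


if __name__ == "__main__":
    for zone, temp_c in read_thermal_zones().items():
        print(f"{zone}: {temp_c:.1f} C")
    if os.path.exists("/proc/meminfo"):  # only present on Linux
        mem = read_meminfo()
        # HardwareCorrupted (if present) counts memory the kernel has
        # retired after hardware errors.
        for key in ("MemTotal", "MemAvailable", "SwapFree", "HardwareCorrupted"):
            if key in mem:
                print(f"{key}: {mem[key]} kB")
```

Running it from cron and logging the output gives a record of temperatures and memory state leading up to a failure, which may help tell overheating apart from other causes.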

Is this system identical to others at your site, or is it a one-off?

Some days it's just not worth chewing through the restraints...

