----- "Chris Samuel" <[email protected]> wrote:

In April I wrote:
> Well we've been gradually replacing the Barcelona chips
> with Shanghai (same clockspeed) and we are yet to see a
> power off on a Shanghai node!

Since I wrote that we have seen far fewer power-offs with the 2.3GHz Shanghai (2376, a 75W part), *but* we have some nodes upgraded to the ULP 2.4GHz Shanghai (2379 HE, a 55W part) which exhibit this issue very regularly! :-(

Gaussian is still a classic for triggering this, but we've also been able to trigger it with VASP, Amber and (far less frequently) InterProScan.

The compute nodes are based on SuperMicro H8DM8-2 boards with 32GB of ECC RAM. The boxes are running CentOS 5.3 with mainline kernels (currently 2.6.28.9, though we have also reproduced it with 2.6.30-rc6 plus the EDAC patches, which catch nothing before the node dies). We've seen the same behaviour with the standard CentOS kernels too.

This is driving us up the wall! Is nobody else seeing this?

cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
