On Fri, Jul 3, 2009 at 12:17 AM, Chris Samuel <[email protected]> wrote:
>> Well we've been gradually replacing the Barcelona chips
>> with Shanghai (same clockspeed) and we are yet to see a
>> power off on a Shanghai node!
>
> Since I wrote that we have seen far fewer with 2.3GHz
> Shanghai (2376, a 75W part), *but* we have some nodes
> upgraded to the ULP 2.4 GHz Shanghai (2379 HE, a 55W
> part) which do exhibit this issue very regularly! :-(
We saw a similar power-off issue at one of our customers who upgraded from 2220s to Barcelonas on a similar board; it was reproducible at the same failure rate on approximately 160 nodes. After trying just about everything under the sun, we replaced all of the memory in the entire cluster. The power-offs ceased immediately afterward and have not returned.

-- 
Jason D. Clinton, 913-643-0306
http://twitter.com/HPCClusterTech
http://www.google.com/profiles/jasondclinton

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
