On Mon, 2018-07-09 at 21:44 +0000, Buterbaugh, Kevin L wrote:

[SNIP]
> Interestingly enough, one user showed up waaaayyyyyy more often than
> anybody else. And many times she was on a node with only one other
> user who we know doesn’t access the GPFS filesystem and other times
> she was the only user on the node.
>

On our old HPC system, which had been running fine for three years, I
have seen a particular user with a particular piece of software, and
presumably a particular access pattern, trigger a firmware bug in a SAS
drive (local disk to the node). The bug caused the drive to go offline
(dead to the world, with the power/presence LED off), and only a power
cycle of the node would bring it back. At first we thought the drives
were failing, because what the hell, but in the end a firmware update
to the drives fixed it and they were fine.

The moral of the story is: don't rule out wacky access patterns from a
single user causing problems.

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
