On Thu, Jan 30, 2014 at 09:34:28AM -0500, Mike Julian wrote:
> We constantly find memory, network, and hard drive failures very early on
> after turning over to a customer. Sometimes as soon as we start to deploy
> an OS/software to them. It happens *quite* often.

TL;DR: As you mentioned, changing suppliers is a great option.

I spent years in reliability engineering. My hunch is ESD damage, which often causes latent defects in addition to outright dead parts. Test your own ESD equipment. I recently found a "grounded floor" where the ground lead went no further than a coil of wire in the ceiling.

The most effective fix is to insist that each part of the supply chain take responsibility. Who tested it? What did they test? Where are the test logs? Was it unsealed when they handled it? Was it handled in an ESD-protected shop? How was it stored? Make them refer you to the previous stop in the supply chain. Inspect the assembly facilities on site. You're not done until the manufacturer of the failed part sends a lab report explaining what failed and why. I've uncovered everything from night-shift employees who moved parts to the tested bin without any test at all (fired), to circuit design issues (setup-time violations in a disk controller), to ESD handling issues, 5-foot drops during shipping, lightning hits in service, you name it.

A less effective (but more likely to be practical) method is to attempt to "test quality in" once you receive the product. -> You can't test quality into a product <-, but you can weed out some detectable failures. In no particular order, you might try:

- Run continuous full kernel compiles (easy to set up, and very likely to catch CPU/RAM issues). 24 hours should catch anything you're likely to catch with this method.
- Run the drive conveyance test once on each hard drive (man smartctl).
- Run hard drive "long" tests on each hard drive, for about 24 hours (man smartctl).
- Run random R/W tests (lots of seeks) on the hard drives.
- Get a variac and run kernel compiles at the lowest and highest spec'd line voltages.
- Find a way to run kernel compiles at the high end of the spec'd ambient temperature.

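The random R/W item can be roughed out with a short script if nothing better is at hand. What follows is a minimal Python sketch under stated assumptions, not a substitute for a dedicated disk-testing tool: the function name `random_rw_test`, its parameters, and the 4 KiB block size are hypothetical choices for illustration. It writes random blocks at random offsets in a file, fsyncs, then reads them back in shuffled order (to force seeks) and counts blocks that don't match. One caveat: the read-back may be served from the OS page cache rather than the disk, so a meaningful run needs a file much larger than RAM (or caches dropped between phases), placed on the drive under test.

```python
import os
import random
import tempfile


def random_rw_test(path, size_bytes, block_size=4096, iterations=1000, seed=None):
    """Hypothetical helper: write random blocks at random offsets, read them
    back in shuffled order, and return the number of mismatched blocks."""
    nblocks = size_bytes // block_size
    if nblocks < 1:
        raise ValueError("size_bytes must be at least one block")
    rng = random.Random(seed)
    written = {}  # block index -> last data written there

    # Create the test file at the requested size (may be sparse).
    with open(path, "wb") as f:
        f.truncate(nblocks * block_size)

    # Write phase: random offsets produce lots of seeks.
    with open(path, "r+b") as f:
        for _ in range(iterations):
            idx = rng.randrange(nblocks)
            data = rng.randbytes(block_size)  # requires Python 3.9+
            f.seek(idx * block_size)
            f.write(data)
            written[idx] = data
        f.flush()
        os.fsync(f.fileno())  # push data out of Python and OS buffers

    # Verify phase: read back in a different random order.
    order = list(written)
    rng.shuffle(order)
    mismatches = 0
    with open(path, "rb") as f:
        for idx in order:
            f.seek(idx * block_size)
            if f.read(block_size) != written[idx]:
                mismatches += 1
    return mismatches


if __name__ == "__main__":
    # Assumption: pass dir="/mount/point/of/drive/under/test" in practice;
    # the default temp dir may live on another disk or on tmpfs.
    with tempfile.TemporaryDirectory() as d:
        bad = random_rw_test(os.path.join(d, "burnin.bin"), 16 * 1024 * 1024,
                             iterations=5000)
        print("mismatched blocks:", bad)
```

Any nonzero mismatch count is worth chasing. For the 24-hour-style runs described above, the natural extension is to loop this with a fresh seed each pass and log the results, alongside the smartctl self-tests rather than in place of them.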
Good luck,
--
Charles
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/
