re: http://www.garlic.com/~lynn/2013o.html#5 Something to Think About - Optimal PDS Blocking
the original RAID patent from 1978 was by somebody in the san jose disk group
http://en.wikipedia.org/wiki/RAID
... whom I actually worked with some when they let me play disk engineer in bldgs 14&15
http://www.garlic.com/~lynn/subtopic.html#disk

RAID was originally added to s/38. s/38 is periodically referred to as a vastly simplified Future System implementation
http://www.garlic.com/~lynn/submain.html#futuresys
One of the simplifications was dynamic "scatter" block allocation across all disks in the infrastructure ... treating all disks in the configuration as a single resource. as a result the whole infrastructure had to be backed up as a single entity and restored as a single entity. a common failure mode of the period was single disk failure ... which for s/38 required stopping the whole system, replacing the failed disk, and doing a complete system restore ... which could be a 24hr event. RAID was used to mask single disk failure and the associated major recovery event.

the majority of RAID implementations these days are at the hardware controller level ... so individual disk slices aren't visible at the software level. RAID has been used for both availability (masking disk failure) and single-thread throughput. One of the issues was that optimizing for single-thread throughput degraded throughput for large DBMS infrastructures (which need multi-thread random access). More recent RAID options have tried to address both availability and multi-thread random access.

over the years, part of commoditizing industry standard disks was driving MTBF from something like 80k hrs to 800k hrs (warranty costs across an enormous number of disks). MTBF has since nearly doubled again (a back-of-the-envelope conversion of those figures to annual failure rates follows at the end of this post).

the major cloud operators do in-depth studies of the issues involving component failures ... as part of building their own servers (past news that chip manufacturers ship more server chips directly to large cloud operators than to brand name server vendors) and openly publish the information (something analogous to the mainframe industry group that published customer EREP information during the heyday of mainframe clone processors).
http://en.wikipedia.org/wiki/Hard_disk_drive_failure

from above:

The mean time between failures (MTBF) of SATA drives is usually specified to be about 1.2 million hours (some drives such as Western Digital Raptor have rated 1.4 million hours MTBF),[17] while SAS/FC drives are rated for upwards of 1.6 million hours.[18]

... and

A 2007 study published by Google suggested very little correlation between failure rates and either high temperature or activity level. Indeed, the Google study indicated that "lower temperatures are associated with higher failure rates"

... snip ...

2007 google disk failure report
http://storagemojo.com/2007/02/19/googles-disk-failure-experience/

-- 
virtualization experience starting Jan1968, online at home since Mar1970
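back-of-the-envelope: assuming a constant failure rate (exponential model), a spec-sheet MTBF figure can be translated into a rough annual failure rate ... purely illustrative arithmetic, not how the vendors or the google study derive their numbers:

    import math

    HOURS_PER_YEAR = 8766   # 365.25 days

    def afr_from_mtbf(mtbf_hours):
        # probability that a given drive fails within a year of operation,
        # under a constant-failure-rate (exponential) assumption
        return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

    for mtbf in (80_000, 800_000, 1_200_000, 1_600_000):
        print(f"MTBF {mtbf:>9,} hrs -> ~{afr_from_mtbf(mtbf):.1%}/yr")

    # roughly 10%/yr at 80k hrs, ~1%/yr at 800k hrs, and well under 1%/yr
    # at the 1.2M-1.6M hr figures quoted above; the google field data
    # reported annualized failure rates noticeably higher than the
    # spec-sheet numbers would imply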
