re: http://www.garlic.com/~lynn/2013o.html#5 Something to Think About - Optimal PDS Blocking
the original RAID patent from 1978 was by somebody in the san jose disk group
http://en.wikipedia.org/wiki/RAID
... whom I actually worked with some when they let me play disk engineer in bldgs 14&15
http://www.garlic.com/~lynn/subtopic.html#disk

RAID was originally added to s/38. s/38 is periodically referred to as a vastly simplified Future System implementation
http://www.garlic.com/~lynn/submain.html#futuresys
One of the simplifications was dynamic "scatter" block allocation across all disks in the infrastructure ... treating all disks in the configuration as a single resource. as a result the whole infrastructure had to be backed up as a single entity and restored as a single entity. a common failure mode of the period was single disk failure ... which for s/38 required stopping the whole system, replacing the failed disk, and doing a complete system restore ... which could be a 24hr event. RAID was used to mask single disk failure and the associated major recovery event.

the majority of RAID implementations these days are at the hardware controller level ... so individual disk slices aren't visible at the software level. RAID has been used for both availability (masking disk failure) and single-thread throughput. One of the issues was that optimizing for single-thread throughput degraded throughput for large DBMS infrastructures (which need multi-thread random access). More recent RAID options have tried to address both availability and multi-thread random access.

over the years, part of commoditizing industry standard disks was driving MTBF from something like 80k hrs to 800k hrs (warranty costs across an enormous number of disks). MTBF has since nearly doubled again (a back-of-the-envelope conversion of those figures to annual failure rates follows at the end of this post).

the major cloud operators do in-depth studies of the issues involving component failures ... as part of building their own servers (past news that chip manufacturers ship more server chips directly to large cloud operators than to brand name server vendors) and openly publish the information (something analogous to the mainframe industry group that published customer EREP information during the heyday of mainframe clone processors).
http://en.wikipedia.org/wiki/Hard_disk_drive_failure

from above:

The mean time between failures (MTBF) of SATA drives is usually specified to be about 1.2 million hours (some drives such as Western Digital Raptor have rated 1.4 million hours MTBF),[17] while SAS/FC drives are rated for upwards of 1.6 million hours.[18]

... and

A 2007 study published by Google suggested very little correlation between failure rates and either high temperature or activity level. Indeed, the Google study indicated that "lower temperatures are associated with higher failure rates"

... snip ...

2007 google disk failure report
http://storagemojo.com/2007/02/19/googles-disk-failure-experience/

-- 
virtualization experience starting Jan1968, online at home since Mar1970
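back-of-the-envelope: assuming a constant failure rate (exponential model), a spec-sheet MTBF figure can be translated into a rough annual failure rate ... purely illustrative arithmetic, not how the vendors or the google study derive their numbers:

    import math

    HOURS_PER_YEAR = 8766   # 365.25 days

    def afr_from_mtbf(mtbf_hours):
        # probability that a given drive fails within a year of operation,
        # under a constant-failure-rate (exponential) assumption
        return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

    for mtbf in (80_000, 800_000, 1_200_000, 1_600_000):
        print(f"MTBF {mtbf:>9,} hrs -> ~{afr_from_mtbf(mtbf):.1%}/yr")

    # roughly 10%/yr at 80k hrs, ~1%/yr at 800k hrs, and well under 1%/yr
    # at the 1.2M-1.6M hr figures quoted above; the google field data
    # reported annualized failure rates noticeably higher than the
    # spec-sheet numbers would imply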
