On 4/8/14, 11:05 AM, Michael Di Domenico wrote:
On Tue, Apr 8, 2014 at 10:57 AM, Joe Landman
<[email protected]> wrote:
 From a general purpose point of view, Intel and Samsung make great lower end
devices.  SanDisk makes great higher end devices.  We are working on getting
some Toshiba's and a few others for enterprise to ultra-high-end testing.

With some of the SSDs, we found that a hot plug event was permanently
terminal to the device.  Neat, huh?  Other SSDs we played with had 40+%
failure rates.
is that 40% infant mortality or after some period of time?

That was 40% across a very large swath of parts, within a 2 week window of each other, for lightly used boot drive SSDs. We ripped them out, globally, and replaced them. Including non-failed parts.


i've held off on ssd's in our environment mostly because of the
general feeling that ssd's still have a much shorter life expectancy
than hdd's.  some anecdotal evidence would be helpful.
The cheap drives are crap. The good drives will cost you. The good drives will be as reliable as spinning rust, if not more so. The meh drives have 2-5 random drive writes per day (DWPD) over a 5 year window. The crappy drives have sub 1 (usually sub 0.1). The good drives have 10+ DWPD.

Huge hint: if they don't give explicit figures on durability, there is a very good reason for that.

Huge hint 2: You can take the analysis Prentiss suggested to calculate the number of single block erasures that the drive can tolerate during its lifetime. Crap drives are way sub 3k. Meh drives are 3k-7k (nothing important on them, avoid them in write amplified ... RAID5/6 ... scenarios). Good drives are 10+k erasures.
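The erase-cycle ranges in that hint can be written down as a small lookup. This is only a sketch: the thresholds are the rough figures quoted above, the function name endurance_tier is hypothetical, and the post gives no label for the 7k-10k range, so the sketch reports that range as "unclassified" rather than guessing.

```python
def endurance_tier(erase_cycles):
    """Map a drive's tolerated erase cycles onto the rough tiers quoted above.

    Thresholds are the post's ballpark figures, not vendor specifications.
    """
    if erase_cycles < 3_000:
        return "crap"
    if erase_cycles <= 7_000:
        return "meh"
    if erase_cycles >= 10_000:
        return "good"
    # The post does not label the 7k-10k range, so this sketch does not either.
    return "unclassified"


for cycles in (1_500, 5_000, 8_000, 10_000):
    print(cycles, endurance_tier(cycles))
```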

For 1PB of total writes during lifetime, a 100GB drive would be written 10k times. If this is over 5 years (call it 1825 days), then you get roughly 10k/1825 -> 5.5 DWPD. Upper end of meh into "lower good" range. This is 10k erasure/rewrite cycles.

Note that this analysis is *highly* oversimplified, and a good academic would take strong issue with it. But it also appears to match reality quite well from what we observe.
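The arithmetic in the 1PB example can be reproduced in a few lines. This is a minimal sketch under the same simplifications as the text: decimal units (1PB = 10^15 bytes, 1GB = 10^9 bytes), no write amplification, and writes spread evenly over the lifetime. The function name drive_writes_per_day is made up for illustration.

```python
PB = 10**15  # decimal petabyte, assumed
GB = 10**9   # decimal gigabyte, assumed


def drive_writes_per_day(total_writes_bytes, capacity_bytes, lifetime_days):
    """Full-drive writes per day implied by a lifetime write budget."""
    full_drive_writes = total_writes_bytes / capacity_bytes
    return full_drive_writes / lifetime_days


# 1PB of lifetime writes on a 100GB drive over 5 years (1825 days)
cycles = (1 * PB) / (100 * GB)
dwpd = drive_writes_per_day(1 * PB, 100 * GB, 1825)
print(f"{cycles:.0f} full-drive writes, {dwpd:.1f} DWPD")
```

Changing the capacity or the lifetime shows how quickly the implied DWPD moves: the same 1PB budget on a 400GB drive works out to a quarter of the figure above.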

Our high end SSDs in our siflash box have a lower average yearly failure rate than our high end spinning rust drives.

Good SSDs will cost you more than the crap ones. But you will not regret buying the good ones. You will regret buying the crap ones.

Just remember, if you are speccing out a new storage box/cluster/computing system, that you need to make engineering and cost tradeoffs. And in the ultracompetitive academic cluster market, the margins may be so incredibly thin to begin with that anything that helps increase the margin is a good thing for the company offering the system. I know people here may not be sympathetic to this viewpoint, and that's OK. Until, that is, you are on the other side, trying to pay your team with the slivers of margin you make on these sales. I'd recommend that, instead of automatically picking the item with the cheapest acquisition cost, you focus on the best. The best will cost you more, and you will have fewer headaches.


_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: [email protected]
web  : http://scalableinformatics.com
twtr : @scalableinfo
phone: +1 734 786 8423 x121
cell : +1 734 612 4615
