On 06/17/15 16:30, Mikael wrote:
> 2015-06-18 0:53 GMT+05:30 Theo de Raadt <dera...@cvs.openbsd.org>:
> 
>> > 2) General on SSD: When an SSD starts to shrink because it starts to wear
>> > out, how is this handled and how does this appear to the OS, logs, and
>> > system software?
>>
>> Invisible.  Even when a few drives make it visible in some way, it is
>> highly proprietary.
>>
> 
> What, then, is the proper behavior for a program or system using an SSD
> to deal with SSD degradation?:

Replace the drive before it becomes an issue.

> So say you have a program altering a file's contents all the time, or you
> have file turnover on a system (rm f123; echo importantdata > f124). At
> some point the SSD will shrink and down the line reach zero capacity.

That's not how it works.

The SSD has some number of spare storage blocks.  When it finds a bad
block, it locks out the bad block and swaps in a good block.

Curiously -- this is EXACTLY how modern "spinning rust" hard disks have
worked for about ... 20 years (yeah.  The "pre-modern" disks were more
exciting).  Write, verify, and if the verify fails, write to another
storage block and remap the new block to the old logical location.
Nothing new here (this is why people say that "heads", "cylinders" and
"sectors per track" have been meaningless for some time).  When the disk
runs out of places to write the good data, it throws a permanent write
error back to the OS and you have a really bad day.  The only real
difference with SSDs is the amount of spare storage dedicated to this
(be scared?).

Neither SSDs nor magnetic disks "shrink" to the outside world.  The
moment they need a replacement block that doesn't exist, the disk has
lost data for you and you should call it failed...it has not "shrunk".
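
To make the mechanics concrete, here is a toy model in Python of the
behaviour described above (my own sketch, not any vendor's firmware).
The OS-visible capacity never changes; a block that fails the
write-verify is silently remapped to a spare; and once the spares run
out, the drive throws a permanent write error instead of "shrinking":

class RemappingDisk:
    def __init__(self, logical_blocks, spare_blocks, bad_blocks=()):
        self.logical_blocks = logical_blocks  # what the OS sees: constant
        self.spares_left = spare_blocks       # hidden reserve
        self.bad = set(bad_blocks)            # blocks that fail verify
        self.remapped = set()                 # logical blocks now on spares
        self.store = {}

    def write(self, block, payload):
        if block in self.bad and block not in self.remapped:
            if self.spares_left == 0:
                # out of spares: data loss, permanent write error to the OS
                raise IOError("uncorrectable write error on block %d" % block)
            self.spares_left -= 1
            self.remapped.add(block)          # swap in a good block, silently
        self.store[block] = payload           # the OS never notices the remap

disk = RemappingDisk(logical_blocks=100, spare_blocks=2, bad_blocks=[5, 9, 42])
disk.write(5, "data")    # remapped invisibly, capacity unchanged
disk.write(9, "data")    # remapped invisibly, last spare used
# disk.write(42, "data") # would raise IOError: the drive is now "failed"

Note the drive never reports a smaller capacity at any point in that
model; it either keeps the remap to itself or hands you a hard error.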

Now, in both cases, this is assuming the drive fails in the way you
expect -- that the "flaw" will be spotted on immediate read-after-write,
while the data is still in the disk's cache or buffer.  There is more
than one way magnetic disks fail, and more than one way SSDs fail.
People tend to hyperventilate over the one way and forget all the rest.

Run your SSDs in production servers for two or three years, then swap
them out.  That's about the warranty on the entire box.  People who
believe the manufacturer's warranty is the measure of suitability for
production replace their machines then anyway.  Zero your SSDs, give
them to your staff to stick in their laptops or game computers, or use
them for experimentation and dev systems after that.  Don't
hyperventilate over ONE mode of failure; the majority of your SSDs that
fail will probably fail for other reasons.

[snip]

> 3) On OBSD, how would you generally suggest to make a magnetic-SSD hybrid
> disk setup where the SSD gives the speed and magnetic storage gives security?

Hybrid disks are a specific thing (or a few specific things) -- magnetic
disks with an SSD cache or magnetic/SSD combos where the first X% of the
disk is SSD, the rest is magnetic (or vice versa, I guess, but I don't
recall having seen that).  SSD cache, you use like any other disk.
Split mode, you use as multiple partitions, as appropriate.

You clarified this to be about a totally different thing...mirroring
an SSD with a rotating rust disk.  At this point, most RAID systems I've
seen do not support a "preferred read" device.  Maybe they should start
thinking about that.  Maybe they shouldn't -- most applications that
NEED the SSD performance for something other than single-user jollies
(e.g., a database server vs. having your laptop boot faster) will
face-plant severely should performance suddenly drop by an order of
magnitude.  In many of these cases, the performance drops to the point
that the system death-spirals as queries come in faster than they are
answered.  (This is why, when you have an imbalanced redundant pair of
machines, the faster machine should always be the standby, not the
primary.  Sometimes "does the same job, just slower" is still quite
effectively "down.")
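
A back-of-the-envelope sketch of that death spiral, in Python with
invented numbers (assumptions for illustration, not a benchmark): if
queries arrive faster than the degraded mirror can answer them, the
backlog only grows, and "does the same job, just slower" turns into
"down":

arrival_rate = 800.0        # queries per second hitting the server
ssd_service_rate = 1000.0   # queries/s the box answers from the SSD
hdd_service_rate = 100.0    # ~10x slower once reads fall back to spinning rust

def backlog_after(seconds, service_rate):
    backlog = 0.0
    for _ in range(seconds):
        backlog += arrival_rate - service_rate  # new work minus completed work
        backlog = max(backlog, 0.0)             # a queue can't go negative
    return backlog

print(backlog_after(60, ssd_service_rate))  # 0.0 -- keeps up comfortably
print(backlog_after(60, hdd_service_rate))  # 42000.0 -- hopelessly behind in a minute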

Nick.
