Hello,

On 11/22/2015 10:01 PM, Robert LeBlanc wrote:
> There have been numerous reports on the mailing list of the Samsung
> EVOs and Pros failing far before their expected wear. This is most
> likely because of the 'uncommon' workload of Ceph: the controllers of
> those drives are not really designed to handle the continuous direct
> sync writes that Ceph does. Because of this they can fail without
> warning (controller failure rather than MLC failure).

I'm new to the mailing list and am currently scanning the archive, and
I'm getting a sense of the quality of the Samsung EVO disks. If I
understand correctly, the advice is at least to put DC-grade journals in
front of them to spare them somewhat from failure, for example Intel
750s.

However, is there any experience with when the EVOs fail in the Ceph
scenario? For example, if the wear-leveling value according to SMART is
at about 40%, is it time to replace your disks? Or is it just random?
We actually use mostly Crucial drives (M550s, MX200s), and there is not
a lot about them on the list. Do other people use them, and what is
their experience so far? I expect about the same quality as the Samsung
EVOs, but I'm not sure that is the correct conclusion.

About SSD failure in general: do they normally fail hard, or do they
just become unbearably slow? We measure/graph the disks' 'busy'
percentage and use that as an indicator of whether a disk is getting
slow. Is that a sensible approach?
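For reference, the 'busy' figure I mean is essentially what iostat
reports as %util; a minimal sketch of how it can be derived from
/proc/diskstats (assuming the standard layout, where the 13th column is
milliseconds spent doing I/Os):

#!/usr/bin/env python3
# Sample /proc/diskstats twice and turn the io_ticks delta into a busy
# percentage over the interval -- the same idea as iostat's %util.
import sys
import time

def io_ticks(device):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[12])  # ms spent doing I/Os since boot
    raise ValueError(f"device {device} not found")

def busy_percent(device, interval=5.0):
    start = io_ticks(device)
    time.sleep(interval)
    end = io_ticks(device)
    return 100.0 * (end - start) / (interval * 1000.0)

if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "sda"
    print(f"{dev}: {busy_percent(dev):.1f}% busy")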

Regards,

Mart



>
> We have tested the performance of the Micron M600 drives and as long
> as you don't fill them up, they perform like the Intel line. I just
> don't know if they will die prematurely like a lot of the Samsungs
> have. We have a load of Intel S3500s that we can put in if they start
> failing so I'm not too worried at the moment.
>
> The only drives that I've heard really good things about are the Intel
> S3700 (and I suspect the S3600 and S3500s could be used as well if you
> take some additional precautions) and the Samsung DC PROs (has to have
> the DC and PRO in the name). The Micron M600s are a good value and
> have decent performance and I plan on keeping the list informed about
> them as time goes on.
>
> With a cluster that is as idle as yours it may not make that much of a
> difference. Where we are pushing 1,000s of IOPs all the time, we have
> a challenge if the SSDs can't take the load.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Sun, Nov 22, 2015 at 10:40 AM, Alex Moore <[email protected]> wrote:
>> I just had 2 of the 3 SSD journals in my small 3-node cluster fail
>> within 24 hours of each other (not fun, although thanks to a
>> replication factor of 3x, at least I didn't lose any data). The
>> journals were 128 GB Samsung 850 Pros. However I have determined that
>> it wasn't really their fault...
>>
>> This is a small Ceph cluster running just a handful of relatively idle
>> Qemu VMs using librbd for storage, and I had originally estimated that
>> based on my low expected volume of write IO the Samsung 850 Pro
>> journals would last at least 5 years (which would have been plenty). I
>> still think that estimate was correct, but the reason they died
>> prematurely (in reality they lasted 15 months) seems to have been that
>> a number of my VMs had been hammering their disks continuously for
>> almost a month, and I only noticed retrospectively after the journals
>> had died. I tracked it back to some sort of bug in syslog-ng: the
>> affected VMs took an update to syslog-ng on October 24th, and then
>> ever since the following daily logrotate early on the 25th, the syslog
>> daemons were together generating about 500 IOPs of 4kB writes
>> continuously for the next 4 weeks until the journals then failed.
>>
>> As a result, I reckon that taking write amplification into account the
>> SSDs must have each written just over 1PB over that period - way more
>> than they are supposed to be able to handle - so I can't blame the
>> SSDs.
>>
>> I do have graphs tracking various metrics for the Ceph cluster,
>> including IOPs, latency, and read/write throughput - which is how I
>> worked out what happened afterwards - but unfortunately I didn't have
>> any alerting set up to warn me when there were anomalies in the
>> graphs, and I wasn't proactively looking at the graphs on a regular
>> basis.
>>
>> So I think there is a lesson to be learned here... even if you have
>> correctly spec'd your SSD journals in terms of endurance for the
>> anticipated level of write activity in a cluster, it's still important
>> to keep an eye on ensuring that the write activity matches
>> expectations, as it's quite easy for a misbehaving VM to severely
>> drain the life expectancy of SSDs by generating 4k write IOs as
>> quickly as it can for a long period of time!
>>
>> I have now replaced all 3 journals with 240 GB Samsung SM863 SSDs,
>> which were only about twice the cost of the smaller 850 Pros. And I'm
>> already noticing a massive performance improvement (reduction in write
>> latency, and higher IOPs). So I'm not too upset about having
>> unnecessarily killed the 850 Pros. But I thought it was worth sharing
>> the experience...
>>
>> FWIW the OSDs themselves are on 1TB Samsung 840 Evos, which I have
>> been happy with so far (they've been going for about 18 months at this
>> stage).
>>
>> Alex
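Out of curiosity, a quick back-of-the-envelope check on those numbers
(my own rough arithmetic and assumptions, e.g. 4 kB per write and a full
4 weeks, not Alex's figures):

iops = 500
write_size = 4 * 1024            # bytes per client write
seconds = 4 * 7 * 24 * 3600      # roughly 4 weeks
host_writes = iops * write_size * seconds
print(host_writes / 1e12)        # about 5 TB of raw 4 kB client writes

The gap between those roughly 5 TB of raw writes and the >1PB of NAND
writes Alex estimates would then be the combined amplification (journal
padding and sync behaviour plus the drive's internal write amplification
on small sync writes), which is exactly the factor I have no feel for on
our Crucial drives.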

-- 
Mart van Santen
Greenhost
E: [email protected]
T: +31 20 4890444
W: https://greenhost.nl

A PGP signature can be attached to this e-mail;
you need PGP software to verify it.
My public key is available on keyserver(s),
see: http://tinyurl.com/openpgp-manual

PGP Fingerprint: CA85 EB11 2B70 042D AF66  B29A 6437 01A1 10A3 D3A5


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
