Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

Andrija Panic Thu, 17 Sep 2015 16:10:32 -0700

"came to the conclusion they were put to an "unintended use"."
wtf? :)))) Best to install them inside a shut-down workstation... :)


On 18 September 2015 at 01:04, Quentin Hartman <[email protected]> wrote:

> I ended up having 7 total die. 5 while in service, 2 more when I hooked
> them up to a test machine to collect information from them. To Samsung's
> credit, they've been great to deal with and are replacing the failed
> drives, on the condition that I don't use them for ceph again. Apparently
> they sent some of my failed drives to an engineer in Korea and they did a
> failure analysis on them and came to the conclusion they were put to an
> "unintended use". I have seven left I'm not sure what to do with.
>
> I've honestly always really liked Samsung, and I'm disappointed that I
> wasn't able to find anyone with their DC-class drives actually in stock so
> I ended up switching to Intel S3700s. My users will be happy to have
> some SSDs to put in their workstations though!
>
> QH
>
> On Thu, Sep 17, 2015 at 4:49 PM, Andrija Panic <[email protected]> wrote:
>
>> Another one bites the dust...
>>
>> This is a Samsung 850 PRO 256GB... (6 journals on this SSD just died...)
>>
>> [root@cs23 ~]# smartctl -a /dev/sda
>> smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.10.66-1.el6.elrepo.x86_64] (local build)
>> Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
>>
>> Vendor:               /1:0:0:0
>> Product:
>> User Capacity:        600,332,565,813,390,450 bytes [600 PB]
>> Logical block size:   774843950 bytes
>> >> Terminate command early due to bad response to IEC mode page
>> A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options
>>
>> On 8 September 2015 at 18:01, Quentin Hartman <[email protected]> wrote:
>>
>>> On Tue, Sep 8, 2015 at 9:05 AM, Mark Nelson <[email protected]> wrote:
>>>
>>>>> A list of hardware that is known to work well would be incredibly
>>>>> valuable to people getting started. It doesn't have to be exhaustive,
>>>>> nor does it have to provide all the guidance someone could want. A
>>>>> simple "these things have worked for others" would be sufficient. If
>>>>> nothing else, it will help people justify more expensive gear when
>>>>> their approval people say "X seems just as good and is cheaper, why
>>>>> can't we get that?".
>>>>>
>>>>
>>>> So I have my opinions on different drives, but I think we do need to be
>>>> really careful not to appear to endorse or pick on specific vendors. The
>>>> more we can stick to high-level statements like:
>>>>
>>>> - Drives should have high write endurance
>>>> - Drives should perform well with O_DSYNC writes
>>>> - Drives should support power loss protection for data in motion
>>>>
>>>> The better I think.  Once those are established, I think it's
>>>> reasonable to point out that certain drives meet (or do not meet) those
>>>> criteria and get feedback from the community as to whether or not vendors'
>>>> marketing actually reflects reality.  It'd also be really nice to see more
>>>> information available like the actual hardware (capacitors, flash cells,
>>>> etc) used in the drives.  I've had to show photos of the innards of
>>>> specific drives to vendors to get them to give me accurate information
>>>> regarding certain drive capabilities.  Having a database of such things
>>>> available to the community would be really helpful.
>>>>
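The O_DSYNC criterion in the list above can be checked with a small synchronous-write timing loop. What follows is a minimal sketch, not a benchmark anyone in this thread ran: the function name `dsync_write_iops`, the 4 KiB block size, and the default write count are all assumptions, and it writes to a regular file on a POSIX system rather than to a raw journal device.

```python
import os
import tempfile
import time


def dsync_write_iops(path, block_size=4096, count=1000):
    """Write `count` blocks of `block_size` bytes with O_DSYNC; return writes per second."""
    # O_DSYNC makes each write() return only after the data reaches stable storage.
    # Fall back to O_SYNC on POSIX platforms that do not define O_DSYNC.
    sync_flag = getattr(os, "O_DSYNC", os.O_SYNC)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | sync_flag, 0o600)
    block = b"\0" * block_size
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Hypothetical usage: a scratch file on the filesystem backed by the SSD under test.
    with tempfile.NamedTemporaryFile(dir=".", delete=False) as tmp:
        scratch = tmp.name
    try:
        print(f"{dsync_write_iops(scratch):.0f} synchronous 4 KiB writes/s")
    finally:
        os.remove(scratch)
```

The number is only meaningful relative to other drives measured the same way on the same host, since the filesystem and controller both sit in the write path here.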
>>>>
>>> That's probably a very good approach. I think it would be pretty simple
>>> to avoid the appearance of endorsement if the data is presented correctly.
>>>
>>>
>>>>
>>>>> To that point, I think perhaps though something more important than a
>>>>> list of known "good" hardware would be a list of known "bad" hardware,
>>>>>
>>>>
>>>> I'm rather hesitant to do this unless it's been specifically confirmed
>>>> by the vendor.  It's too easy to point fingers (see the recent kernel trim
>>>> bug situation).
>>>
>>>
>>> I disagree. I think that only comes into play if you claim to know why
>>> the hardware has problems. In this case, if you simply state "people who
>>> have used this drive have experienced a large number of seemingly premature
>>> failures when using them as journals" that provides sufficient warning to
>>> users, and if the vendor wants to engage the community and potentially pin
>>> down why and help us find a way to make the device work or confirm that
>>> it's just not suited, then that's on them. Samsung seems to be doing
>>> exactly that. It would be great to have them help provide that level of
>>> detail, but again, I don't think it's necessary. We're not saying
>>> "ceph/redhat/$whatever says this hardware sucks" we're saying "The
>>> community has found that using this hardware with ceph has exhibited these
>>> negative behaviors...". At that point you're just relaying experiences and
>>> collecting them in a central location. It's up to the reader to draw
>>> conclusions from it.
>>>
>>> But again, I think more important than either of these would be a
>>> collection of use cases with actual journal write volumes that have
>>> occurred in those use cases so that people can make more informed
>>> purchasing decisions. The fact that my small openstack cluster created 3.6T
>>> of writes per month on my journal drives (3 OSD each) is somewhat
>>> mind-blowing. That's almost four times the amount of writes my best guess
>>> estimates indicated we'd be doing. Clearly there's more going on than we
>>> are used to paying attention to. Someone coming to ceph and seeing the cost
>>> of DC-class SSDs versus consumer-class SSDs will almost certainly suffer
>>> from some amount of sticker shock, and even if they don't, their purchasing
>>> approval people almost certainly will. This is especially true for people
>>> in smaller organizations where SSDs are still somewhat exotic. And when
>>> they come back with the "Why won't cheaper thing X be OK?" they need to
>>> have sufficient information to answer that. Without a test environment to
>>> generate data with, they will need to rely on the experiences of others,
>>> and right now those experiences don't seem to be documented anywhere, and
>>> if they are, they are not very discoverable.
>>>
>>> QH
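The journal write-volume point above lends itself to a quick back-of-the-envelope check. The sketch below assumes the 3.6T/month figure means terabytes per journal drive (the message does not say whether it is per drive or cluster-wide), and the rated endurance value is a placeholder to be replaced with the figure from a drive's datasheet, not a specification of any drive named in this thread. The function name is likewise made up for illustration.

```python
def months_until_rated_endurance(rated_tbw, monthly_writes_tb):
    """Estimate how many months of writes a drive's rated endurance (TBW) covers."""
    if rated_tbw <= 0 or monthly_writes_tb <= 0:
        raise ValueError("rated_tbw and monthly_writes_tb must be positive")
    return rated_tbw / monthly_writes_tb


if __name__ == "__main__":
    observed_tb_per_month = 3.6   # from the message above, assumed to be per drive
    placeholder_tbw = 150.0       # hypothetical rated endurance in TB, not a real spec
    months = months_until_rated_endurance(placeholder_tbw, observed_tb_per_month)
    print(f"rated endurance consumed after about {months:.1f} months")
```

This only covers wear against the rated figure; it says nothing about the sudden failures reported in this thread.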
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> [email protected]
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>>
>> --
>>
>> Andrija Panić
>>
>
>


-- 

Andrija Panić

