> On 25 Feb 2016, at 22:41, Shinobu Kinjo <[email protected]> wrote:
> 
>> Just beware of HBA compatibility, even in passthrough mode some crappy 
>> firmwares can try and be smart about what you can do (LSI-Avago, I'm looking 
>> your way for crippling TRIM, seriously WTH).
> 
> This is very good to know.
> Can anybody elaborate on this a bit more?
> 

To some degree; it's been a while since I investigated this.
For TRIM/discard to work, you need to have
1) a working TRIM/discard command on the drive
2) the SCSI/libata layer (?) somehow detecting how many blocks can be discarded 
at once, what the block size is, etc.
Those properties are found in /sys/block/xxx/queue/discard_*
3) a filesystem that supports discard (it looks at those discard_* properties 
to determine when/what to discard)
4) flags (hdparm -I shows them) describing what happens after a TRIM - either 
the data is zeroed or arbitrary data is returned (it is possible to TRIM a 
sector and then read back the original data - the drive doesn't actually need 
to erase anything, it simply marks the sector as unused in a bitmap and garbage 
collection does its magic when it feels like it, if ever)
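For instance, the values the kernel detected for a drive can be read straight 
out of sysfs. A minimal sketch (the sysfs path and attribute names are the 
standard Linux ones; the helper name is my own, and "sda" is a placeholder):

```shell
# Print the discard_* queue attributes found under a sysfs-style
# directory. Zeroes in discard_granularity/discard_max_bytes mean the
# block layer believes the device cannot discard at all.
show_discard_params() {
    dir=$1
    for f in "$dir"/discard_*; do
        [ -r "$f" ] || continue          # skip if the glob didn't match
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}

# On a real system (sda is a placeholder device name):
# show_discard_params /sys/block/sda/queue
```

If discard_max_bytes reads 0 there, nothing above the block layer will ever 
issue a discard, regardless of what the drive itself supports.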

RAID controllers need to have some degree of control over this, because they 
need to be able to compare drive contents when scrubbing (the same probably 
applies to mdraid in some form), either by maintaining a bitmap of used blocks 
or by trusting the drives to be deterministic. If you discard a sector on a HW 
RAID, both drives need to start returning the same data or scrubbing will fail. 
Some drives guarantee that and some don't.
You either have DRAT - Deterministic Read After Trim (which only guarantees 
that the data doesn't change between reads; it can still be arbitrary),
or you have DZAT - Deterministic read Zero After Trim (subsequent reads return 
only zeroes),
or you can have neither (which is no big deal, except for RAID).
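Which of these a drive supports shows up in the hdparm -I output. A sketch 
that classifies a drive from a saved copy of that output (the grep strings are 
the ones hdparm prints for these features; the helper name is my own):

```shell
# Classify a drive's post-TRIM read behaviour from saved `hdparm -I`
# output (produced with: hdparm -I /dev/sdX > report.txt).
classify_trim() {
    report=$1
    if grep -q 'Deterministic read ZEROs after TRIM' "$report"; then
        echo DZAT      # reads after TRIM return zeroes
    elif grep -q 'Deterministic read data after TRIM' "$report"; then
        echo DRAT      # reads are stable but may be arbitrary data
    else
        echo none      # no guarantee; problematic behind RAID scrubbing
    fi
}
```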

Even though I don't use LSI HBAs in IR (RAID) mode, the firmware doesn't like 
that my drives lack DZAT/DRAT (or rather lacked it - this doesn't apply to the 
Intels I have now), so it crippled the discard_* parameters to try to disallow 
the use of TRIM. And that mostly works, because the filesystem doesn't get the 
discard_* parameters it needs for discard to work...
... BUT the firmware doesn't cripple the TRIM command itself, so running hdparm 
--trim-sector-ranges still works (lol). I suppose if those discard_* parameters 
were made read/write (I actually found a patch back then that does exactly 
that), we could re-enable TRIM in spite of the firmware nonsense, but with 
modern SSDs it's mostly pointless anyway and LSI sucks, so who cares :-)
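For reference, that manual path looks something like this (a sketch only: the 
device and sector range are placeholders, and hdparm deliberately refuses to 
issue the command without its explicit safety flag, because it bypasses the 
filesystem and destroys whatever is in the range). The command is echoed here 
rather than executed:

```shell
# Issue a TRIM directly to the drive, bypassing the block layer's
# discard_* checks entirely. DESTRUCTIVE: wipes the given sectors.
# /dev/sdX and the range are placeholders -- never point this at a
# drive holding data you want. Echoed instead of executed on purpose.
DEV=/dev/sdX
RANGE="2000:4"   # start sector 2000, length 4 sectors
echo "would run: hdparm --please-destroy-my-drive --trim-sector-ranges $RANGE $DEV"
```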

*
Sorry if I mixed up some layers - maybe it's not the filesystem that calls 
discard but another layer in the kernel, and I'm also not sure exactly how the 
discard_* values are detected and when, but in essence it works like that.

Jan



> Rgds,
> Shinobu
> 
> ----- Original Message -----
> From: "Jan Schermer" <[email protected]>
> To: "Nick Fisk" <[email protected]>
> Cc: "Robert LeBlanc" <[email protected]>, "Shinobu Kinjo" 
> <[email protected]>, [email protected]
> Sent: Thursday, February 25, 2016 11:10:41 PM
> Subject: Re: [ceph-users] List of SSDs
> 
> We are very happy with S3610s in our cluster.
> We had to flash a new firmware because of latency spikes (NCQ-related), but 
> had zero problems after that...
> Just beware of HBA compatibility, even in passthrough mode some crappy 
> firmwares can try and be smart about what you can do (LSI-Avago, I'm looking 
> your way for crippling TRIM, seriously WTH).
> 
> Jan
> 
> 
>> On 25 Feb 2016, at 14:48, Nick Fisk <[email protected]> wrote:
>> 
>> There’s two factors really
>> 
>> 1.       Suitability for use in ceph
>> 2.       Number of people using them
>> 
>> For #1, there are a number of people using various different drives, so lots 
>> of options. The blog article linked is a good place to start.
>> 
>> For #2, and I think this is quite important: lots of people use Intel's 
>> S3xxx drives. This means any problems you face will likely have a lot of 
>> input from other people. Also you are less likely to face surprises, as most 
>> usage cases have already been covered. 
>> 
>> From: ceph-users [mailto:[email protected]] On Behalf Of Robert LeBlanc
>> Sent: 25 February 2016 05:56
>> To: Shinobu Kinjo <[email protected]>
>> Cc: ceph-users <[email protected]>
>> Subject: Re: [ceph-users] List of SSDs
>> 
>> We are moving to the Intel S3610, from our testing it is a good balance 
>> between price, performance and longevity. But as with all things, do your 
>> testing ahead of time. This will be our third model of SSDs for our cluster. 
>> The S3500s didn't have enough life, and performance tapers off as it gets 
>> full. The Micron M600s looked good with the Sebastian journal tests, but 
>> once in use for a while go downhill pretty bad. We also tested Micron M500dc 
>> drives and they were on par with the S3610s and are more expensive and are 
>> closer to EoL. The S3700s didn't have quite the same performance as the 
>> S3610s, but they will last forever and are very stable in terms of 
>> performance and have the best power loss protection. 
>> 
>> Short answer is test them for yourself to make sure they will work. You are 
>> pretty safe with the Intel S3xxx drives. The Micron M500dc is also pretty 
>> safe based on my experience. It had also been mentioned that someone has had 
>> good experience with a Samsung DC Pro (has to have both DC and Pro in the 
>> name), but we weren't able to get any quick enough to test so I can't vouch 
>> for them. 
>> 
>> Sent from a mobile device, please excuse any typos.
>> 
>> On Feb 24, 2016 6:37 PM, "Shinobu Kinjo" <[email protected]> wrote:
>> Hello,
>> 
>> There has been a bunch of discussion about using SSD.
>> Does anyone have any list of SSDs describing which SSD is highly 
>> recommended, which SSD is not.
>> 
>> Rgds,
>> Shinobu
>> _______________________________________________
>> ceph-users mailing list
>> [email protected] <mailto:[email protected]>
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
