Ah. Point taken on the random-access SSD performance. I was trying to emphasize the relative failure rates in the two scenarios. I didn't mean to imply that SSD random-access performance was not a likely improvement here, just that it is a complicated trade-off in the grand scheme of things. Thanks for catching my goof.
On Wed, Nov 3, 2010 at 3:58 PM, Tyler Hobbs <ty...@riptano.com> wrote:
> SSDs will not generally improve your write performance very much, but
> they can significantly improve read performance.
>
> You do *not* want to waste an SSD on the commitlog drive, as even a slow
> HDD can write sequentially very quickly. For the data drive, they might
> make sense.
>
> As Jonathan talks about, it has a lot to do with your access patterns.
> If you (1) delete parts of rows, (2) update parts of rows, or (3) insert
> new columns into existing rows frequently, you'll end up with rows
> spread across several SSTables (which are on disk). This means that each
> read may require several seeks, which are very slow for HDDs but very
> quick for SSDs.
>
> Of course, the randomness of which rows you access is also important,
> but Jonathan did a good job of covering that. Don't forget about the
> effects of caching here, too.
>
> The only way to tell if it is cost-effective is to test your particular
> access patterns (using a configured stress.py test or, preferably, your
> actual application).
>
> - Tyler
>
> On Wed, Nov 3, 2010 at 3:44 PM, Jonathan Shook <jsh...@gmail.com> wrote:
>>
>> SSDs are not reliable after a (relatively low, compared to spinning
>> disk) number of writes. They may significantly boost performance if
>> used on the "journal" storage, but will suffer short lifetimes under
>> highly random write patterns.
>>
>> In general, plan to replace them frequently. Whether they are worth
>> it, given the performance improvement versus the cost of replacement x
>> hardware x logistics, is generally a calculus problem. It's difficult
>> to make a generic rationale for or against them.
>>
>> You might be better off in general by throwing more memory at your
>> servers, and isolating your random access from your journaled data.
>> Is there any pattern to your reads and writes/deletes? If it is fully
>> random across your keys, then you have the worst-case scenario.
>> Sometimes you can impose access patterns or structural patterns in
>> your app which make caching more effective.
>>
>> Good questions to ask about your data access:
>> Is there a "user session" which shows an access pattern to proximal data?
>> Are there sets of accesses which always happen close together?
>> Are there keys or maps which add extra indirection?
>>
>> I'm not familiar with your situation. I was just providing some
>> general ideas.
>>
>> Jonathan Shook
>>
>> On Wed, Nov 3, 2010 at 2:32 PM, Alaa Zubaidi <alaa.zuba...@pdf.com> wrote:
>> > Hi,
>> > We have continuous high-throughput writes, reads, and deletes, and
>> > we are trying to find the best hardware. Does using SSDs for
>> > Cassandra improve performance? Did anyone compare SSDs vs. HDDs?
>> > Any recommendations on SSDs?
>> >
>> > Thanks,
>> > Alaa
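P.S. Jonathan's "calculus problem" on replacement cost can be made concrete with a back-of-envelope endurance estimate. The function and the figures below are purely illustrative assumptions (not from any particular drive's datasheet); real P/E-cycle ratings and write-amplification factors vary widely by drive and workload.

```python
# Hypothetical back-of-envelope SSD lifetime estimate.
# All inputs are assumptions for illustration; check the drive's
# datasheet for real endurance figures.

def ssd_lifetime_days(capacity_gb, pe_cycles, write_amplification,
                      daily_writes_gb):
    """Days until the drive's rated program/erase budget is exhausted.

    capacity_gb         -- usable drive capacity in GB
    pe_cycles           -- rated program/erase cycles per cell
    write_amplification -- extra physical writes per logical write
                           (higher for random write patterns)
    daily_writes_gb     -- logical GB written to the drive per day
    """
    total_writable_gb = capacity_gb * pe_cycles / write_amplification
    return total_writable_gb / daily_writes_gb

# Example: a 160 GB MLC drive rated for 3,000 P/E cycles, write
# amplification of 2.0 under a fairly random workload, and 200 GB of
# logical writes per day.
days = ssd_lifetime_days(160, 3000, 2.0, 200)
print(round(days))  # 1200 days, roughly 3.3 years
```

Whether that lifetime justifies the price over an HDD array is exactly the per-deployment trade-off the thread describes: the same drive under a heavier or more random write load (higher write amplification) could burn out in a fraction of the time.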