Ah. Point taken on the random-access SSD performance. I was trying to emphasize the relative failure rates in the two scenarios. I didn't mean to imply that SSD random-access performance was not a likely improvement here, just that it is a complicated trade-off in the grand scheme of things. Thanks for catching my goof.
On Wed, Nov 3, 2010 at 3:58 PM, Tyler Hobbs <ty...@riptano.com> wrote:
> SSDs will not generally improve your write performance very much, but
> they can significantly improve read performance.
>
> You do *not* want to waste an SSD on the commitlog drive, as even a slow
> HDD can write sequentially very quickly. For the data drive, they might
> make sense.
>
> As Jonathan talks about, it has a lot to do with your access patterns.
> If you (1) delete parts of rows, (2) update parts of rows, or (3) insert
> new columns into existing rows frequently, you'll end up with rows
> spread across several SSTables (which are on disk). This means that each
> read may require several seeks, which are very slow for HDDs but very
> quick for SSDs.
>
> Of course, the randomness of which rows you access is also important,
> but Jonathan did a good job of covering that. Don't forget about the
> effects of caching here, too.
>
> The only way to tell if it is cost-effective is to test your particular
> access patterns (using a configured stress.py test or, preferably, your
> actual application).
>
> - Tyler
>
> On Wed, Nov 3, 2010 at 3:44 PM, Jonathan Shook <jsh...@gmail.com> wrote:
>>
>> SSDs are not reliable after a (relatively low, compared to spinning
>> disk) number of writes. They may significantly boost performance if
>> used on the "journal" storage, but will suffer short lifetimes under
>> highly random write patterns.
>>
>> In general, plan to replace them frequently. Whether they are worth
>> it, given the performance improvement versus the cost of replacement x
>> hardware x logistics, is generally a calculus problem. It's difficult
>> to make a generic rationale for or against them.
>>
>> You might be better off in general by throwing more memory at your
>> servers, and isolating your random access from your journaled data.
>> Is there any pattern to your reads and writes/deletes? If it is fully
>> random across your keys, then you have the worst-case scenario.
>> Sometimes you can impose access patterns or structural patterns in
>> your app which make caching more effective.
>>
>> Good questions to ask about your data access:
>> Is there a "user session" which shows an access pattern to proximal data?
>> Are there sets of accesses which always happen close together?
>> Are there keys or maps which add extra indirection?
>>
>> I'm not familiar with your situation. I was just providing some
>> general ideas.
>>
>> Jonathan Shook
>>
>> On Wed, Nov 3, 2010 at 2:32 PM, Alaa Zubaidi <alaa.zuba...@pdf.com> wrote:
>> > Hi,
>> > We have continuous high-throughput writes, reads, and deletes, and
>> > we are trying to find the best hardware. Does using SSDs for
>> > Cassandra improve performance? Did anyone compare SSDs vs. HDDs?
>> > Any recommendations on SSDs?
>> >
>> > Thanks,
>> > Alaa
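P.S. Jonathan's "calculus problem" on replacement cost can be made concrete with a back-of-envelope endurance estimate. The function and the figures below are purely illustrative assumptions (not from any particular drive's datasheet); real P/E-cycle ratings and write-amplification factors vary widely by drive and workload.

```python
# Hypothetical back-of-envelope SSD lifetime estimate.
# All inputs are assumptions for illustration; check the drive's
# datasheet for real endurance figures.

def ssd_lifetime_days(capacity_gb, pe_cycles, write_amplification,
                      daily_writes_gb):
    """Days until the drive's rated program/erase budget is exhausted.

    capacity_gb         -- usable drive capacity in GB
    pe_cycles           -- rated program/erase cycles per cell
    write_amplification -- extra physical writes per logical write
                           (higher for random write patterns)
    daily_writes_gb     -- logical GB written to the drive per day
    """
    total_writable_gb = capacity_gb * pe_cycles / write_amplification
    return total_writable_gb / daily_writes_gb

# Example: a 160 GB MLC drive rated for 3,000 P/E cycles, write
# amplification of 2.0 under a fairly random workload, and 200 GB of
# logical writes per day.
days = ssd_lifetime_days(160, 3000, 2.0, 200)
print(round(days))  # 1200 days, roughly 3.3 years
```

Whether that lifetime justifies the price over an HDD array is exactly the per-deployment trade-off the thread describes: the same drive under a heavier or more random write load (higher write amplification) could burn out in a fraction of the time.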