Hi Matt,

Thanks for the detailed explanation! Yes, this is exactly what I'm looking
for: "write amplification = data written to flash / data written by the host".

We are using LCS heavily in production, so I'd like to figure out the
amplification it causes and see what we can do to optimize it. I already
have metrics for "data written to flash", and I'm wondering whether there
is an easy way to get "data written by the host" on each C* node? The best
approximation I have so far is sketched below.
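
In case it helps frame the question, here is the rough estimate I'm working
with for the logical side: a minimal Python sketch that scrapes "Local write
count" from nodetool cfstats and multiplies by an assumed mean mutation size
(AVG_MUTATION_BYTES is purely a placeholder I'd have to calibrate from our
client-side metrics, not something Cassandra reports):

    import re
    import subprocess

    # Hypothetical mean mutation size in bytes -- calibrate this against
    # your own client-side metrics; Cassandra does not expose it directly.
    AVG_MUTATION_BYTES = 1024

    # nodetool cfstats prints a "Local write count" line per table;
    # sum them across all tables on this node.
    out = subprocess.check_output(["nodetool", "cfstats"], text=True)
    writes = sum(int(n) for n in re.findall(r"Local write count:\s*(\d+)", out))

    print("estimated bytes written by the host:", writes * AVG_MUTATION_BYTES)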

Thanks

On Thu, Mar 10, 2016 at 8:48 AM, Matt Kennedy <mkenn...@datastax.com> wrote:

> TL;DR - Cassandra actually causes a ton of write amplification but it
> doesn't freaking matter any more. Read on for details...
>
> That slide deck does have a lot of very good information on it, but
> unfortunately I think it has led to a fundamental misunderstanding about
> Cassandra and write amplification. In particular, slide 51 vastly
> oversimplifies the situation.
>
> The wikipedia definition of write amplification looks at this from the
> perspective of the SSD controller:
> https://en.wikipedia.org/wiki/Write_amplification#Calculating_the_value
>
> In short, write amplification = data written to flash / data written by
> the host
>
> So, if I write 1MB in my application, but the SSD has to write my 1MB,
> plus rearrange another 1MB of data in order to make room for it, then I've
> written a total of 2MB and my write amplification is 2x.
>
> In other words, it is measuring how much extra the SSD controller has to
> write in order to do its own housekeeping.
>
> However, the wikipedia definition is a bit more constrained than how the
> term is used in the storage industry. The whole point of looking at write
> amplification is to understand the impact that a particular workload is
> going to have on the underlying NAND by virtue of the data written. So a
> definition of write amplification that is a little more relevant to the
> context of Cassandra is to consider this:
>
> write amplification = data written to flash / data written to the database
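>
> As a quick sanity check, the ratio is trivial to compute once you have
> both byte counts; a minimal Python sketch, just replaying the 1MB example
> above (nothing Cassandra-specific):
>
>     def write_amp(flash_bytes, logical_bytes):
>         # Write amplification = physical bytes the NAND absorbed
>         # divided by the bytes the application logically wrote.
>         return flash_bytes / logical_bytes
>
>     MB = 1024 ** 2
>     print(write_amp(2 * MB, 1 * MB))  # -> 2.0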
>
> So, while the fact that we only sequentially write large immutable
> SSTables does mean that controller-level write amplification is near
> zero, compaction comes along and completely destroys that tidy little
> story. Think about it: every time a compaction re-writes data that has
> already been written, we create a lot of application-level write
> amplification. The compaction strategy and the workload itself determine
> the real application-level write amp, but generally speaking, LCS is the
> worst, STCS is in the middle, and DTCS causes the least. To measure this,
> you can usually use smartctl (the exact mechanism varies by SSD
> manufacturer) to get the physical bytes written to your SSDs, and divide
> that by the data you've actually logically written to Cassandra. I've
> measured (more than two years ago) LCS write amp as high as 50x on some
> workloads, which is significantly higher than the typical controller-level
> write amp on a b-tree style update-in-place data store. Also note that the
> new storage engine reduces a lot of inefficiency in how Cassandra stores
> data, thereby reducing the impact of write amp due to compaction.
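>
> For reference, a minimal Python sketch of that measurement, assuming a
> drive that exposes the Total_LBAs_Written SMART attribute (both the
> attribute name and its unit vary by vendor, so check smartctl -A output
> for your model before trusting the numbers):
>
>     import re
>     import subprocess
>
>     def physical_bytes_written(device, sector_size=512):
>         # Many SSDs report attribute 241 (Total_LBAs_Written); multiply
>         # the raw value by the logical sector size to get bytes. Some
>         # vendors count in other units (e.g. 32MiB chunks) instead.
>         out = subprocess.check_output(["smartctl", "-A", device], text=True)
>         match = re.search(r"Total_LBAs_Written.*?(\d+)\s*$", out, re.MULTILINE)
>         return int(match.group(1)) * sector_size
>
>     # write amp = physical bytes written / bytes logically written to C*
>     logical_bytes = 750 * 1024 ** 3  # example: what you wrote to Cassandra
>     print(physical_bytes_written("/dev/sda") / logical_bytes)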
>
> However, if you're a person who understands SSDs, at this point you're
> wondering why we aren't burning out SSDs right and left. The reality is
> that SSD endurance has gotten so good that all this write amp isn't
> really a problem any more. If you're curious to read more about that,
> I recommend you start here:
>
>
> http://hothardware.com/news/google-data-center-ssd-research-report-offers-surprising-results-slc-not-more-reliable-than-mlc-flash
>
> and the paper that article mentions:
>
> http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/23105-fast16-papers-schroeder.pdf
>
>
> Hope this helps.
>
>
> Matt Kennedy
>
>
>
> On Thu, Mar 10, 2016 at 7:05 AM, Paulo Motta <pauloricard...@gmail.com>
> wrote:
>
>> This is a good source on Cassandra + write amplification:
>> http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
>>
>> 2016-03-10 9:57 GMT-03:00 Benjamin Lerer <benjamin.le...@datastax.com>:
>>
>>> Cassandra should not cause any write amplification. Write amplification
>>> happens only when you update data in place on SSDs. Cassandra does not
>>> update any data in place. Data can be rewritten during compaction, but
>>> it is never updated.
>>>
>>> Benjamin
>>>
>>> On Thu, Mar 10, 2016 at 12:42 PM, Alain RODRIGUEZ <arodr...@gmail.com>
>>> wrote:
>>>
>>> > Hi Dikang,
>>> >
>>> > I am not sure about what you call "amplification", but as sizes highly
>>> > depend on the structure, I think I would give it a try using CCM (
>>> > https://github.com/pcmanus/ccm) or some test cluster with
>>> > 'production-like' settings and schema. You can write a row, flush it,
>>> > and see how big the data is cluster-wide / per node, as in the sketch
>>> > below.
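>>> >
>>> > For instance, a minimal Python sketch of that experiment (the ccm
>>> > commands and the ~/.ccm data directory layout are assumptions; adjust
>>> > the cluster name, version, and data dir for your setup):
>>> >
>>> >     import os
>>> >     import subprocess
>>> >
>>> >     def dir_bytes(path):
>>> >         # Total on-disk size of a node's data directory.
>>> >         return sum(os.path.getsize(os.path.join(root, f))
>>> >                    for root, _, files in os.walk(path)
>>> >                    for f in files)
>>> >
>>> >     # Assumes a cluster created with e.g.: ccm create test -v 2.2.5 -n 3 -s
>>> >     data_dir = os.path.expanduser("~/.ccm/test/node1/data")
>>> >
>>> >     before = dir_bytes(data_dir)
>>> >     # ... write your row(s) here, then force a flush to disk:
>>> >     subprocess.check_call(["ccm", "node1", "nodetool", "flush"])
>>> >     print("bytes on disk for the write:", dir_bytes(data_dir) - before)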
>>> >
>>> > Hope this will be of some help.
>>> >
>>> > C*heers,
>>> > -----------------------
>>> > Alain Rodriguez - al...@thelastpickle.com
>>> > France
>>> >
>>> > The Last Pickle - Apache Cassandra Consulting
>>> > http://www.thelastpickle.com
>>> >
>>> > 2016-03-10 7:18 GMT+01:00 Dikang Gu <dikan...@gmail.com>:
>>> >
>>> > > Hello there,
>>> > >
>>> > > I'm wondering whether there is a good way to measure the write
>>> > > amplification of Cassandra?
>>> > >
>>> > > I'm thinking it could be calculated as (number of bytes written to
>>> > > the disk) / (size of mutations written to the node).
>>> > >
>>> > > Do we already have a metric for "size of mutations written to the
>>> > > node"? I did not find it in the JMX metrics.
>>> > >
>>> > > Thanks
>>> > >
>>> > > --
>>> > > Dikang
>>> > >
>>> > >
>>> >
>>>
>>
>>
>


-- 
Dikang
