Re: How to measure the write amplification of C*?

2016-03-23 Thread Dikang Gu
As a follow-up, I'm going to write a simple patch to expose the number of flushed bytes from memtable to JMX, so that we can easily monitor it. Here is the jira: https://issues.apache.org/jira/browse/CASSANDRA-11420 On Thu, Mar 10, 2016 at 12:55 PM, Jack Krupansky

Re: How to measure the write amplification of C*?

2016-03-10 Thread Jack Krupansky
The doc does say this: "A log-structured engine that avoids overwrites and uses sequential IO to update data is essential for writing to solid-state disks (SSD) and hard disks (HDD). On HDD, writing randomly involves a higher number of seek operations than sequential writing. The seek penalty

Re: How to measure the write amplification of C*?

2016-03-10 Thread Sebastian Estevez
https://issues.apache.org/jira/browse/CASSANDRA-10805 All the best, Sebastián Estévez Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

Re: How to measure the write amplification of C*?

2016-03-10 Thread Jeff Ferland
Compaction logs show the number of bytes written and the level written to. Base write load = table flushed to L0. Write amplification = sum of all compactions written to disk for the table. On Thu, Mar 10, 2016 at 9:44 AM, Dikang Gu wrote: > Hi Matt, > > Thanks for the
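
A minimal sketch of the calculation described above, assuming the flushed and compacted byte counts have already been pulled out of the logs by some other means. The function name and the sample numbers are hypothetical, and counting the flush itself as part of the bytes written to disk is an assumption, not something stated in the thread.

```python
def write_amplification(flushed_bytes, compaction_bytes_written):
    """Ratio of all bytes written to disk to the bytes flushed from memtables.

    flushed_bytes: base write load (bytes flushed to L0) for one table.
    compaction_bytes_written: iterable of bytes written by each compaction
    of that table. Including the flush in the numerator is an assumption.
    """
    if flushed_bytes <= 0:
        raise ValueError("flushed_bytes must be positive")
    total_written = flushed_bytes + sum(compaction_bytes_written)
    return total_written / flushed_bytes


# Hypothetical numbers, in bytes: one 100 MiB flush followed by three compactions.
MIB = 1024 ** 2
flushed = 100 * MIB
compactions = [100 * MIB, 250 * MIB, 400 * MIB]
print(write_amplification(flushed, compactions))
```

Dropping the flush from the numerator gives the compaction-only figure instead; which of the two is more useful probably depends on whether you are sizing disks or comparing compaction strategies.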

Re: How to measure the write amplification of C*?

2016-03-10 Thread Jeff Jirsa
A bit of Splunk-fu probably works for this – you’ll have different line entries for memtable flushes vs compaction output. Comparing the two will give you a general idea of compaction amplification. From: Dikang Gu Reply-To: "user@cassandra.apache.org" Date: Thursday, March 10, 2016 at

Re: How to measure the write amplification of C*?

2016-03-10 Thread Matt Kennedy
It isn't really the data written by the host that you're concerned with, it's the data written by your application. I'd start by instrumenting your application tier to tally up the size of the values that it writes to C*. However, it may not be extremely useful to have this value. You can't do
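
One way the application-side tally suggested above might look, as a rough sketch only. The `WriteTally` class and its `record` method are hypothetical helpers, not part of any Cassandra driver, and the sketch only counts the raw size of the values handed to it, ignoring any protocol or storage overhead.

```python
class WriteTally:
    """Hypothetical helper that counts the bytes an application sends as values."""

    def __init__(self):
        self.bytes_written = 0

    def record(self, *values):
        # Count the encoded size of each value; strings are measured as UTF-8.
        for value in values:
            if isinstance(value, str):
                value = value.encode("utf-8")
            self.bytes_written += len(value)


tally = WriteTally()
# Call record() next to each write the application issues (the write itself is omitted here).
tally.record("user-42", b"\x00\x01\x02")
print(tally.bytes_written)
```

The resulting total can then serve as the denominator when comparing against bytes written to disk on the nodes.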

Re: How to measure the write amplification of C*?

2016-03-10 Thread Dikang Gu
Hi Matt, Thanks for the detailed explanation! Yes, this is exactly what I'm looking for, "write amplification = data written to flash/data written by the host". We are heavily using the LCS in production, so I'd like to figure out the amplification caused by that and see what we can do to

Re: How to measure the write amplification of C*?

2016-03-10 Thread Matt Kennedy
After posting this, Jon Haddad pinged me on chat and said (I'm paraphrasing): Actually, this company I work with a lot burns through SSDs so fast it's absurd, their write amp is gigantic. This is a very good point; however, it isn't what I would call typical, and a lot is going to depend on the

Re: How to measure the write amplification of C*?

2016-03-10 Thread Matt Kennedy
TL;DR - Cassandra actually causes a ton of write amplification but it doesn't freaking matter any more. Read on for details... That slide deck does have a lot of very good information on it, but unfortunately I think it has led to a fundamental misunderstanding about Cassandra and write

Re: How to measure the write amplification of C*?

2016-03-10 Thread Paulo Motta
This is a good source on Cassandra + write amplification: http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives 2016-03-10 9:57 GMT-03:00 Benjamin Lerer : > Cassandra should not cause any write amplification. Write amplification > happens only when you

Re: How to measure the write amplification of C*?

2016-03-10 Thread Alain RODRIGUEZ
Hi Dikang, I am not sure what you call "amplification", but as sizes depend heavily on the structure, I think I would probably give it a try using CCM (https://github.com/pcmanus/ccm) or some test cluster with 'production like' settings and schema. You can write a row, flush it and see how

How to measure the write amplification of C*?

2016-03-09 Thread Dikang Gu
Hello there, I'm wondering whether there is a good way to measure the write amplification of Cassandra. I'm thinking it could be calculated by (number of bytes written to the disk)/(size of mutations written to the node). Do we already have the metrics of "size of mutations written to the node"? I did