Re: How often to run `nodetool repair`
On 08/01/2013 01:16 PM, Andrey Ilinykh wrote:
> > TTL is effectively DELETE; you need to run a repair once every gc_grace_seconds. If you don't, data might un-delete itself.
> How is it possible? Every replica has the TTL, so when it expires every replica has a tombstone. I don't see how you can get data with no tombstone. What am I missing?

The only way I can think of is this scenario:
- value "A" for some key is written with ttl=30days to all replicas (i.e. a long TTL or no TTL at all)
- value "B" for the same key is written with ttl=1day, but doesn't reach all replicas
- one day passes and the ttl=1day values turn into deletes
- gc_grace passes and the tombstones are purged

At this point, the replica that didn't get the ttl=1day value will think the older value "A" is live.

I'm no expert on this so I may be mistaken, but in any case it's a corner case, as overwriting columns with shorter TTLs would be unusual.

- Erik -
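Erik's scenario can be replayed in a toy model. This is a simplified sketch, not Cassandra's actual code, and it assumes: each replica keeps only the newest cell it received for one key, an expired cell behaves as a tombstone, and compaction purges that tombstone once gc_grace_seconds (default 864000, i.e. 10 days) have passed since it expired. `Cell`, `local_state` and `read` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

DAY = 86_400          # seconds per day
GC_GRACE = 10 * DAY   # default gc_grace_seconds (864000)


@dataclass
class Cell:
    value: str
    write_ts: int  # write timestamp in seconds on a toy clock
    ttl: int       # time-to-live in seconds

    def expires_at(self) -> int:
        return self.write_ts + self.ttl


def local_state(cell: Optional[Cell], now: int) -> Optional[Cell]:
    """What one replica still holds for the key at time `now`.

    Simplified assumption: an expired cell acts as a tombstone until
    GC_GRACE has elapsed since it expired; after that, compaction drops it.
    """
    if cell is None or now >= cell.expires_at() + GC_GRACE:
        return None
    return cell


def read(replicas: List[Optional[Cell]], now: int) -> Optional[str]:
    """Merge every replica with last-write-wins (a simplified CL=ALL read)."""
    held = [c for c in (local_state(r, now) for r in replicas) if c is not None]
    if not held:
        return None
    newest = max(held, key=lambda c: c.write_ts)
    # If the newest cell has expired it is a tombstone and shadows older data.
    return None if now >= newest.expires_at() else newest.value


a = Cell("A", write_ts=0, ttl=30 * DAY)  # reached all three replicas
b = Cell("B", write_ts=1, ttl=1 * DAY)   # reached replicas 1 and 2 only
replicas = [b, b, a]
purged = b.expires_at() + GC_GRACE       # moment B's tombstones are dropped

print(read(replicas, now=2 * DAY))  # None: B's tombstone still shadows A
print(read(replicas, now=purged))   # A: tombstones gone, A resurfaces
print(read([b, b, b], now=purged))  # None: a repair within gc_grace copied B everywhere
```

The last line is the point of running repair inside the gc_grace window: once every replica holds B, nothing older is left to resurface.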
Re: How often to run `nodetool repair`
On Thu, Aug 1, 2013 at 1:16 PM, Andrey Ilinykh wrote:
> On Thu, Aug 1, 2013 at 12:26 PM, Robert Coli wrote:
>> TTL is effectively DELETE; you need to run a repair once every
>> gc_grace_seconds. If you don't, data might un-delete itself.
>
> How is it possible? Every replica has the TTL, so when it expires every
> replica has a tombstone. I don't see how you can get data with no tombstone.
> What am I missing?

I knew I had heard of cases where repair is required despite TTL, but didn't recall the specifics. Thanks for the opportunity to go look it up...

http://comments.gmane.org/gmane.comp.db.cassandra.user/21008

quoting Sylvain Lebresne:
"
The initial question was about "can I use inserting with ttl=1 instead of issuing deletes", ***so that would be a case where you do shadow a previous version with a very small ttl and so repair is important.*** (EMPHASIS rcoli)

But you're right that if you only issue data with expiration (no deletes) and that you
* either do not overwrite columns
* or are sure that when you do overwrite, the value you're overwriting has a ttl that is lesser or equal than the ttl of the value you're overwriting with (+gc_grace to be precise)
then yes, ***repair is not necessary because you can't have shadowed value resurfacing.*** (EMPHASIS rcoli)
"

So, to be more precise with my initial statement:

"TTL is like DELETE in some cases, so unless you are certain that you are not (and will not be) in those cases, you should run repair when using TTL."

Also, to skip repair where it is safe, you will be unable to repair entire keyspaces; you will have to repair on a per-column-family basis, manually excluding CFs matching these criteria, which increases management complexity.

=Rob
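Sylvain's criteria can be written as a small check, and the per-column-family consequence follows from it: only the CFs that fail the check need to be named after the keyspace on the `nodetool repair` command line. A sketch under stated assumptions: `Table`, `can_resurface`, `needs_repair` and `repair_command` are hypothetical helpers, the keyspace and CF names are made up, TTLs are in seconds with `None` meaning no TTL, and the check covers only the resurfacing concern, not repair's general role in keeping replicas consistent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

DAY = 86_400


@dataclass
class Table:
    name: str
    issues_deletes: bool = False
    # (TTL of the value being overwritten, TTL of the new value) in seconds;
    # None means "no TTL". An empty list means columns are never overwritten.
    overwrites: List[Tuple[Optional[int], Optional[int]]] = field(default_factory=list)


def can_resurface(old_ttl: Optional[int], new_ttl: Optional[int], gc_grace: int) -> bool:
    """Sylvain's condition: an overwrite is safe when old_ttl <= new_ttl + gc_grace."""
    if new_ttl is None:
        return False  # the new value never expires, so no tombstone is ever purged
    if old_ttl is None:
        return True   # the old value outlives any purged tombstone
    return old_ttl > new_ttl + gc_grace


def needs_repair(table: Table, gc_grace: int) -> bool:
    return table.issues_deletes or any(
        can_resurface(old, new, gc_grace) for old, new in table.overwrites
    )


def repair_command(keyspace: str, tables: List[Table], gc_grace: int) -> List[str]:
    """argv for a per-CF repair; [] when no table needs it, because a bare
    `nodetool repair <keyspace>` would repair every CF in the keyspace."""
    targets = [t.name for t in tables if needs_repair(t, gc_grace)]
    return ["nodetool", "repair", keyspace, *targets] if targets else []


tables = [
    Table("traces"),                                 # write-once with TTL
    Table("rollups", overwrites=[(30 * DAY, DAY)]),  # Erik's 30-day/1-day case
    Table("events", issues_deletes=True),
]
print(repair_command("tracing", tables, gc_grace=10 * DAY))
# ['nodetool', 'repair', 'tracing', 'rollups', 'events']
```

The "traces" CF is excluded only because it is assumed never to be overwritten; if that ever changes, it silently needs repair again, which is the management burden described above.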
Re: How often to run `nodetool repair`
> TTL is effectively DELETE; you need to run a repair once every
> gc_grace_seconds. If you don't, data might un-delete itself.

The undelete part is not true.

btw: With CASSANDRA-4917, TTLed columns will not even create a tombstone (assuming ttl > gc_grace).

The rest of your mail I agree with :-)
Re: How often to run `nodetool repair`
Cassandra is an excellent choice for write-heavy applications. Reading large sets of data is neither as fast nor as easy: you may need your client to page through it, and you may need slice queries and a proper primary key and indexes, which you have to think of in advance.

Regards,
Arthur

From: Carl Lerche
Sent: Thursday, August 01, 2013 3:03 PM
To: user@cassandra.apache.org ; Arthur Zubarev
Subject: Re: How often to run `nodetool repair`

Arthur,

Yes, my use case for this Cassandra cluster is analytics. I am building a Google Dapper-like (application tracing) system. I collect application traces and write them to Cassandra. Then I have periodic rollup tasks that read the data, do some summarization and write it back.

Thoughts on how to manage a write-heavy cluster?

Thanks,
Carl
Re: How often to run `nodetool repair`
On Thu, Aug 1, 2013 at 12:26 PM, Robert Coli wrote:
> TTL is effectively DELETE; you need to run a repair once every
> gc_grace_seconds. If you don't, data might un-delete itself.

How is it possible? Every replica has the TTL, so when it expires every replica has a tombstone. I don't see how you can get data with no tombstone. What am I missing?

Andrey
Re: How often to run `nodetool repair`
On Thu, Aug 1, 2013 at 9:35 AM, Carl Lerche wrote:
> I read in the docs that `nodetool repair` should be regularly run unless
> no delete is ever performed. In my app, I never delete, but I heavily use
> the ttl feature. Should repair still be run regularly? Also, does repair
> take less time if it is run regularly? If not, is there a way to
> incrementally run it? It seems that when I do run repair, it takes a long
> time and causes high amounts of CPU usage and iowait.

TTL is effectively DELETE; you need to run a repair once every gc_grace_seconds. If you don't, data might un-delete itself.

Even if you don't care about data un-deleting itself, you still need to run repair occasionally to ensure overall consistency. Hinted handoff and read repair are only optimizations; they are not responsible for guaranteeing consistency.

If you struggle with the overhead of repair, one way to reduce the pain is to increase gc_grace_seconds. The default of 10 days is arbitrary and IMO too low; something more like 30 days will reduce how often you pay the fixed, very high cost of repair, at the cost of keeping tombstones around 3x as long.

If you are running a version below 1.2.6, and especially below 1.2.0, the combination of TTL with repair can lead to insane over-repair.

https://issues.apache.org/jira/browse/CASSANDRA-4905
https://issues.apache.org/jira/browse/CASSANDRA-5398

There is a mechanism for incremental (manually managed...) repair.

https://issues.apache.org/jira/browse/CASSANDRA-3912

=Rob
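Two of the knobs above can be sketched numerically. The first function assumes, conservatively, that a repair only counts once it has finished, so consecutive repairs must start at most gc_grace_seconds minus the repair's duration apart; with 30 days instead of 10 that comes to roughly a third as many repairs per year. The second splits a token range into contiguous subranges for the manually managed incremental repair of CASSANDRA-3912, assuming a nodetool that takes the start/end tokens as `-st`/`-et`. The 1-day repair duration, the token bounds and the "tracing" keyspace are hypothetical. gc_grace_seconds itself is a per-table option in CQL3 (e.g. `ALTER TABLE ks.cf WITH gc_grace_seconds = 2592000`, with `ks.cf` a placeholder).

```python
from typing import List, Tuple

DAY = 86_400
YEAR = 365 * DAY


def max_repair_interval(gc_grace_seconds: int, repair_duration_seconds: int) -> int:
    """Longest gap between repair starts so that each repair finishes within
    gc_grace_seconds of the previous one starting (conservative assumption)."""
    if repair_duration_seconds >= gc_grace_seconds:
        raise ValueError("repair outlasts gc_grace_seconds")
    return gc_grace_seconds - repair_duration_seconds


def split_range(start: int, end: int, parts: int) -> List[Tuple[int, int]]:
    """Split the token range (start, end] into `parts` contiguous subranges.
    Assumes start < end, i.e. the range does not wrap around the ring."""
    if parts < 1 or end - start < parts:
        raise ValueError("need 1 <= parts <= end - start")
    step = (end - start) // parts
    bounds = [start + i * step for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


for days in (10, 30):
    gap = max_repair_interval(days * DAY, repair_duration_seconds=DAY)
    print(f"gc_grace={days}d: repair at least every {gap // DAY} days, "
          f"~{YEAR / gap:.1f} repairs/year")

for lo, hi in split_range(0, 1_000_000, 4):  # hypothetical token bounds
    print(" ".join(["nodetool", "repair", "-st", str(lo), "-et", str(hi), "tracing"]))
```

Each subrange repair is a shorter, separately schedulable job, which is one way to spread the CPU and iowait cost instead of paying it all at once.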
Re: How often to run `nodetool repair`
Arthur,

Yes, my use case for this Cassandra cluster is analytics. I am building a Google Dapper-like (application tracing) system. I collect application traces and write them to Cassandra. Then I have periodic rollup tasks that read the data, do some summarization and write it back.

Thoughts on how to manage a write-heavy cluster?

Thanks,
Carl

On Thu, Aug 1, 2013 at 11:28 AM, Arthur Zubarev wrote:
> Hi Carl,
>
> The ‘repair’ is for data reads. Compaction will take care of the expired
> data.
>
> The fact that a repair runs long makes me suspect the nodes are receiving
> unbalanced amounts of writes.
>
> Regards,
>
> Arthur
Re: How often to run `nodetool repair`
Hi Carl,

The ‘repair’ is for data reads. Compaction will take care of the expired data.

The fact that a repair runs long makes me suspect the nodes are receiving unbalanced amounts of writes.

Regards,
Arthur

From: Carl Lerche
Sent: Thursday, August 01, 2013 12:35 PM
To: user@cassandra.apache.org
Subject: How often to run `nodetool repair`

Hello,

I read in the docs that `nodetool repair` should be regularly run unless no delete is ever performed. In my app, I never delete, but I heavily use the ttl feature. Should repair still be run regularly? Also, does repair take less time if it is run regularly? If not, is there a way to incrementally run it? It seems that when I do run repair, it takes a long time and causes high amounts of CPU usage and iowait.

Thoughts?

Thanks,
Carl
Re: How often to run `nodetool repair`
We observed the same behavior. During the last repair, the data distribution across nodes was imbalanced as well, resulting in one node bloating.

On Aug 1, 2013 12:36 PM, "Carl Lerche" wrote:
> It seems that when I do run repair, it takes a long
> time and causes high amounts of CPU usage and iowait.