Yes, we failed to run nodetool repair for quite a while and I believe it might have been our situation that prompted the addition of that info to the wiki :-)
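
For concreteness, our delete path boils down to something like the sketch
below, against the 0.6 Thrift API (the keyspace/column-family/column names
are placeholders, not our real schema); the point is that the tombstone's
timestamp is always generated at delete time:

    import time

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra                  # thrift-generated 0.6 bindings
    from cassandra.ttypes import ColumnPath, ConsistencyLevel

    socket = TSocket.TSocket("localhost", 9160)
    transport = TTransport.TBufferedTransport(socket)
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    # Timestamp is taken at delete time (microseconds since the epoch), so a
    # re-delete always carries a timestamp newer than the original column.
    ts = int(time.time() * 1000000)

    # 0.6 signature: remove(keyspace, key, column_path, timestamp, consistency_level)
    client.remove("Keyspace1", "some-row-key",
                  ColumnPath(column_family="Standard1", column="bad-column"),
                  ts, ConsistencyLevel.ALL)

    transport.close()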
We've tried/are trying two of the steps suggested on that wiki page, but haven't gone through the process of removing/reinserting the pseudo-failed nodes (all of them?). I'm thinking that won't make a difference, as these deletes are happening all the time: the columns pop up, we automatically detect the errors, and we remove the columns (for the n'th time :-), so the timestamps should be much later than any old, existing columns that aren't supposed to be there on the nodes. At least that's what I've convinced myself of!

As for running flush before repair, I got that idea from the comments on CASSANDRA-1748. I'm not convinced at all that it's necessary, but thought it might help if there was (still?) a problem in the 0.6.10 code.

--Scott

On Mon, Feb 7, 2011 at 2:31 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> Was there a time where nodetool repair was not run frequently?
>
> There are some steps listed here to reset issues around tombstones coming
> back to life:
>
> http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
>
> Why do you run nodetool flush before the repair?
>
> Hope that helps
> Aaron
>
>
> On 08 Feb, 2011, at 10:19 AM, Scott McCarty <sc...@metajinx.com> wrote:
>
> Hi,
>
> Does anyone know if anything similar to
> https://issues.apache.org/jira/browse/CASSANDRA-1748 or
> https://issues.apache.org/jira/browse/CASSANDRA-1837 exists in the 0.6.x
> releases? Both of those bugs look like they were introduced, found, and
> fixed in 0.7, and the CASSANDRA-1837 comments indicate that the 0.6 tests
> passed, but I wanted to double-check because we've been seeing deleted
> columns reappear in our cluster.
>
> Or maybe someone has an idea of what might be behind the following
> oddness: columns are deleted, then about two days later (which is what we
> have GCGraceSeconds set to) they pop back up again, shortly after we run
> "nodetool repair" on all the nodes in the cluster.
>
> One tell-tale clue is that the timestamps on the columns the client logs
> show were deleted look to be the original timestamps: they are far back in
> the past, and our code doesn't create timestamp values in the past. A bunch
> of investigation work has us almost convinced that the deleted columns are
> popping up in large numbers after a repair is done.
>
> I've posted to this list before about deleted columns coming back, and one
> suggestion was to assume it's a client-side bug, so we've done a ton of
> work to rule out various possibilities, and we're left with what seems like
> an improbable basic problem on the Cassandra server itself.
>
> Just to close: we're running nodetool repair nightly (right after a
> nodetool flush), we have GCGraceSeconds set to 2 days, and reads/writes for
> our tests are at CL.ALL. Should we also be running nodetool compact?
>
> Thanks,
> Scott
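
P.S. In case the nightly routine mentioned above is of interest, it amounts
to the sketch below (host names and the nodetool path are placeholders; we
drive the real thing from cron):

    import subprocess

    NODETOOL = "/opt/cassandra/bin/nodetool"   # assumed install location
    HOSTS = ["cass1", "cass2", "cass3"]        # hypothetical node names

    for host in HOSTS:
        # Flush first (an idea from the CASSANDRA-1748 comments), then repair.
        # The whole cycle has to finish within GCGraceSeconds (2 days for us),
        # or tombstones can be GC'd before they have reached every replica.
        # The -host flag spelling matches 0.6-era nodetool.
        subprocess.check_call([NODETOOL, "-host", host, "flush"])
        subprocess.check_call([NODETOOL, "-host", host, "repair"])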