Yes, we failed to run nodetool repair for quite a while and I believe it
might have been our situation that prompted the addition of that info to the
wiki :-)

We've tried/are trying two of the suggested steps there, but haven't gone
through the process of removing/reinserting the pseudo-failed nodes (all of
them?).  I'm thinking that won't make a difference: these deletes are being
issued all the time as the columns pop up, since we automatically detect the
errors and remove the columns (for the n'th time :-), so the delete
timestamps should be much later than those of any old, lingering columns
that aren't supposed to be on the nodes.  At least that's what I've
convinced myself of!
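
For what it's worth, the delete path on our side is just the stock 0.6
Thrift remove() with a fresh timestamp.  A rough sketch of what I mean (the
keyspace/CF/column names here are placeholders, not our real schema):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;

    public class DeleteSketch {
        public static void main(String[] args) throws Exception {
            // Plain 0.6-style Thrift connection.
            TSocket socket = new TSocket("localhost", 9160);
            Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(socket));
            socket.open();

            // Point at the offending column (placeholder names).
            ColumnPath path = new ColumnPath("Standard1");
            path.setColumn("resurrectedColumn".getBytes("UTF-8"));

            // Timestamp is "now" in microseconds, so the tombstone sorts
            // after, and therefore shadows, any stale copy of the column.
            long now = System.currentTimeMillis() * 1000;
            client.remove("Keyspace1", "someRowKey", path, now,
                          ConsistencyLevel.ALL);

            socket.close();
        }
    }

Since the tombstone's timestamp is always newer than the original column's,
the delete should win at read and compaction time even if an old copy of the
column gets streamed back in during repair, at least until GCGraceSeconds
passes and the tombstone is collected.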

As far as running flush before repair goes, I got that idea from the
comments on CASSANDRA-1748.  I'm not at all convinced it's necessary, but
thought it might help if there was (still?) a problem in the 0.6.10 code.
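
For reference, the nightly sequence is basically just this, run against each
node (the host is a placeholder, and the port is from memory, 8080 being the
0.6 JMX default):

    # every night, on each node: flush memtables to SSTables first,
    # then run anti-entropy repair
    nodetool -host <node> -port 8080 flush
    nodetool -host <node> -port 8080 repair

The thinking was that flushing first makes sure everything in the memtables
is on disk before repair starts comparing data between nodes.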

--Scott

On Mon, Feb 7, 2011 at 2:31 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> Was there a time when nodetool repair was not run frequently?
>
> There are some steps listed here to reset issues
> around tombstones coming back to life
>
> http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
>
>
> Why do you run nodetool flush before the repair?
>
> Hope that helps
> Aaron
>
>
> On 08 Feb, 2011, at 10:19 AM, Scott McCarty <sc...@metajinx.com> wrote:
>
> Hi,
>
> Does anyone know if anything similar to
> https://issues.apache.org/jira/browse/CASSANDRA-1748 or
> https://issues.apache.org/jira/browse/CASSANDRA-1837 exists in 0.6.x
> releases?  Both of those bugs look like they were introduced, found, and
> fixed in 0.7, and the CASSANDRA-1837 comments indicate that the 0.6 tests
> passed, but I wanted to double-check because we've been seeing deleted
> columns reappear in our cluster.
>
> Or maybe someone has an idea of what might be behind the following
> oddness: columns are deleted, then about two days later (which is what we
> have GCGraceSeconds set to) they pop back up again, shortly after a
> "nodetool repair" is run on all the nodes in the cluster.
>
> One tell-tale clue is that the timestamps on the reappearing columns (the
> ones our client logs show were deleted) look to be the original
> timestamps: they are far back in the past, and our code never creates
> timestamp values in the past.  A bunch of investigation has us almost
> convinced that the deleted columns are popping back up in large numbers
> after a repair is done.
>
> I've posted to this list before about deleted columns coming back, and one
> suggestion was to assume it's a client-side bug.  We've since done a ton
> of work to rule out the various possibilities, and we're left with what
> seems improbable: a basic problem on the Cassandra server itself.
>
> Just to close, we're running nodetool repair nightly (right after we do a
> nodetool flush), we have GCGraceSeconds set to 2 days, and reads/writes
> for our tests are at CL.ALL.  Should we be running nodetool compact also?
>
> Thanks,
>   Scott
>
>
