Re: Repair completes successfully but data is still inconsistent

2014-12-01 Thread Robert Coli
On Thu, Nov 27, 2014 at 2:38 AM, André Cruz andre.c...@co.sapo.pt wrote:
On 26 Nov 2014, at 19:07, Robert Coli rc...@eventbrite.com wrote:
Yes. Do you know if 5748 was created as a result of compaction or via a flush from a memtable?
It was the result of a compaction:
Ok, so in theory

Re: Repair completes successfully but data is still inconsistent

2014-11-27 Thread André Cruz
On 26 Nov 2014, at 19:07, Robert Coli rc...@eventbrite.com wrote:
Yes. Do you know if 5748 was created as a result of compaction or via a flush from a memtable?
It was the result of a compaction:
INFO [CompactionExecutor:22422] 2014-11-13 13:08:41,926 CompactionTask.java (line 262)

Re: Repair completes successfully but data is still inconsistent

2014-11-26 Thread André Cruz
On 24 Nov 2014, at 18:54, Robert Coli rc...@eventbrite.com wrote:
But for any given value on any given node, you can verify the value it has in 100% of SSTables... that's what both the normal read path and repair should do when reconciling row fragments into the materialized row? Hard to

Re: Repair completes successfully but data is still inconsistent

2014-11-26 Thread Robert Coli
On Wed, Nov 26, 2014 at 10:17 AM, André Cruz andre.c...@co.sapo.pt wrote:
Of these, the row in question was present on:
Disco-NamespaceFile2-ic-5337-Data.db - tombstone column
Disco-NamespaceFile2-ic-5719-Data.db - no trace of that column
Disco-NamespaceFile2-ic-5748-Data.db - live column
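To make the three-SSTable picture above concrete, here is a minimal Python sketch of how row fragments might be reconciled by write timestamp. It is a simplified model, not Cassandra's actual code: the `Cell` class, the `reconcile` function, and the timestamp values are all hypothetical, and it assumes the delete was written after the live column. The thread does not state the real timestamps.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Cell:
    sstable: str      # which SSTable the fragment came from
    timestamp: int    # write timestamp (hypothetical values below)
    tombstone: bool   # True if the fragment is a deletion marker


def reconcile(fragments: Iterable[Optional[Cell]]) -> Optional[Cell]:
    """Pick the winning fragment: highest timestamp, tombstone wins a tie."""
    winner = None
    for cell in fragments:
        if cell is None:  # this SSTable has no trace of the column
            continue
        if winner is None or (cell.timestamp, cell.tombstone) > (
            winner.timestamp,
            winner.tombstone,
        ):
            winner = cell
    return winner


# Assumed ordering: the delete (ic-5337) is newer than the write (ic-5748).
fragments = [
    Cell("ic-5337", timestamp=200, tombstone=True),
    None,  # ic-5719
    Cell("ic-5748", timestamp=100, tombstone=False),
]

with_tombstone = reconcile(fragments)         # the tombstone masks the live column
without_tombstone = reconcile(fragments[1:])  # tombstone gone: live column resurfaces
```

Under these assumptions, the column stays deleted only while the tombstone fragment is still part of the merge. Once the tombstone is collected and an SSTable holding the older live column remains, the live value wins by default.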

Re: Repair completes successfully but data is still inconsistent

2014-11-24 Thread André Cruz
On 21 Nov 2014, at 19:01, Robert Coli rc...@eventbrite.com wrote:
2- Why won’t repair propagate this column value to the other nodes? Repairs have run every day and the value is still missing on the other nodes.
No idea. Are you sure it's not expired via TTL or masked in some other way?

Re: Repair completes successfully but data is still inconsistent

2014-11-24 Thread Robert Coli
On Mon, Nov 24, 2014 at 10:39 AM, André Cruz andre.c...@co.sapo.pt wrote: This data does not use TTLs. What other reason could there be for a mask? If I connect using cassandra-cli to that specific node, which becomes the coordinator, is it guaranteed to not ask another node when CL is ONE and

Re: Repair completes successfully but data is still inconsistent

2014-11-21 Thread André Cruz
On 19 Nov 2014, at 19:53, Robert Coli rc...@eventbrite.com wrote: My hunch is that you originally triggered this by picking up some obsolete SSTables during the 1.2 era. Probably if you clean up the existing zombies you will not encounter them again, unless you encounter another obsolete

Re: Repair completes successfully but data is still inconsistent

2014-11-21 Thread Robert Coli
On Fri, Nov 21, 2014 at 3:11 AM, André Cruz andre.c...@co.sapo.pt wrote: Can it be that they were all in the middle of a compaction (Leveled compaction) and the new sstables were written but the old ones were not deleted? Will Cassandra blindly pick up old and new sstables when it restarts?

Re: Repair completes successfully but data is still inconsistent

2014-11-19 Thread André Cruz
On 19 Nov 2014, at 00:43, Robert Coli rc...@eventbrite.com wrote: @OP : can you repro if you run a major compaction between the deletion and the tombstone collection? This happened in production and, AFAIK, for the first time in a system that has been running for 2 years. We have upgraded

Re: Repair completes successfully but data is still inconsistent

2014-11-19 Thread André Cruz
On 19 Nov 2014, at 11:37, André Cruz andre.c...@co.sapo.pt wrote:
All the nodes were restarted on 21-23 October, for the upgrade (1.2.16 -> 1.2.19) I mentioned. The delete happened after.
I should also point out that we were experiencing problems related to CASSANDRA-4206 and CASSANDRA-7808.

Re: Repair completes successfully but data is still inconsistent

2014-11-19 Thread Robert Coli
On Wed, Nov 19, 2014 at 5:18 AM, André Cruz andre.c...@co.sapo.pt wrote: Each node has 4-9 of these exceptions as it is going down after being drained. It seems Cassandra was trying to delete an sstable. Can this be related? That seems plausible, though the versions of the files you indicate

Re: Repair completes successfully but data is still inconsistent

2014-11-18 Thread André Cruz
On 18 Nov 2014, at 01:08, Michael Shuler mich...@pbandjelly.org wrote:
André, does `nodetool gossipinfo` show all the nodes in schema agreement?
Yes:
$ nodetool -h XXX.XXX.XXX.XXX gossipinfo | grep -i schema
SCHEMA:8ef63726-c845-3565-9851-91c0074a9b5e
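For a cluster with many nodes, the same check can be scripted. The following is a rough Python sketch that collects the distinct schema versions from saved `nodetool gossipinfo` output. It assumes each node reports a line of the form `SCHEMA:<uuid>`, as in the output quoted above; the function name `schema_versions`, the sample text, and the IP addresses are hypothetical.

```python
from typing import Set


def schema_versions(gossipinfo_output: str) -> Set[str]:
    """Return the distinct schema versions found in gossipinfo text."""
    versions = set()
    for line in gossipinfo_output.splitlines():
        line = line.strip()
        # Same match as `grep -i schema`, restricted to the SCHEMA: prefix.
        if line.upper().startswith("SCHEMA:"):
            versions.add(line.split(":", 1)[1].strip())
    return versions


# Hypothetical captured output for two nodes (addresses are made up).
sample = """\
/10.0.0.1
  SCHEMA:8ef63726-c845-3565-9851-91c0074a9b5e
/10.0.0.2
  SCHEMA:8ef63726-c845-3565-9851-91c0074a9b5e
"""

versions = schema_versions(sample)
in_agreement = len(versions) == 1  # one distinct version means agreement
```

If the output format of a given Cassandra version differs from the line shape assumed here, the prefix match would need adjusting.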

Re: Repair completes successfully but data is still inconsistent

2014-11-18 Thread Michael Shuler
`nodetool cleanup` also looks interesting as an option. -- Michael

Re: Repair completes successfully but data is still inconsistent

2014-11-18 Thread Robert Coli
On Tue, Nov 18, 2014 at 12:46 PM, Michael Shuler mich...@pbandjelly.org wrote:
`nodetool cleanup` also looks interesting as an option.
I don't understand why cleanup or scrub would help with a case where data is being un-tombstoned.
1 November - column is deleted - gc_grace_period is 10 days

Re: Repair completes successfully but data is still inconsistent

2014-11-17 Thread André Cruz
On 14 Nov 2014, at 18:44, André Cruz andre.c...@co.sapo.pt wrote: On 14 Nov 2014, at 18:29, Michael Shuler mich...@pbandjelly.org wrote: On 11/14/2014 12:12 PM, André Cruz wrote: Some extra info. I checked the backups and on the 8th of November, all 3 replicas had the tombstone of the

Re: Repair completes successfully but data is still inconsistent

2014-11-17 Thread Michael Shuler
On 11/17/2014 05:22 AM, André Cruz wrote: I have checked the logs of the 3 replicas for that period and nothing really jumps out. Still, repairs have been running daily, the log reports that the CF is synced, and as of this moment one of the replicas still returns the zombie column so they don’t

Re: Repair completes successfully but data is still inconsistent

2014-11-14 Thread André Cruz
Some extra info. I checked the backups and on the 8th of November, all 3 replicas had the tombstone of the deleted column. So:
1 November - column is deleted - gc_grace_period is 10 days
8 November - all 3 replicas have tombstone
13/14 November - column/tombstone is gone on 2 replicas, 3rd
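The dates in this timeline can be checked with a little arithmetic. The Python sketch below works out the earliest day the tombstone becomes eligible for collection, using the deletion date and the 10-day gc_grace_period given above. It works at day granularity only, since the time of day of the delete is not stated in the thread, and the helper name `tombstone_may_be_purged` is hypothetical.

```python
from datetime import date, timedelta

GC_GRACE = timedelta(days=10)     # gc_grace_period from the timeline
deleted_on = date(2014, 11, 1)    # column is deleted

# Earliest day on which compaction is allowed to drop the tombstone.
purgeable_from = deleted_on + GC_GRACE


def tombstone_may_be_purged(on: date) -> bool:
    """True if gc_grace has elapsed by the given day (day granularity)."""
    return on >= purgeable_from


backup_check = tombstone_may_be_purged(date(2014, 11, 8))     # backups still show it
after_grace = tombstone_may_be_purged(date(2014, 11, 13))     # it may be gone by now
```

So a tombstone present on 8 November and absent on 13/14 November is consistent with normal collection after gc_grace. Eligibility only means a compaction may drop the tombstone; it does not say when a given replica actually does so.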

Re: Repair completes successfully but data is still inconsistent

2014-11-14 Thread Michael Shuler
On 11/14/2014 12:12 PM, André Cruz wrote:
Some extra info. I checked the backups and on the 8th of November, all 3 replicas had the tombstone of the deleted column. So:
1 November - column is deleted - gc_grace_period is 10 days
8 November - all 3 replicas have tombstone
13/14 November -

Re: Repair completes successfully but data is still inconsistent

2014-11-14 Thread André Cruz
On 14 Nov 2014, at 18:29, Michael Shuler mich...@pbandjelly.org wrote:
On 11/14/2014 12:12 PM, André Cruz wrote:
Some extra info. I checked the backups and on the 8th of November, all 3 replicas had the tombstone of the deleted column. So:
1 November - column is deleted - gc_grace_period