Re: Deduplicating data on a node (RF=1)

2014-11-19 Thread Robert Coli
On Tue, Nov 18, 2014 at 10:04 AM, Alain Vandendorpe al...@tapstream.com wrote:
> Rob - thanks for that, I was wondering whether either of those would successfully deduplicate the data. We were hypothesizing that a decommission would merely stream the duplicates out as well as though they were

Re: Deduplicating data on a node (RF=1)

2014-11-18 Thread Robert Coli
On Mon, Nov 17, 2014 at 12:04 PM, Alain Vandendorpe al...@tapstream.com wrote:
> With bootstrapping and initial compactions finished that node now has what seems to be duplicate data, with almost exactly 2x the expected disk usage. CQL returns correct results but we depend on the ability to

Re: Deduplicating data on a node (RF=1)

2014-11-18 Thread Alain Vandendorpe
Thanks all - a little clarification:
- The node has fully joined at this point with the duplicates
- Cleanup has been run on older nodes
- Currently using LCS
Rob - thanks for that, I was wondering whether either of those would successfully deduplicate the data. We were hypothesizing that a

Deduplicating data on a node (RF=1)

2014-11-17 Thread Alain Vandendorpe
Hey all, For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup. This is being moved away from ASAP. In the meantime, adding a node recently encountered a Stream Failed error (http://pastie.org/9725846). Cassandra restarted and it seemingly restarted streaming from zero, without

Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Michael Shuler
On 11/17/2014 02:04 PM, Alain Vandendorpe wrote:
> Hey all, For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup. This is being moved away from ASAP. In the meantime, adding a node recently encountered a Stream Failed error (http://pastie.org/9725846). Cassandra restarted and it

Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Jonathan Haddad
If he deletes all the data with RF=1, won't he have data loss?
On Mon Nov 17 2014 at 5:14:23 PM Michael Shuler mich...@pbandjelly.org wrote:
> On 11/17/2014 02:04 PM, Alain Vandendorpe wrote:
>> Hey all, For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup. This is being

Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Michael Shuler
On 11/17/2014 07:20 PM, Jonathan Haddad wrote:
> If he deletes all the data with RF=1, won't he have data loss?
Of course, ignore my quick answer, Alain.
-- Michael

Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Eric Stevens
If the new node never formally joined the cluster (streaming never completed, it never entered UN state), shouldn't that node be safe to scrub and start over again? It shouldn't be taking primary writes while it's bootstrapping, should it?
On Mon Nov 17 2014 at 6:34:04 PM Michael Shuler