Re: Why data tripled in size after repair?

2012-10-02 Thread Sylvain Lebresne
It's in the 1.1 branch; I don't remember if it went into a release yet. If not, it'll be in the next 1.1.x release. As the ticket says, this is in since 1.1.1. I don't pretend this is well documented, but it's in. -- Sylvain

Re: Why data tripled in size after repair?

2012-10-02 Thread Andrey Ilinykh
On Tue, Oct 2, 2012 at 12:05 AM, Sylvain Lebresne sylv...@datastax.com wrote: It's in the 1.1 branch; I don't remember if it went into a release yet. If not, it'll be in the next 1.1.x release. As the ticket says, this is in since 1.1.1. I don't pretend this is well documented, but it's in.

Re: Why data tripled in size after repair?

2012-10-01 Thread Peter Schuller
It looks like what I need. Couple questions. Does it work with RandomPartitioner only? I use ByteOrderedPartitioner. I believe it should work with BOP based on cursory re-examination of the patch. I could be wrong. I don't see it as part of any release. Am I supposed to build my own version

Re: Why data tripled in size after repair?

2012-09-27 Thread Sylvain Lebresne
I don't understand why it copied data twice. In worst case scenario it should copy everything (~90G) Sadly no, repair is currently peer-to-peer based (there is a ticket to fix it: https://issues.apache.org/jira/browse/CASSANDRA-3200, but that's not trivial). This means that you can end up with

Re: Why data tripled in size after repair?

2012-09-27 Thread Andrey Ilinykh
On Thu, Sep 27, 2012 at 9:52 AM, Sylvain Lebresne sylv...@datastax.com wrote: I don't understand why it copied data twice. In worst case scenario it should copy everything (~90G) Sadly no, repair is currently peer-to-peer based (there is a ticket to fix it:

Re: Why data tripled in size after repair?

2012-09-27 Thread Sylvain Lebresne
I see. It explains why I get 85G + 85G instead of 90G. But after next repair I have six extra files 75G each, how is it possible? Maybe you've run repair on other nodes? Basically repair is a fairly blind process. If it considers that a given range (and by range I mean here the ones that repair

Why data tripled in size after repair?

2012-09-26 Thread Andrey Ilinykh
Hello everybody! I have a 3-node cluster with replication factor of 3. Each node has 800G disk and it used to have 100G of data. What is strange every time I run repair data takes almost 3 times more - 270G, then I run compaction and get 100G back. Unfortunately, yesterday I forgot to compact and

Re: Why data tripled in size after repair?

2012-09-26 Thread Rob Coli
On Wed, Sep 26, 2012 at 9:30 AM, Andrey Ilinykh ailin...@gmail.com wrote: [ repair ballooned my data size ] 1. Why repair almost triples data size? You didn't mention what version of cassandra you're running. In some old versions of cassandra (prior to 1.0), repair often creates even more

Re: Why data tripled in size after repair?

2012-09-26 Thread Peter Schuller
What is strange every time I run repair data takes almost 3 times more - 270G, then I run compaction and get 100G back. https://issues.apache.org/jira/browse/CASSANDRA-2699 outlines the main issues with repair. In short - in your case the limited granularity of Merkle trees is causing too much
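
The granularity point in the message above can be illustrated with a toy model. This is only a sketch and not Cassandra's repair code: the function name `overstream_fraction`, the uniform random placement of rows, and all parameter values are assumptions made for illustration. The idea is that a range is split into a fixed number of hashed leaves, and any leaf containing even one out-of-sync row is treated as entirely different, so every row in that leaf is streamed again.

```python
import random


def overstream_fraction(num_rows, num_leaves, num_stale_rows, seed=0):
    """Return the fraction of rows re-streamed in a toy repair model.

    Hypothetical model: rows are spread uniformly over `num_leaves`
    hashed sub-ranges, and a leaf is streamed in full as soon as it
    contains at least one stale (out-of-sync) row.
    """
    rng = random.Random(seed)

    # Assign every row to a leaf (a stand-in for a token sub-range).
    leaf_of_row = [rng.randrange(num_leaves) for _ in range(num_rows)]

    # Count how many rows each leaf covers.
    rows_per_leaf = [0] * num_leaves
    for leaf in leaf_of_row:
        rows_per_leaf[leaf] += 1

    # Choose the rows that really differ between replicas.
    stale_rows = rng.sample(range(num_rows), num_stale_rows)

    # A leaf's hash mismatches if it holds any stale row.
    mismatched_leaves = {leaf_of_row[row] for row in stale_rows}

    # Everything under a mismatched leaf gets streamed.
    streamed = sum(rows_per_leaf[leaf] for leaf in mismatched_leaves)
    return streamed / num_rows


if __name__ == "__main__":
    # Same data and same 10 stale rows, different leaf counts.
    for leaves in (16, 1024, 1_000_000):
        fraction = overstream_fraction(10_000, leaves, 10)
        print(f"{leaves:>9} leaves: {fraction:.2%} of rows streamed")
```

In this model, fewer leaves means each mismatched leaf drags more in-sync rows along with it, so the streamed volume can far exceed the amount of data that is actually out of sync.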

Re: Why data tripled in size after repair?

2012-09-26 Thread Andrey Ilinykh
On Wed, Sep 26, 2012 at 11:07 AM, Rob Coli rc...@palominodb.com wrote: On Wed, Sep 26, 2012 at 9:30 AM, Andrey Ilinykh ailin...@gmail.com wrote: [ repair ballooned my data size ] 1. Why repair almost triples data size? You didn't mention what version of cassandra you're running. In some old