Re: Incremental repairs leading to unrepaired data

2016-11-01 Thread kurt Greaves
Can't say I have too many ideas. If load is low during the repair, it
shouldn't be happening. Your disks aren't over-utilised, correct? No other
processes writing loads of data to them?
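
For what it's worth, a quick sanity check while a repair runs is iostat from
the sysstat package (watch the %util and await columns):

    # extended device stats, sampled every 5 seconds
    iostat -x 5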


Re: Incremental repairs leading to unrepaired data

2016-11-01 Thread Stefano Ortolani
That is not happening anymore since I am repairing a keyspace with
much less data (the other one is still there in write-only mode).
The command I am using is the most boring one (I even shed the -pr option
so as to keep anticompactions to a minimum): nodetool -h localhost repair

It's executed sequentially on each node (no overlapping; the next node
waits for the previous one to complete).
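
In shell terms, the wrapper is roughly equivalent to the following (host
names are placeholders):

    # repair one node at a time; each call blocks until that node finishes
    for host in node1 node2 node3; do
        nodetool -h "$host" repair
    done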

Regards,
Stefano Ortolani


Re: Incremental repairs leading to unrepaired data

2016-10-31 Thread kurt Greaves
Blowing out to 1k SSTables seems a bit full on. What args are you passing
to repair?

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Incremental repairs leading to unrepaired data

2016-10-31 Thread Stefano Ortolani
I've collected some more data points, and I still see dropped
mutations with compaction_throughput_mb_per_sec set to 8.
The only notable thing about the current setup is that I have
another keyspace (not being repaired, though) with really wide rows
(100MB per partition), but that shouldn't have any impact in theory.
The nodes do not seem that overloaded either, and I don't see any GC spikes
while those mutations are dropped :/

I'm hitting a dead end here; any idea where to look next?
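
For reference, both signals can be checked per node with the standard
nodetool counters (exact output varies by version):

    # dropped messages per message type -- check the MUTATION row under Dropped
    nodetool tpstats

    # GC pause statistics since the last invocation
    nodetool gcstats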

Regards,
Stefano


Re: Incremental repairs leading to unrepaired data

2016-08-10 Thread Stefano Ortolani
That's what I was thinking. Maybe GC pressure?
Some more details: during anticompaction I have some CFs exploding to 1K
SSTables (they go back to ~200 upon completion).
HW specs should be quite good (12 cores / 32 GB RAM) but, I admit, we're still
relying on spinning disks, with ~150GB per node.
The current version is 3.0.8.
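
The SSTable counts and the anticompaction progress above can be followed with
the usual nodetool commands (keyspace/table names are placeholders):

    # per-table stats, including the live SSTable count
    nodetool tablestats my_keyspace.my_table

    # pending and active compactions, anticompaction included
    nodetool compactionstats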


Re: Incremental repairs leading to unrepaired data

2016-08-10 Thread Paulo Motta
That's pretty low already, but perhaps you should lower it further to see if
it will improve the dropped mutations during anti-compaction (even if it
increases repair time); otherwise the problem might be somewhere else.
Generally, dropped mutations are a signal of cluster overload, so if there's
nothing else wrong perhaps you need to increase your capacity. What version
are you on?


Re: Incremental repairs leading to unrepaired data

2016-08-10 Thread Stefano Ortolani
Not yet. Right now I have it set at 16.
Would halving it more or less double the repair time?


Re: Incremental repairs leading to unrepaired data

2016-08-09 Thread Paulo Motta
Anticompaction throttling can be done by setting the usual
compaction_throughput_mb_per_sec knob in cassandra.yaml or via nodetool
setcompactionthroughput. Did you try lowering that and checking if that
improves the dropped mutations?
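
For example (the value 8 is just an illustration):

    # cassandra.yaml -- persistent setting, picked up on restart
    compaction_throughput_mb_per_sec: 8

    # or adjust it at runtime on a live node (not persisted across restarts)
    nodetool setcompactionthroughput 8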

2016-08-09 13:32 GMT-03:00 Stefano Ortolani :

> Hi all,
>
> I am running incremental repairs on a weekly basis (can't do it every day
> as one single run takes 36 hours), and every time, I have at least one node
> dropping mutations as part of the process (this almost always during the
> anticompaction phase). Ironically this leads to a system where repairing
> makes data consistent at the cost of making some other data not consistent.
>
> Does anybody know why this is happening?
>
> My feeling is that this might be caused by anticompacting column families
> with really wide rows and with many SSTables. If that is the case, any way
> I can throttle that?
>
> Thanks!
> Stefano
>