Re: Node crashes on repair (Cassandra 3.11.1)

2017-12-07 Thread Christian Lorenz
I think we’ve hit the bug described here:

https://issues.apache.org/jira/browse/CASSANDRA-14096

Regards,
Christian

From: Christian Lorenz <christian.lor...@webtrekk.com>
Reply to: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, December 1, 2017 at 10:04
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Node crashes on repair (Cassandra 3.11.1)

Hi Jeff,

the repairs worked fine before on version 3.9. I noticed that the validation
tasks during a repair are no longer bound to the concurrent_compactors value.
Could this be too much pressure for the node to handle, so that it gets
overloaded?

Greetings,
Christian

From: Jeff Jirsa <jji...@gmail.com>
Reply to: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, November 30, 2017 at 19:46
To: cassandra <user@cassandra.apache.org>
Subject: Re: Node crashes on repair (Cassandra 3.11.1)

That was worded poorly. The tree has a max depth of 20, so it is the same
size for any range > 2**20.


On Thu, Nov 30, 2017 at 10:43 AM, Jeff Jirsa <jji...@gmail.com> wrote:
Merkle trees have a fixed size/depth (2**20), so it’s not that, but it could be 
timing out elsewhere (or still running validation or something)
--
Jeff Jirsa


On Nov 30, 2017, at 10:12 AM, Javier Canillas <javier.canil...@gmail.com> wrote:
Christian,

I'm not an expert, but maybe the Merkle tree is too big to transfer between
nodes and that's why it times out. How many nodes do you have, and what's the
size of the keyspace? Have you ever done a successful repair before?

Cassandra Reaper does repairs per token range (or even part of one), which is
why the Merkle trees it needs stay small.

Regards,

Javier.

2017-11-30 6:48 GMT-03:00 Christian Lorenz <christian.lor...@webtrekk.com>:
Hello,

after updating our cluster to Cassandra 3.11.1 (previously 3.9), running
‘nodetool repair --full’ leads to the node crashing.
The log file showed the following exception:
ERROR [ReadRepairStage:36] 2017-11-30 07:42:06,439 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:36,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:199) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:76) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_151]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]

The node's data size is ~270 GB. A repair with Cassandra Reaper works fine, though.

Any idea why this could be happening?

Regards,
Christian




Re: Node crashes on repair (Cassandra 3.11.1)

2017-12-01 Thread Christian Lorenz
Hi Jeff,

the repairs worked fine before on version 3.9. I noticed that the validation
tasks during a repair are no longer bound to the concurrent_compactors value.
Could this be too much pressure for the node to handle, so that it gets
overloaded?

Greetings,
Christian
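
A rough back-of-envelope sketch of the concern above: if validation compactions
are no longer throttled by concurrent_compactors, each one can hold a
fixed-depth Merkle tree (depth 20, as noted elsewhere in this thread) on the
heap at the same time. The bytes-per-leaf figure and the validation counts
below are illustrative assumptions, not measured values.

# Rough heap footprint of N simultaneous validation compactions, each holding
# one full-depth Merkle tree. Bytes per leaf is an assumed figure (hash plus
# object overhead), for illustration only.
MAX_DEPTH = 20
LEAVES = 2 ** MAX_DEPTH          # 1,048,576 leaves per tree
BYTES_PER_LEAF = 32              # assumption

def estimated_heap_bytes(concurrent_validations: int) -> int:
    """Approximate heap held if this many validations run at once."""
    return concurrent_validations * LEAVES * BYTES_PER_LEAF

for n in (2, 8, 32):
    print(f"{n:>3} concurrent validations -> ~{estimated_heap_bytes(n) / 1024**2:.0f} MiB")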

From: Jeff Jirsa <jji...@gmail.com>
Reply to: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, November 30, 2017 at 19:46
To: cassandra <user@cassandra.apache.org>
Subject: Re: Node crashes on repair (Cassandra 3.11.1)

That was worded poorly. The tree has a max depth of 20, so it is the same
size for any range > 2**20.


On Thu, Nov 30, 2017 at 10:43 AM, Jeff Jirsa <jji...@gmail.com> wrote:
Merkle trees have a fixed size/depth (2**20), so it’s not that, but it could be 
timing out elsewhere (or still running validation or something)
--
Jeff Jirsa


On Nov 30, 2017, at 10:12 AM, Javier Canillas <javier.canil...@gmail.com> wrote:
Christian,

I'm not an expert, but maybe the Merkle tree is too big to transfer between
nodes and that's why it times out. How many nodes do you have, and what's the
size of the keyspace? Have you ever done a successful repair before?

Cassandra Reaper does repairs per token range (or even part of one), which is
why the Merkle trees it needs stay small.

Regards,

Javier.

2017-11-30 6:48 GMT-03:00 Christian Lorenz <christian.lor...@webtrekk.com>:
Hello,

after updating our cluster to Cassandra 3.11.1 (previously 3.9), running
‘nodetool repair --full’ leads to the node crashing.
The log file showed the following exception:
ERROR [ReadRepairStage:36] 2017-11-30 07:42:06,439 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:36,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:199) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:76) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_151]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]

The node's data size is ~270 GB. A repair with Cassandra Reaper works fine, though.

Any idea why this could be happening?

Regards,
Christian




Re: Node crashes on repair (Cassandra 3.11.1)

2017-11-30 Thread Jeff Jirsa
That was worded poorly. The tree has a max depth of 20, so it is the same
size for any range > 2**20.
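
A minimal sketch of that cap (the splitting heuristic and partition counts here
are illustrative assumptions, not Cassandra's actual tree-building code): once a
range holds more than 2**20 partitions, the leaf count, and therefore the tree
size, stops growing.

# Sketch: the tree stops splitting at depth 20, so any range with more than
# 2**20 partitions ends up with the same number of leaves.
MAX_DEPTH = 20

def merkle_leaves(partitions_in_range: int) -> int:
    """Leaves used for a range, assuming one split per doubling, capped at MAX_DEPTH."""
    depth = min(MAX_DEPTH, max(0, partitions_in_range - 1).bit_length())
    return 2 ** depth

for partitions in (1_000, 1_000_000, 1_000_000_000):
    print(f"{partitions:>13,} partitions -> {merkle_leaves(partitions):,} leaves")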


On Thu, Nov 30, 2017 at 10:43 AM, Jeff Jirsa  wrote:

> Merkle trees have a fixed size/depth (2**20), so it’s not that, but it
> could be timing out elsewhere (or still running validation or something)
>
> --
> Jeff Jirsa
>
>
> On Nov 30, 2017, at 10:12 AM, Javier Canillas 
> wrote:
>
> Christian,
>
> I'm not an expert, but maybe the Merkle tree is too big to transfer
> between nodes and that's why it times out. How many nodes do you have, and
> what's the size of the keyspace? Have you ever done a successful repair
> before?
>
> Cassandra Reaper does repairs per token range (or even part of one),
> which is why the Merkle trees it needs stay small.
>
> Regards,
>
> Javier.
>
> 2017-11-30 6:48 GMT-03:00 Christian Lorenz 
> :
>
>> Hello,
>>
>>
>>
>> after updating our cluster to Cassandra 3.11.1 (previously 3.9), running
>> ‘nodetool repair --full’ leads to the node crashing.
>>
>> The log file showed the following exception:
>>
>> ERROR [ReadRepairStage:36] 2017-11-30 07:42:06,439 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:36,5,main]
>> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>>     at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:199) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:76) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_151]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_151]
>>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]
>>
>>
>>
>> The node's data size is ~270 GB. A repair with Cassandra Reaper works fine,
>> though.
>>
>>
>>
>> Any idea why this could be happening?
>>
>>
>>
>> Regards,
>>
>> Christian
>>
>
>


Re: Node crashes on repair (Cassandra 3.11.1)

2017-11-30 Thread Jeff Jirsa
Merkle trees have a fixed size/depth (2**20), so it’s not that, but it could be 
timing out elsewhere (or still running validation or something)

-- 
Jeff Jirsa


> On Nov 30, 2017, at 10:12 AM, Javier Canillas  
> wrote:
> 
> Christian,
> 
> I'm not an expert, but maybe the Merkle tree is too big to transfer between
> nodes and that's why it times out. How many nodes do you have, and what's the
> size of the keyspace? Have you ever done a successful repair before?
> 
> Cassandra Reaper does repairs per token range (or even part of one), which is
> why the Merkle trees it needs stay small.
> 
> Regards,
> 
> Javier.
> 
> 2017-11-30 6:48 GMT-03:00 Christian Lorenz :
>> Hello,
>> 
>>  
>> 
>> after updating our cluster to Cassandra 3.11.1 (previously 3.9), running
>> ‘nodetool repair --full’ leads to the node crashing.
>>
>> The log file showed the following exception:
>> 
>> ERROR [ReadRepairStage:36] 2017-11-30 07:42:06,439 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:36,5,main]
>> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>>     at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:199) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:76) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_151]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_151]
>>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.1.jar:3.11.1]
>>     at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]
>> 
>>  
>> 
>> The node's data size is ~270 GB. A repair with Cassandra Reaper works fine,
>> though.
>> 
>>  
>> 
>> Any idea why this could be happening?
>> 
>>  
>> 
>> Regards,
>> 
>> Christian
>> 
> 


Re: Node crashes on repair (Cassandra 3.11.1)

2017-11-30 Thread Javier Canillas
Christian,

I'm not an expert, but maybe the Merkle tree is too big to transfer between
nodes and that's why it times out. How many nodes do you have, and what's
the size of the keyspace? Have you ever done a successful repair before?

Cassandra Reaper does repairs per token range (or even part of one), which is
why the Merkle trees it needs stay small.
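
A minimal sketch of that idea (a simplification; Reaper's real segment
generation is more involved, and the function below is hypothetical): split the
full Murmur3 token range into equal subranges so each repair, and each Merkle
tree, only covers a small slice of the ring. Each subrange could then be
repaired on its own, e.g. with nodetool repair's start/end token options.

# Split the full Murmur3 token range into equal, end-inclusive subranges.
# Repairing one subrange at a time keeps each Merkle tree's slice of data small.
MIN_TOKEN = -(2 ** 63)       # Murmur3Partitioner minimum token
MAX_TOKEN = 2 ** 63 - 1      # Murmur3Partitioner maximum token

def split_token_range(segments: int):
    """Yield (start, end) token pairs that together cover the whole ring."""
    total = MAX_TOKEN - MIN_TOKEN + 1
    step = total // segments
    start = MIN_TOKEN
    for i in range(segments):
        end = MAX_TOKEN if i == segments - 1 else start + step - 1
        yield start, end
        start = end + 1

for start, end in split_token_range(4):
    print(f"repair tokens {start} .. {end}")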

Regards,

Javier.

2017-11-30 6:48 GMT-03:00 Christian Lorenz :

> Hello,
>
>
>
> after updating our cluster to Cassandra 3.11.1 (previously 3.9), running
> ‘nodetool repair --full’ leads to the node crashing.
>
> The log file showed the following exception:
>
> ERROR [ReadRepairStage:36] 2017-11-30 07:42:06,439 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:36,5,main]
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:199) ~[apache-cassandra-3.11.1.jar:3.11.1]
>     at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[apache-cassandra-3.11.1.jar:3.11.1]
>     at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[apache-cassandra-3.11.1.jar:3.11.1]
>     at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:76) ~[apache-cassandra-3.11.1.jar:3.11.1]
>     at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.1.jar:3.11.1]
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.1.jar:3.11.1]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_151]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_151]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.1.jar:3.11.1]
>     at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]
>
>
>
> The node's data size is ~270 GB. A repair with Cassandra Reaper works fine,
> though.
>
>
>
> Any idea why this could be happening?
>
>
>
> Regards,
>
> Christian
>


Node crashes on repair (Cassandra 3.11.1)

2017-11-30 Thread Christian Lorenz
Hello,

after updating our cluster to Cassandra 3.11.1 (previously 3.9), running
‘nodetool repair --full’ leads to the node crashing.
The log file showed the following exception:
ERROR [ReadRepairStage:36] 2017-11-30 07:42:06,439 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:36,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:199) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:76) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_151]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]

The node's data size is ~270 GB. A repair with Cassandra Reaper works fine, though.

Any idea why this could be happening?

Regards,
Christian