Re: timeouts on counter tables

2017-09-04 Thread Rudi Bruchez
I'm going to try different options. Do any of you have experience with 
tweaking one of these conf parameters to improve read throughput, 
especially for counter tables? Rough sketches of what I have in mind 
follow the list below.



1/ when using SSDs:
trickle_fsync: true
trickle_fsync_interval_in_kb: 1024

2/ set concurrent_compactors to the number of cores.

3/ increase concurrent_counter_writes.

4/ Row Cache vs Chunk Cache.

5/ change the compaction strategy to leveled, specifically when using 
counter columns?
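
For 2/, 3/ and 5/, this is roughly what I have in mind. The values and the 
keyspace/table names below are placeholders to be tuned, not tested 
recommendations:

# cassandra.yaml (assuming 8-core machines; the default
# concurrent_counter_writes is 32)
concurrent_compactors: 8
concurrent_counter_writes: 64

-- CQL, to switch one counter table to leveled compaction
ALTER TABLE mykeyspace.mycounters
WITH compaction = {'class': 'LeveledCompactionStrategy',
                   'sstable_size_in_mb': 160};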


thanks !

On 3 September 2017 at 20:25, Rudi Bruchez wrote:


On 30/08/2017 at 05:33, Erick Ramirez wrote:

Is it possible at all that you may have a data hotspot if it's
not hardware-related?



It does not seem so. The partition key seems well distributed and
the queries update different keys.

We have dropped COUNTER_MUTATION messages in the log:

COUNTER_MUTATION messages were dropped in last 5000 ms: 0
internal and 2 cross node. Mean internal dropped latency: 0 ms
and Mean cross-node dropped latency: 5960 ms

Pool Name               Active   Pending   Completed   Blocked   All Time Blocked
ReadStage                   32       503     7481787         0                  0
CounterMutationStage        32       221     5722101         0                  0


Could the load be too high?

Thanks








Re: timeouts on counter tables

2017-09-04 Thread Rudi Bruchez
It can happen on any of the nodes. We can have a large number of pending 
tasks on ReadStage and CounterMutationStage. We'll try to increase 
concurrent_counter_writes to see how it changes things.
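
For reference, the concrete change would be in cassandra.yaml, something 
like this (64 is only a guess, to be validated against the pending counts 
in nodetool tpstats):

# default is 32
concurrent_counter_writes: 64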


Likely. I believe counter mutations are a tad more expensive than a 
normal mutation. If you're doing a lot of counter updates, that 
probably doesn't help. Regardless, a high number of pending 
reads/mutations is generally not good and indicates the node is 
overloaded. Are you just seeing this on the one node with IO issues, or 
do other nodes have this problem as well?


On 3 September 2017 at 20:25, Rudi Bruchez wrote:


On 30/08/2017 at 05:33, Erick Ramirez wrote:

Is it possible at all that you may have a data hotspot if it's
not hardware-related?



It does not seem so. The partition key seems well distributed and
the queries update different keys.

We have dropped COUNTER_MUTATION messages in the log:

COUNTER_MUTATION messages were dropped in last 5000 ms: 0 internal
and 2 cross node. Mean internal dropped latency: 0 ms and Mean
cross-node dropped latency: 5960 ms

Pool Name               Active   Pending   Completed   Blocked   All Time Blocked
ReadStage                   32       503     7481787         0                  0
CounterMutationStage        32       221     5722101         0                  0


Could the load be too high?

Thanks






Re: timeouts on counter tables

2017-09-04 Thread kurt greaves
Likely. I believe counter mutations are a tad more expensive than a normal
mutation. If you're doing a lot of counter updates, that probably doesn't
help. Regardless, a high number of pending reads/mutations is generally not
good and indicates the node is overloaded. Are you just seeing this on
the one node with IO issues, or do other nodes have this problem as well?

On 3 September 2017 at 20:25, Rudi Bruchez wrote:

> On 30/08/2017 at 05:33, Erick Ramirez wrote:
>
> Is it possible at all that you may have a data hotspot if it's not
> hardware-related?
>
>
> It does not seem so. The partition key seems well distributed and the
> queries update different keys.
>
> We have dropped COUNTER_MUTATION messages in the log:
>
> COUNTER_MUTATION messages were dropped in last 5000 ms: 0 internal and 2
> cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped
> latency: 5960 ms
>
> Pool Name               Active   Pending   Completed   Blocked   All Time Blocked
> ReadStage                   32       503     7481787         0                  0
> CounterMutationStage        32       221     5722101         0                  0
>
> Could the load be too high?
>
> Thanks
>


Re: timeouts on counter tables

2017-09-03 Thread Rudi Bruchez

On 30/08/2017 at 05:33, Erick Ramirez wrote:
Is it possible at all that you may have a data hotspot if it's not 
hardware-related?



It does not seem so. The partition key seems well distributed and the 
queries update different keys.


We have dropped COUNTER_MUTATION messages in the log:

COUNTER_MUTATION messages were dropped in last 5000 ms: 0 internal and 2 
cross node. Mean internal dropped latency: 0 ms and Mean cross-node 
dropped latency: 5960 ms


Pool Name               Active   Pending   Completed   Blocked   All Time Blocked
ReadStage                   32       503     7481787         0                  0
CounterMutationStage        32       221     5722101         0                  0

Could the load be too high?

Thanks



Re: timeouts on counter tables

2017-09-03 Thread Rudi Bruchez

On 28/08/2017 at 03:30, kurt greaves wrote:
If every node is a replica, it sounds like you've got hardware issues. 
Have you compared iostat to the "normal" nodes? I assume there is 
nothing different in the logs on this one node?

Also, sanity check: you are using DCAwareRoundRobinPolicy?


Thanks for the answer. I had to concentrate on other things for a few 
days; I'm back on this problem now.


The PHP driver call is:

$cassandrabuilder->withDatacenterAwareRoundRobinLoadBalancingPolicy("mycluster", 0, false)
    ->withTokenAwareRouting(true)
    ->withSchemaMetadata(true);


After that, the call is done like this:

$result = $cassandra->execute(new Cassandra\SimpleStatement($query));
$cassandrasession->executeAsync($this->queryPrepared, array('arguments' => $values));


Could the async calls put too much pressure on the server? The calls come 
from 11 client machines.
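
If so, one thing we could try is bounding the number of in-flight async 
requests on each client, roughly like this ($session, $prepared and $rows 
stand for our actual objects, and 64 is a guess to be tuned):

$futures = array();
$maxInFlight = 64;

foreach ($rows as $values) {
    $futures[] = $session->executeAsync($prepared, array('arguments' => $values));
    if (count($futures) >= $maxInFlight) {
        // block on the oldest request before issuing more (simple back-pressure)
        array_shift($futures)->get();
    }
}
foreach ($futures as $future) {
    $future->get(); // drain the remainder
}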


Thanks !



Re: timeouts on counter tables

2017-08-29 Thread Erick Ramirez
Is it possible at all that you may have a data hotspot if it's not
hardware-related?

On Mon, Aug 28, 2017 at 11:30 AM, kurt greaves wrote:

> If every node is a replica, it sounds like you've got hardware issues. Have
> you compared iostat to the "normal" nodes? I assume there is nothing
> different in the logs on this one node?
> Also, sanity check: you are using DCAwareRoundRobinPolicy?
>


Re: timeouts on counter tables

2017-08-27 Thread kurt greaves
If every node is a replica, it sounds like you've got hardware issues. Have
you compared iostat to the "normal" nodes? I assume there is nothing
different in the logs on this one node?
Also, sanity check: you are using DCAwareRoundRobinPolicy?


Re: timeouts on counter tables

2017-08-27 Thread Rudi Bruchez

On 28/08/2017 at 00:11, kurt greaves wrote:

What is your RF?

Also, as a side note, RAID 1 shouldn't be necessary if you have RF > 1, 
and it would give you worse performance.


RF 2, plus 1 on a single backup node. Consistency ONE. You're right about 
RAID 1: if disk performance is the problem, dropping it might be one way 
to improve that part. Still, it's strange that only one node suffers from 
IO problems.




Re: timeouts on counter tables

2017-08-27 Thread kurt greaves
What is your RF?

Also, as a side note, RAID 1 shouldn't be necessary if you have RF > 1,
and it would give you worse performance.
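
If you're not sure, you can check what is configured per keyspace, e.g. on
3.x (the keyspace name is a placeholder):

SELECT keyspace_name, replication
FROM system_schema.keyspaces
WHERE keyspace_name = 'mykeyspace';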