Re: timeouts on counter tables
I'm going to try different options. Does anyone have experience tweaking any of these configuration parameters to improve read throughput, especially for counter tables? (A concrete sketch follows at the end of this mail.)

1/ On SSD: trickle_fsync: true, trickle_fsync_interval_in_kb: 1024
2/ concurrent_compactors set to the number of cores
3/ concurrent_counter_writes
4/ Row cache vs. chunk cache
5/ Changing the compaction strategy to leveled, specifically for counter tables?

Thanks!

On 3 September 2017 at 20:25, Rudi Bruchez wrote:
> On 30/08/2017 at 05:33, Erick Ramirez wrote:
>> Is it possible at all that you may have a data hotspot if it's not hardware-related?
>
> It does not seem so. The partition key seems well distributed and the queries update different keys.
>
> We have dropped COUNTER_MUTATION messages in the log:
>
> COUNTER_MUTATION messages were dropped in last 5000 ms: 0 internal and 2 cross node. Mean internal dropped latency: 0 ms and mean cross-node dropped latency: 5960 ms
>
> Pool Name               Active   Pending   Completed   Blocked   All Time Blocked
> ReadStage                   32       503     7481787         0                  0
> CounterMutationStage        32       221     5722101         0                  0
>
> Could the load be too high?
>
> Thanks
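To make the list concrete, this is roughly what I have in mind (the values are only starting points, not recommendations; concurrent_counter_writes: 64 is just an example above the default of 32, and concurrent_compactors: 8 assumes an 8-core machine). In cassandra.yaml:

    trickle_fsync: true
    trickle_fsync_interval_in_kb: 1024
    concurrent_compactors: 8          # assuming an 8-core machine
    concurrent_counter_writes: 64     # default is 32

And for point 5/, switching the counter table to leveled compaction would be something like this (my_keyspace.my_counters is just a placeholder for the real table):

    ALTER TABLE my_keyspace.my_counters
      WITH compaction = {'class': 'LeveledCompactionStrategy'};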
Re: timeouts on counter tables
It can happen on any of the nodes. We can have a large number of pending tasks on ReadStage and CounterMutationStage (how we check this across the nodes is sketched at the end of this mail). We'll try increasing concurrent_counter_writes to see how it changes things.

> Likely. I believe counter mutations are a tad more expensive than a normal mutation. If you're doing a lot of counter updates, that probably doesn't help. Regardless, a high number of pending reads/mutations is generally not good and indicates the node is overloaded. Are you just seeing this on the 1 node with IO issues, or do other nodes have this problem as well?
>
> On 3 September 2017 at 20:25, Rudi Bruchez wrote:
>> On 30/08/2017 at 05:33, Erick Ramirez wrote:
>>> Is it possible at all that you may have a data hotspot if it's not hardware-related?
>>
>> It does not seem so. The partition key seems well distributed and the queries update different keys.
>>
>> We have dropped COUNTER_MUTATION messages in the log:
>>
>> COUNTER_MUTATION messages were dropped in last 5000 ms: 0 internal and 2 cross node. Mean internal dropped latency: 0 ms and mean cross-node dropped latency: 5960 ms
>>
>> Pool Name               Active   Pending   Completed   Blocked   All Time Blocked
>> ReadStage                   32       503     7481787         0                  0
>> CounterMutationStage        32       221     5722101         0                  0
>>
>> Could the load be too high?
>>
>> Thanks
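For reference, this is roughly how we look at the pending counts on every node (the host names are only an illustration of looping over the cluster with ssh):

    for h in node1 node2 node3; do
        echo "== $h =="
        ssh "$h" nodetool tpstats | grep -E 'Pool Name|ReadStage|CounterMutationStage'
    done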
Re: timeouts on counter tables
Likely. I believe counter mutations are a tad more expensive than a normal mutation. If you're doing a lot of counter updates, that probably doesn't help. Regardless, a high number of pending reads/mutations is generally not good and indicates the node is overloaded. Are you just seeing this on the 1 node with IO issues, or do other nodes have this problem as well?

On 3 September 2017 at 20:25, Rudi Bruchez wrote:
> On 30/08/2017 at 05:33, Erick Ramirez wrote:
>> Is it possible at all that you may have a data hotspot if it's not hardware-related?
>
> It does not seem so. The partition key seems well distributed and the queries update different keys.
>
> We have dropped COUNTER_MUTATION messages in the log:
>
> COUNTER_MUTATION messages were dropped in last 5000 ms: 0 internal and 2 cross node. Mean internal dropped latency: 0 ms and mean cross-node dropped latency: 5960 ms
>
> Pool Name               Active   Pending   Completed   Blocked   All Time Blocked
> ReadStage                   32       503     7481787         0                  0
> CounterMutationStage        32       221     5722101         0                  0
>
> Could the load be too high?
>
> Thanks
Re: timeouts on counter tables
On 30/08/2017 at 05:33, Erick Ramirez wrote:
> Is it possible at all that you may have a data hotspot if it's not hardware-related?

It does not seem so. The partition key seems well distributed and the queries update different keys.

We have dropped COUNTER_MUTATION messages in the log:

COUNTER_MUTATION messages were dropped in last 5000 ms: 0 internal and 2 cross node. Mean internal dropped latency: 0 ms and mean cross-node dropped latency: 5960 ms

Pool Name               Active   Pending   Completed   Blocked   All Time Blocked
ReadStage                   32       503     7481787         0                  0
CounterMutationStage        32       221     5722101         0                  0

Could the load be too high?

Thanks
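(For anyone following along: the pool figures above come from nodetool tpstats, and the dropped-message warning is from the Cassandra system log, which we pull with something like the line below. The log path is the default packaged location; adjust if yours differs.)

    grep "messages were dropped" /var/log/cassandra/system.log | tail -20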
Re: timeouts on counter tables
On 28/08/2017 at 03:30, kurt greaves wrote:
> If every node is a replica it sounds like you've got hardware issues. Have you compared iostat to the "normal" nodes? I assume there is nothing different in the logs on this one node? Also sanity check, you are using DCAwareRoundRobinPolicy?

Thanks for the answer. I had to concentrate on other things for a few days; I'm back to that problem now.

The PHP driver builder call is:

    $cassandrabuilder->withDatacenterAwareRoundRobinLoadBalancingPolicy("mycluster", 0, false)
                     ->withTokenAwareRouting(true)
                     ->withSchemaMetadata(true);

After that, the calls are done like this:

    $result = $cassandra->execute(new Cassandra\SimpleStatement($query));
    $cassandrasession->executeAsync($this->queryPrepared, array('arguments' => $values));

Could the async calls put too much pressure on the server? The calls come from 11 client machines.

Thanks!
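If the async writes do turn out to be the problem, one thing we could try on the client side is capping the number of in-flight requests per process. A rough, untested sketch ($maxInFlight, $rows, $cassandrasession and $this->queryPrepared are placeholders for our own values and objects):

    // Cap the number of outstanding async writes so one client
    // process cannot flood the coordinator.
    $maxInFlight = 64;
    $futures = array();

    foreach ($rows as $values) {
        $futures[] = $cassandrasession->executeAsync(
            $this->queryPrepared,
            array('arguments' => $values)
        );

        // Once the cap is reached, wait for the queued requests
        // to complete before submitting more.
        if (count($futures) >= $maxInFlight) {
            foreach ($futures as $future) {
                $future->get(10); // wait up to 10 seconds per request
            }
            $futures = array();
        }
    }

    // Drain whatever is still outstanding at the end.
    foreach ($futures as $future) {
        $future->get(10);
    }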
Re: timeouts on counter tables
Is it possible at all that you may have a data hotspot if it's not hardware-related?

On Mon, Aug 28, 2017 at 11:30 AM, kurt greaves wrote:
> If every node is a replica it sounds like you've got hardware issues. Have you compared iostat to the "normal" nodes? I assume there is nothing different in the logs on this one node?
> Also sanity check, you are using DCAwareRoundRobinPolicy?
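If you want to rule a hotspot out, the per-node partition size and cell count percentiles are a quick check; something along these lines (my_keyspace and my_counters are placeholders, and on older versions the first command is nodetool cfhistograms):

    nodetool tablehistograms my_keyspace my_counters
    nodetool status my_keyspace

Comparing the output across nodes should show whether one node holds disproportionately large partitions or load.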
Re: timeouts on counter tables
If every node is a replica, it sounds like you've got hardware issues. Have you compared iostat to the "normal" nodes? I assume there is nothing different in the logs on this one node? Also, as a sanity check: you are using DCAwareRoundRobinPolicy?
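For the iostat comparison, running something like the line below on the slow node and on a healthy node side by side is usually enough; large differences in the await and %util columns point at the disks.

    iostat -x -m 5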
Re: timeouts on counter tables
On 28/08/2017 at 00:11, kurt greaves wrote:
> What is your RF? Also, as a side note, RAID 1 shouldn't be necessary if you have RF > 1, and it would give you worse performance.

RF 2, plus 1 on a single backup node. Consistency level ONE.

You're right about RAID 1; if disk performance is the problem, that might be a way to improve that part. Still, it's strange that only one node suffers from IO problems.
Re: timeouts on counter tables
What is your RF? Also, as a side note, RAID 1 shouldn't be necessary if you have RF > 1, and it would give you worse performance.