Hello Paco,
> the mutation stage's pending column grows without stopping; could that
> be the problem?
> CPU (near 96%)

Yes, basically I think you are overusing this cluster.

> but two of them have a high CPU load, especially the 232, because I am
> running a lot of deletes using cqlsh on that node.

Solutions would be to run the deletes at a slower, constant pace, against
all the nodes, using a balancing policy (rough sketch below), or to add
capacity if all the nodes are facing the issue and you cannot slow the
deletes down.
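Something along these lines with the 2.x DataStax Java driver is a minimal
sketch of that idea; the contact point is one of your nodes, while the
keyspace, table, key list, and pacing value are made-up placeholders:

import java.util.Arrays;
import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.RoundRobinPolicy;

public class ThrottledDeletes {
    public static void main(String[] args) throws InterruptedException {
        // Round-robin rotates the coordinator across every node, instead of
        // funnelling all deletes through the one node cqlsh is attached to.
        Cluster cluster = Cluster.builder()
                .addContactPoint("172.31.7.244")
                .withLoadBalancingPolicy(new RoundRobinPolicy())
                .build();
        Session session = cluster.connect("my_keyspace");   // placeholder keyspace

        PreparedStatement delete =
                session.prepare("DELETE FROM my_table WHERE id = ?"); // placeholder table

        List<String> ids = Arrays.asList("k1", "k2", "k3"); // stand-in for the real keys
        for (String id : ids) {
            session.execute(delete.bind(id));
            Thread.sleep(50); // crude constant pace; tune while watching pending mutations
        }
        cluster.close();
    }
}

The sleep is the bluntest possible throttle; the point is only that the
coordinator role rotates and the cluster sees a steady, bounded stream of
deletes rather than a burst on a single node.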
You should also have a look at iowait and steal, to see whether the CPUs
really are 100% busy or are masking another issue (a disk not answering
fast enough, or a hardware / shared-instance problem). I had some noisy
neighbours at some point while using Cassandra on AWS.

> I cannot find the reason that originates the timeouts.

I don't find that so strange while some/all of the nodes are being
overused.

> I have already increased the timeouts, but I do not think that is a
> solution because the timeouts indicate another type of error.

Any relevant logs on the Cassandra nodes (other than the dropped mutations
INFO)?

> 7 nodes, version 2.0.17

Note: be aware that this Cassandra version is quite old and no longer
supported, so you might be facing issues that have already been fixed. I
know that upgrading is not straightforward, but 2.0 --> 2.1 brings an
amazing set of optimisations and some fixes too. You should try it out :-).

C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-04-04 14:33 GMT+02:00 Paco Trujillo <f.truji...@genetwister.nl>:

> Hi everyone,
>
> We are having problems with our cluster (7 nodes, version 2.0.17) when
> running “massive deletes” on one of the nodes (via the cql command
> line). At the beginning everything is fine, but after a while we start
> getting constant NoHostAvailableExceptions from the DataStax driver:
>
> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
> All host(s) tried for query failed (tried: /172.31.7.243:9042
> (com.datastax.driver.core.exceptions.DriverException: Timeout while trying
> to acquire available connection (you may want to increase the driver number
> of per-host connections)), /172.31.7.245:9042
> (com.datastax.driver.core.exceptions.DriverException: Timeout while trying
> to acquire available connection (you may want to increase the driver number
> of per-host connections)), /172.31.7.246:9042
> (com.datastax.driver.core.exceptions.DriverException: Timeout while trying
> to acquire available connection (you may want to increase the driver number
> of per-host connections)), /172.31.7.247:9042, /172.31.7.232:9042,
> /172.31.7.233:9042, /172.31.7.244:9042 [only showing errors of first 3
> hosts, use getErrors() for more details])
>
> All the nodes are up and running:
>
> UN  172.31.7.244  152.21 GB  256  14.5%  58abea69-e7ba-4e57-9609-24f3673a7e58  RAC1
> UN  172.31.7.245  168.4 GB   256  14.5%  bc11b4f0-cf96-4ca5-9a3e-33cc2b92a752  RAC1
> UN  172.31.7.246  177.71 GB  256  13.7%  8dc7bb3d-38f7-49b9-b8db-a622cc80346c  RAC1
> UN  172.31.7.247  158.57 GB  256  14.1%  94022081-a563-4042-81ab-75ffe4d13194  RAC1
> UN  172.31.7.243  176.83 GB  256  14.6%  0dda3410-db58-42f2-9351-068bdf68f530  RAC1
> UN  172.31.7.233  159 GB     256  13.6%  01e013fb-2f57-44fb-b3c5-fd89d705bfdd  RAC1
> UN  172.31.7.232  166.05 GB  256  15.0%  4d009603-faa9-4add-b3a2-fe24ec16a7c1
>
> but two of them have a high CPU load, especially the 232, because I am
> running a lot of deletes using cqlsh on that node.
>
> I know that deletes generate tombstones, but with 7 nodes in the cluster
> I do not think it is normal that all the hosts become inaccessible.
>
> We have a replication factor of 3, and for the deletes I am not setting
> any consistency level (so it is using the default, ONE).
>
> I checked the nodes with high CPU (near 96%): GC activity remains around
> 1.6% (using only 3 GB of the 10 GB assigned). But looking at the thread
> pool stats, the mutation stage's pending column grows without stopping;
> could that be the problem?
>
> I cannot find the reason that originates the timeouts. I have already
> increased the timeouts, but I do not think that is a solution because the
> timeouts indicate another type of error. Does anyone have a tip to help
> determine where the problem is?
>
> Thanks in advance
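For reference, the "you may want to increase the driver number of per-host
connections" hint in the errors above maps to PoolingOptions in the 2.x
Java driver. A minimal sketch, with illustrative connection counts:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;

public class WiderPool {
    public static void main(String[] args) {
        // A wider per-host pool gives the driver more in-flight capacity,
        // so requests spend less time waiting for a free connection.
        PoolingOptions pooling = new PoolingOptions()
                .setCoreConnectionsPerHost(HostDistance.LOCAL, 4)   // illustrative
                .setMaxConnectionsPerHost(HostDistance.LOCAL, 16);  // illustrative

        Cluster cluster = Cluster.builder()
                .addContactPoint("172.31.7.244")
                .withPoolingOptions(pooling)
                .build();

        cluster.connect().close(); // no keyspace; just showing the wiring
        cluster.close();
    }
}

If the nodes themselves are saturated, though, a wider pool only moves the
queue around; pacing the deletes, as sketched earlier, is the more durable
fix.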