Re: Bursts of Thrift threads make cluster unresponsive

2019-06-27 Thread Dmitry Simonov
> Is there an order in which the events you described happened, or is the
order with which you presented them the order you notice things going
wrong?

At first, the Thrift thread count starts increasing.
After 2 or 3 minutes these threads consume all CPU cores.
After that, simultaneously: messages are dropped, read latency increases,
and active read tasks appear.
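One quick way to confirm where those threads sit when the burst starts is a
thread dump; a minimal sketch, assuming jstack is available and <cassandra-pid>
is a placeholder for the Cassandra process id:

# count Thrift threads, then how many of them are blocked in socketRead0
jstack <cassandra-pid> | grep -c '"Thrift'
jstack <cassandra-pid> | grep -A 3 '"Thrift' | grep -c 'socketRead0'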

On Fri, Jun 28, 2019 at 1:40 AM Avinash Mandava wrote:

> Yeah, I skimmed too fast. Don't add more work if the CPU is pegged, and if
> you're using the Thrift protocol, NTR would not have values.
>
> Is there an order in which the events you described happened, or is the
> order with which you presented them the order you notice things going
> wrong?
>
> On Thu, Jun 27, 2019 at 1:29 PM Dmitry Simonov 
> wrote:
>
>> Thanks for your reply!
>>
>> > Have you tried increasing concurrent reads until you see more activity
>> in disk?
>> When the problem occurs, 1.2k - 2k freshly created Thrift threads consume all
>> CPU on all cores.
>> Could increasing concurrent reads help in this situation?
>>
>> >
>> org.apache.cassandra.metrics.type=ThreadPools.path=transport.scope=Native-Transport-Requests.name=TotalBlockedTasks.Count
>> This metric is 0 on all cluster nodes.
>>
>> On Fri, Jun 28, 2019 at 12:34 AM Avinash Mandava wrote:
>>
>>> Have you tried increasing concurrent reads until you see more activity
>>> in disk? If you've always got 32 active reads and high pending reads it
>>> could just be dropping the reads because the queues are saturated. Could be
>>> artificially bottlenecking at the C* process level.
>>>
>>> Also what does this metric show over time:
>>>
>>>
>>> org.apache.cassandra.metrics.type=ThreadPools.path=transport.scope=Native-Transport-Requests.name=TotalBlockedTasks.Count
>>>
>>>
>>>
>>> On Thu, Jun 27, 2019 at 1:52 AM Dmitry Simonov 
>>> wrote:
>>>
>>>> Hello!
>>>>
>>>> We have encountered the following problem several times.
>>>>
>>>> The Cassandra cluster (5 nodes) becomes unresponsive for ~30 minutes:
>>>> - all CPUs are at 100% load (normally we have an LA of 5 on 16-core machines)
>>>> - Cassandra's thread count rises from 300 to 1300 - 2000; most of them
>>>> are Thrift threads in the java.net.SocketInputStream.socketRead0(Native
>>>> Method) method, while the count of other threads doesn't increase
>>>> - some Read messages are dropped
>>>> - read latency (p99.9) increases to 20-30 seconds
>>>> - there are up to 32 active Read Tasks and up to 3k - 6k pending Read Tasks
>>>>
>>>> The problem starts synchronously on all nodes of the cluster.
>>>> I cannot tie it to increased load from clients (the "read rate"
>>>> doesn't increase during the problem).
>>>> It also looks like there is no problem with the disks (I/O latencies are OK).
>>>>
>>>> Could anybody please give some advice on further troubleshooting?
>>>>
>>>> --
>>>> Best Regards,
>>>> Dmitry Simonov
>>>>
>>>
>>>
>>> --
>>> www.vorstella.com
>>> 408 691 8402
>>>
>>
>>
>> --
>> Best Regards,
>> Dmitry Simonov
>>
>
>
> --
> www.vorstella.com
> 408 691 8402
>


-- 
Best Regards,
Dmitry Simonov


Re: Bursts of Thrift threads make cluster unresponsive

2019-06-27 Thread Dmitry Simonov
Thanks for your reply!

> Have you tried increasing concurrent reads until you see more activity in
disk?
When the problem occurs, 1.2k - 2k freshly created Thrift threads consume all
CPU on all cores.
Could increasing concurrent reads help in this situation?

>
org.apache.cassandra.metrics.type=ThreadPools.path=transport.scope=Native-Transport-Requests.name=TotalBlockedTasks.Count
This metric is 0 on all cluster nodes.
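(For reference, the same value can also be read ad hoc over JMX; a minimal
sketch, assuming jmxterm is available under the given jar name, JMX listens on
the default port 7199, and the non-interactive -n flag is supported:)

# read the TotalBlockedTasks counter for the Native-Transport-Requests pool
echo "get -b org.apache.cassandra.metrics:type=ThreadPools,path=transport,scope=Native-Transport-Requests,name=TotalBlockedTasks Count" \
  | java -jar jmxterm-1.0.2-uber.jar -l localhost:7199 -n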

On Fri, Jun 28, 2019 at 12:34 AM Avinash Mandava wrote:

> Have you tried increasing concurrent reads until you see more activity in
> disk? If you've always got 32 active reads and high pending reads it could
> just be dropping the reads because the queues are saturated. Could be
> artificially bottlenecking at the C* process level.
>
> Also what does this metric show over time:
>
>
> org.apache.cassandra.metrics.type=ThreadPools.path=transport.scope=Native-Transport-Requests.name=TotalBlockedTasks.Count
>
>
>
> On Thu, Jun 27, 2019 at 1:52 AM Dmitry Simonov 
> wrote:
>
>> Hello!
>>
>> We have encountered the following problem several times.
>>
>> The Cassandra cluster (5 nodes) becomes unresponsive for ~30 minutes:
>> - all CPUs are at 100% load (normally we have an LA of 5 on 16-core machines)
>> - Cassandra's thread count rises from 300 to 1300 - 2000; most of them
>> are Thrift threads in the java.net.SocketInputStream.socketRead0(Native
>> Method) method, while the count of other threads doesn't increase
>> - some Read messages are dropped
>> - read latency (p99.9) increases to 20-30 seconds
>> - there are up to 32 active Read Tasks and up to 3k - 6k pending Read Tasks
>>
>> The problem starts synchronously on all nodes of the cluster.
>> I cannot tie it to increased load from clients (the "read rate"
>> doesn't increase during the problem).
>> It also looks like there is no problem with the disks (I/O latencies are OK).
>>
>> Could anybody please give some advice on further troubleshooting?
>>
>> --
>> Best Regards,
>> Dmitry Simonov
>>
>
>
> --
> www.vorstella.com
> 408 691 8402
>


-- 
Best Regards,
Dmitry Simonov


Bursts of Thrift threads make cluster unresponsive

2019-06-27 Thread Dmitry Simonov
Hello!

We have encountered the following problem several times.

The Cassandra cluster (5 nodes) becomes unresponsive for ~30 minutes:
- all CPUs are at 100% load (normally we have an LA of 5 on 16-core machines)
- Cassandra's thread count rises from 300 to 1300 - 2000; most of them are
Thrift threads in the java.net.SocketInputStream.socketRead0(Native Method)
method, while the count of other threads doesn't increase
- some Read messages are dropped
- read latency (p99.9) increases to 20-30 seconds
- there are up to 32 active Read Tasks and up to 3k - 6k pending Read Tasks

The problem starts synchronously on all nodes of the cluster.
I cannot tie it to increased load from clients (the "read rate"
doesn't increase during the problem).
It also looks like there is no problem with the disks (I/O latencies are OK).

Could anybody please give some advice on further troubleshooting?
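If the sync Thrift server is in use (one thread per client connection), it may
be worth checking whether the thread burst coincides with a burst of client
connections; a rough sketch, assuming the default Thrift port 9160:

# count established client connections on the Thrift port
# (the first line of output is a header)
ss -tn state established '( sport = :9160 )' | wc -l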

-- 
Best Regards,
Dmitry Simonov


cqlsh COPY ... TO ... doesn't work if one node down

2018-06-29 Thread Dmitry Simonov
Hello!

I have cassandra cluster with 5 nodes.
There is a (relatively small) keyspace X with RF5.
One node goes down.

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.0.82   253.64 MB  256     100.0%            839bef9d-79af-422c-a21f-33bdcf4493c1  rack1
UN  10.0.0.154  255.92 MB  256     100.0%            ce23f3a7-67d2-47c0-9ece-7a5dd67c4105  rack1
UN  10.0.0.76   461.26 MB  256     100.0%            c8e18603-0ede-43f0-b713-3ff47ad92323  rack1
UN  10.0.0.94   575.78 MB  256     100.0%            9a324dbc-5ae1-4788-80e4-d86dcaae5a4c  rack1
DN  10.0.0.47   ?          256     100.0%            7b628ca2-4e47-457a-ba42-5191f7e5374b  rack1

I am trying to export some data using COPY TO, but it fails after long retries.
Why does it fail?
How can I make a copy?
There should be 4 copies of each row on the other (alive) replicas.

cqlsh 10.0.0.154 -e "COPY X.Y TO 'backup/X.Y' WITH NUMPROCESSES=1"

Using 1 child processes

Starting copy of X.Y with columns [key, column1, value].
2018-06-29 19:12:23,661 Failed to create connection pool for new host
10.0.0.47:
Traceback (most recent call last):
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/cluster.py",
line 2476, in run_add_or_renew_pool
new_pool = HostConnection(host, distance, self)
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/pool.py",
line 332, in __init__
self._connection = session.cluster.connection_factory(host.address)
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/cluster.py",
line 1205, in connection_factory
return self.connection_class.factory(address, self.connect_timeout,
*args, **kwargs)
  File
"/usr/lib/foobar/lib/python3.5/site-packages/cassandra/connection.py", line
332, in factory
conn = cls(host, *args, **kwargs)
  File
"/usr/lib/foobar/lib/python3.5/site-packages/cassandra/io/asyncorereactor.py",
line 344, in __init__
self._connect_socket()
  File
"/usr/lib/foobar/lib/python3.5/site-packages/cassandra/connection.py", line
371, in _connect_socket
raise socket.error(sockerr.errno, "Tried connecting to %s. Last error:
%s" % ([a[4] for a in addresses], sockerr.strerror or sockerr))
OSError: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last
error: timed out
2018-06-29 19:12:23,665 Host 10.0.0.47 has been marked down
2018-06-29 19:12:29,674 Error attempting to reconnect to 10.0.0.47,
scheduling retry in 2.0 seconds: [Errno None] Tried connecting to
[('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:12:36,684 Error attempting to reconnect to 10.0.0.47,
scheduling retry in 4.0 seconds: [Errno None] Tried connecting to
[('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:12:45,696 Error attempting to reconnect to 10.0.0.47,
scheduling retry in 8.0 seconds: [Errno None] Tried connecting to
[('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:12:58,716 Error attempting to reconnect to 10.0.0.47,
scheduling retry in 16.0 seconds: [Errno None] Tried connecting to
[('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:13:19,756 Error attempting to reconnect to 10.0.0.47,
scheduling retry in 32.0 seconds: [Errno None] Tried connecting to
[('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:13:56,834 Error attempting to reconnect to 10.0.0.47,
scheduling retry in 64.0 seconds: [Errno None] Tried connecting to
[('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:15:05,887 Error attempting to reconnect to 10.0.0.47,
scheduling retry in 128.0 seconds: [Errno None] Tried connecting to
[('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:17:18,982 Error attempting to reconnect to 10.0.0.47,
scheduling retry in 256.0 seconds: [Errno None] Tried connecting to
[('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:21:40,064 Error attempting to reconnect to 10.0.0.47,
scheduling retry in 512.0 seconds: [Errno None] Tried connecting to
[('10.0.0.47', 9042)]. Last error: timed out
:1:(4, 'Interrupted system call')
IOError:
IOError:
IOError:
IOError:
IOError:
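As a side note, a plain read at QUORUM through cqlsh should still succeed with
one replica down, which would confirm that the data itself is reachable and
narrow the failure down to COPY opening a connection pool to every host; a
sketch with a placeholder query (assuming cqlsh accepts the CONSISTENCY
shortcut inside -e):

cqlsh 10.0.0.154 -e "CONSISTENCY QUORUM; SELECT * FROM X.Y LIMIT 10;"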


-- 
Best Regards,
Dmitry Simonov


Re: Network problems during repair make it hang on "Wait for validation to complete"

2018-06-21 Thread Dmitry Simonov
In the previous message, I pasted source code from Cassandra 2.2.8 by
mistake.
I have re-checked against the 2.2.11 source: these lines are the same.

2018-06-21 2:49 GMT+05:00 Dmitry Simonov :

> Hello!
>
> Using Cassandra 2.2.11, I observe behaviour that is very similar to
> https://issues.apache.org/jira/browse/CASSANDRA-12860
>
> Steps to reproduce:
> 1. Set up a cluster: ccm create five -v 2.2.11 && ccm populate -n 5
> --vnodes && ccm start
> 2. Import some keyspace into it (approx 50 Mb of data)
> 3. Start repair on one node: ccm node2 nodetool repair KEYSPACE
> 4. While repair is still running, disconnect node3: sudo iptables -I
> INPUT -p tcp -d 127.0.0.3 -j DROP
> 5. This repair hangs.
> 6. Restore network connectivity
> 7. Repair is still hanging.
> 8. Following repairs will also hang.
>
> In tpstats I see tasks that make no progress:
>
> $ for i in {1..5}; do echo node$i; ccm node$i nodetool tpstats | grep "Repair#"; done
> node1
> Repair#1                 1      2255             1         0                 0
> node2
> Repair#1                 1      2335            26         0                 0
> node3
> node4
> Repair#3                 1       147          2175         0                 0
> node5
> Repair#1                 1      2335            17         0                 0
>
> In jconsole I see that Repair threads are blocked here:
>
> Name: Repair#1:1
> State: WAITING on 
> com.google.common.util.concurrent.AbstractFuture$Sync@73c5ab7e
> Total blocked: 0  Total waited: 242
>
> Stack trace:
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
> com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1371)
> org.apache.cassandra.repair.RepairJob.run(RepairJob.java:167)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>
>
> According to the source code, they are waiting for validations to complete:
>
> # 
> ./apache-cassandra-2.2.8-src/src/java/org/apache/cassandra/repair/RepairJob.java
>  74 public void run()
>  75 {
> ...
> 166 // Wait for validation to complete
> 167 Futures.getUnchecked(validations);
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-11824 says that the problem
> was fixed in 2.2.7, but I am using 2.2.11.
>
> Restarting all Cassandra nodes that have hanging tasks (one by one) makes
> these tasks disappear from tpstats. After that, repairs work well (until the
> next network problem).
>
> I also suspect that long GC pauses on one node (as well as network issues)
> during repair may lead to the same problem.
>
> Is it a known issue?
>
> --
> Best Regards,
> Dmitry Simonov
>



-- 
Best Regards,
Dmitry Simonov


Network problems during repair make it hang on "Wait for validation to complete"

2018-06-20 Thread Dmitry Simonov
Hello!

Using Cassandra 2.2.11, I observe behaviour that is very similar to
https://issues.apache.org/jira/browse/CASSANDRA-12860

Steps to reproduce:
1. Set up a cluster: ccm create five -v 2.2.11 && ccm populate -n 5
--vnodes && ccm start
2. Import some keyspace into it (approx 50 Mb of data)
3. Start repair on one node: ccm node2 nodetool repair KEYSPACE
4. While repair is still running, disconnect node3: sudo iptables -I INPUT
-p tcp -d 127.0.0.3 -j DROP
5. This repair hangs.
6. Restore network connectivity (see the command sketched after this list)
7. Repair is still hanging.
8. Following repairs will also hang.
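For step 6, deleting the rule added in step 4 should restore connectivity; a
sketch, assuming the rule was added exactly as above:

sudo iptables -D INPUT -p tcp -d 127.0.0.3 -j DROP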

In tpstats I see tasks that make no progress:

$ for i in {1..5}; do echo node$i; ccm node$i nodetool tpstats | grep "Repair#"; done
node1
Repair#1                 1      2255             1         0                 0
node2
Repair#1                 1      2335            26         0                 0
node3
node4
Repair#3                 1       147          2175         0                 0
node5
Repair#1                 1      2335            17         0                 0

In jconsole I see that Repair threads are blocked here:

Name: Repair#1:1
State: WAITING on com.google.common.util.concurrent.AbstractFuture$Sync@73c5ab7e
Total blocked: 0  Total waited: 242

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1371)
org.apache.cassandra.repair.RepairJob.run(RepairJob.java:167)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)


According to the source code, they are waiting for validations to complete:

# 
./apache-cassandra-2.2.8-src/src/java/org/apache/cassandra/repair/RepairJob.java
 74 public void run()
 75 {
...
166 // Wait for validation to complete
167 Futures.getUnchecked(validations);


https://issues.apache.org/jira/browse/CASSANDRA-11824 says that the problem
was fixed in 2.2.7, but I am using 2.2.11.

Restarting all Cassandra nodes that have hanging tasks (one by one) makes
these tasks disappear from tpstats. After that, repairs work well (until the
next network problem).

I also suspect that long GC pauses on one node (as well as network issues)
during repair may lead to the same problem.

Is it a known issue?
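One extra data point that may be worth collecting while a repair is stuck is
whether any validation compactions are actually still running on the replicas;
a sketch reusing the ccm loop from above (a stuck or missing "Validation" entry
would show whether the merkle-tree work is still in progress or was lost):

for i in {1..5}; do echo node$i; ccm node$i nodetool compactionstats; done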

-- 
Best Regards,
Dmitry Simonov


Re: Many SSTables only on one node

2018-04-05 Thread Dmitry Simonov
Hi, Evelyn!

I've found the following messages:

INFO RepairRunnable.java Starting repair command #41, repairing keyspace
XXX with repair options (parallelism: parallel, primary range: false,
incremental: false, job threads: 1, ColumnFamilies: [YYY], dataCenters: [],
hosts: [], # of ranges: 768)
INFO CompactionExecutor:6 CompactionManager.java Starting anticompaction
for XXX.YYY on 5132/5846 sstables

After that, many similar messages follow:
SSTable
BigTableReader(path='/mnt/cassandra/data/XXX/YYY-4c12fd9029e611e8810ac73ddacb37d1/lb-12688-big-Data.db')
fully contained in range (-9223372036854775808,-9223372036854775808],
mutating repairedAt instead of anticompacting

Does this mean that anti-compaction is not the cause?
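If it helps, the repaired status of the leftover SSTables can be inspected
directly with sstablemetadata; a sketch, assuming the tool is on the PATH and
using the data path from the log line above as a placeholder:

for f in /mnt/cassandra/data/XXX/YYY-*/lb-*-big-Data.db; do
  echo "$f"; sstablemetadata "$f" | grep -i 'Repaired at'
done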

2018-04-05 18:01 GMT+05:00 Evelyn Smith <u5015...@gmail.com>:

> It might not be what caused it here, but check your logs for
> anti-compactions.
>
>
> On 5 Apr 2018, at 8:35 pm, Dmitry Simonov <dimmobor...@gmail.com> wrote:
>
> Thank you!
> I'll check this out.
>
> 2018-04-05 15:00 GMT+05:00 Alexander Dejanovski <a...@thelastpickle.com>:
>
>> 40 pending compactions is pretty high and you should have way less than
>> that most of the time, otherwise it means that compaction is not keeping up
>> with your write rate.
>>
>> If you indeed have SSDs for data storage, increase your compaction
>> throughput to 100 or 200 (depending on how the CPUs handle the load). You
>> can experiment with compaction throughput using : nodetool
>> setcompactionthroughput 100
>>
>> You can raise the number of concurrent compactors as well and set it to a
>> value between 4 and 6 if you have at least 8 cores and CPUs aren't
>> overwhelmed.
>>
>> I'm not sure why you ended up with only one node having 6k SSTables and
>> not the others, but you should apply the above changes so that you can
>> lower the number of pending compactions and see if it prevents the issue
>> from happening again.
>>
>> Cheers,
>>
>>
>> On Thu, Apr 5, 2018 at 11:33 AM Dmitry Simonov <dimmobor...@gmail.com>
>> wrote:
>>
>>> Hi, Alexander!
>>>
>>> SizeTieredCompactionStrategy is used for all CFs in problematic keyspace.
>>> Current compaction throughput is 16 MB/s (default value).
>>>
>>> We always have about 40 pending and 2 active "CompactionExecutor" tasks
>>> in "tpstats".
>>> Mostly because of another (bigger) keyspace in this cluster.
>>> But the situation is the same on each node.
>>>
>>> According to "nodetool compactionhistory", compactions on this CF run
>>> (sometimes several times per day, sometimes one time per day, the last run
>>> was yesterday).
>>> We run "repair -full" regularly for this keyspace (every 24 hours on each
>>> node), because gc_grace_seconds is set to 24 hours.
>>>
>>> Should we consider increasing compaction throughput and
>>> "concurrent_compactors" (as recommended for SSDs) to keep
>>> "CompactionExecutor" pending tasks low?
>>>
>>> 2018-04-05 14:09 GMT+05:00 Alexander Dejanovski <a...@thelastpickle.com>
>>> :
>>>
>>>> Hi Dmitry,
>>>>
>>>> could you tell us which compaction strategy that table is currently
>>>> using ?
>>>> Also, what is the compaction max throughput and is auto-compaction
>>>> correctly enabled on that node ?
>>>>
>>>> Did you recently run repair ?
>>>>
>>>> Thanks,
>>>>
>>>> On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov <dimmobor...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> Could you please give some ideas on the following problem?
>>>>>
>>>>> We have a cluster with 3 nodes, running Cassandra 2.2.11.
>>>>>
>>>>> We've recently discovered high CPU usage on one cluster node; after
>>>>> some investigation we found that the number of SSTables for one CF on it
>>>>> is very large: 5800 SSTables, while the other nodes have only 3.
>>>>>
>>>>> The data size in this keyspace was not very large, ~100-200 MB per node.
>>>>>
>>>>> There is no such problem with other CFs of that keyspace.
>>>>>
>>>>> nodetool compact solved the issue as a quick-fix.
>>>>>
>>>>> But I'm wondering: what was the cause? How can I prevent it from repeating?
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Dmitry Simonov
>>>>>
>>>> --
>>>> -
>>>> Alexander Dejanovski
>>>> France
>>>> @alexanderdeja
>>>>
>>>> Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Dmitry Simonov
>>>
>> --
>> -
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
>
>
> --
> Best Regards,
> Dmitry Simonov
>
>
>


-- 
Best Regards,
Dmitry Simonov


Re: Many SSTables only on one node

2018-04-05 Thread Dmitry Simonov
Thank you!
I'll check this out.

2018-04-05 15:00 GMT+05:00 Alexander Dejanovski <a...@thelastpickle.com>:

> 40 pending compactions is pretty high and you should have way less than
> that most of the time, otherwise it means that compaction is not keeping up
> with your write rate.
>
> If you indeed have SSDs for data storage, increase your compaction
> throughput to 100 or 200 (depending on how the CPUs handle the load). You
> can experiment with compaction throughput using : nodetool
> setcompactionthroughput 100
>
> You can raise the number of concurrent compactors as well and set it to a
> value between 4 and 6 if you have at least 8 cores and CPUs aren't
> overwhelmed.
>
> I'm not sure why you ended up with only one node having 6k SSTables and
> not the others, but you should apply the above changes so that you can
> lower the number of pending compactions and see if it prevents the issue
> from happening again.
>
> Cheers,
>
>
> On Thu, Apr 5, 2018 at 11:33 AM Dmitry Simonov <dimmobor...@gmail.com>
> wrote:
>
>> Hi, Alexander!
>>
>> SizeTieredCompactionStrategy is used for all CFs in problematic keyspace.
>> Current compaction throughput is 16 MB/s (default value).
>>
>> We always have about 40 pending and 2 active "CompactionExecutor" tasks
>> in "tpstats".
>> Mostly because of another (bigger) keyspace in this cluster.
>> But the situation is the same on each node.
>>
>> According to "nodetool compactionhistory", compactions on this CF run
>> (sometimes several times per day, sometimes one time per day, the last run
>> was yesterday).
>> We run "repair -full" regularly for this keyspace (every 24 hours on each
>> node), because gc_grace_seconds is set to 24 hours.
>>
>> Should we consider increasing compaction throughput and
>> "concurrent_compactors" (as recommended for SSDs) to keep
>> "CompactionExecutor" pending tasks low?
>>
>> 2018-04-05 14:09 GMT+05:00 Alexander Dejanovski <a...@thelastpickle.com>:
>>
>>> Hi Dmitry,
>>>
>>> could you tell us which compaction strategy that table is currently
>>> using ?
>>> Also, what is the compaction max throughput and is auto-compaction
>>> correctly enabled on that node ?
>>>
>>> Did you recently run repair ?
>>>
>>> Thanks,
>>>
>>> On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov <dimmobor...@gmail.com>
>>> wrote:
>>>
>>>> Hello!
>>>>
>>>> Could you please give some ideas on the following problem?
>>>>
>>>> We have a cluster with 3 nodes, running Cassandra 2.2.11.
>>>>
>>>> We've recently discovered high CPU usage on one cluster node; after
>>>> some investigation we found that the number of SSTables for one CF on it
>>>> is very large: 5800 SSTables, while the other nodes have only 3.
>>>>
>>>> The data size in this keyspace was not very large, ~100-200 MB per node.
>>>>
>>>> There is no such problem with other CFs of that keyspace.
>>>>
>>>> nodetool compact solved the issue as a quick-fix.
>>>>
>>>> But I'm wondering: what was the cause? How can I prevent it from repeating?
>>>>
>>>> --
>>>> Best Regards,
>>>> Dmitry Simonov
>>>>
>>> --
>>> -
>>> Alexander Dejanovski
>>> France
>>> @alexanderdeja
>>>
>>> Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Dmitry Simonov
>>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>



-- 
Best Regards,
Dmitry Simonov


Re: Many SSTables only on one node

2018-04-05 Thread Dmitry Simonov
Hi, Alexander!

SizeTieredCompactionStrategy is used for all CFs in problematic keyspace.
Current compaction throughput is 16 MB/s (default value).

We always have about 40 pending and 2 active "CompactionExecutor" tasks in
"tpstats".
Mostly because of another (bigger) keyspace in this cluster.
But the situation is the same on each node.

According to "nodetool compactionhistory", compactions on this CF run
(sometimes several times per day, sometimes one time per day, the last run
was yesterday).
We run "repair -full" regularly for this keyspace (every 24 hours on each
node), because gc_grace_seconds is set to 24 hours.

Should we consider increasing compaction throughput and
"concurrent_compactors" (as recommended for SSDs) to keep
"CompactionExecutor" pending tasks low?

2018-04-05 14:09 GMT+05:00 Alexander Dejanovski <a...@thelastpickle.com>:

> Hi Dmitry,
>
> could you tell us which compaction strategy that table is currently using ?
> Also, what is the compaction max throughput and is auto-compaction
> correctly enabled on that node ?
>
> Did you recently run repair ?
>
> Thanks,
>
> On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov <dimmobor...@gmail.com>
> wrote:
>
>> Hello!
>>
>> Could you please give some ideas on the following problem?
>>
>> We have a cluster with 3 nodes, running Cassandra 2.2.11.
>>
>> We've recently discovered high CPU usage on one cluster node; after some
>> investigation we found that the number of SSTables for one CF on it is very
>> large: 5800 SSTables, while the other nodes have only 3.
>>
>> The data size in this keyspace was not very large, ~100-200 MB per node.
>>
>> There is no such problem with other CFs of that keyspace.
>>
>> nodetool compact solved the issue as a quick-fix.
>>
>> But I'm wondering: what was the cause? How can I prevent it from repeating?
>>
>> --
>> Best Regards,
>> Dmitry Simonov
>>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>



-- 
Best Regards,
Dmitry Simonov


Many SSTables only on one node

2018-04-05 Thread Dmitry Simonov
Hello!

Could you please give some ideas on the following problem?

We have a cluster with 3 nodes, running Cassandra 2.2.11.

We've recently discovered high CPU usage on one cluster node; after some
investigation we found that the number of SSTables for one CF on it is very
large: 5800 SSTables, while the other nodes have only 3.

The data size in this keyspace was not very large, ~100-200 MB per node.

There is no such problem with other CFs of that keyspace.

nodetool compact solved the issue as a quick-fix.

But I'm wondering: what was the cause? How can I prevent it from repeating?
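A simple way to notice this earlier next time is to track the per-table SSTable
count on each node; a sketch with placeholder keyspace/table names:

nodetool cfstats KEYSPACE.TABLE | grep 'SSTable count'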

-- 
Best Regards,
Dmitry Simonov


Re: "READ messages were dropped ... for internal timeout" after big amount of writes

2018-03-19 Thread Dmitry Simonov
Thank you for the recommendation!

Most of the pending compactions are for another (~100 times larger) keyspace.
They are always running in the background.

2018-03-16 13:28 GMT+05:00 Nicolas Guyomar <nicolas.guyo...@gmail.com>:

> Hi,
>
> You also have 62 pending compactions at the same time, which is odd for
> such a small dataset IMHO. Are you triggering 'nodetool compact' with some
> kind of cron you may have forgotten after a test, or something else?
> Do you have any monitoring in place? If not, you could leave 'dstat
> -tnrvl 10' running for a while and look for inconsistencies (huge I/O wait
> at some point, blocked procs, etc.)
>
>
>
>
> On 16 March 2018 at 07:33, Dmitry Simonov <dimmobor...@gmail.com> wrote:
>
>> Hello!
>>
>> We are experiencing problems with Cassandra 2.2.8.
>> There is a cluster with 3 nodes.
>> Problematic keyspace has RF=3 and contains 3 tables (current table sizes:
>> 1Gb, 700Mb, 12Kb).
>>
>> Several times per day there are bursts of "READ messages were dropped ...
>> for internal timeout" messages in logs (on every cassandra node). Duration:
>> 5 - 15 minutes.
>>
>> During periods of drops there is always a queue of pending ReadStage
>> tasks:
>>
>> Pool Name            Active   Pending    Completed   Blocked  All time blocked
>> ReadStage                32        67   2976548410         0                 0
>> CompactionExecutor        2        62       802136         0                 0
>>
>> Other Active and Pending counters in tpstats are 0.
>>
>> During the drops, iostat shows no read requests to the disks, probably
>> because all data fits in the disk cache:
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>           56,53    0,94   39,84    0,01    0,00    2,68
>>
>> Device:  rrqm/s  wrqm/s   r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> sda        0,00   11,00  0,00  26,00   0,00   9,09    715,92      0,78  30,31     0,00    30,31   2,46   6,40
>> sdb        0,00   11,00  0,00  33,00   0,00  10,57    655,70      0,83  26,00     0,00    26,00   2,00   6,60
>> sdc        0,00    1,00  0,00  30,50   0,00  10,98    737,07      0,91  30,49     0,00    30,49   2,10   6,40
>> sdd        0,00   31,50  0,00  35,00   0,00  11,17    653,50      0,98  28,17     0,00    28,17   1,83   6,40
>> sde        0,00   31,50  0,00  34,50   0,00  10,82    642,10      0,67  19,54     0,00    19,54   1,39   4,80
>> sdf        0,00    1,00  0,00  24,50   0,00   9,71    811,78      0,60  24,33     0,00    24,33   1,88   4,60
>> sdg        0,00    1,00  0,00  23,00   0,00   8,93    795,15      0,51  22,26     0,00    22,26   1,91   4,40
>> sdh        0,00    1,00  0,00  21,50   0,00   8,37    797,05      0,45  21,02     0,00    21,02   1,86   4,00
>>
>> Disks are SSDs.
>>
>> Before the drops, the "Local write count" for the problematic table increases
>> very fast (10k-30k/sec, while the ordinary write rate is 10-30/sec) for about
>> 1 minute. After that, the drops start.
>>
>> I tried using probabilistic tracing to determine which requests cause the
>> "write count" to increase, but I see no "batch_mutate" queries at all, only
>> reads!
>>
>> There are no GC warnings about long pauses.
>>
>> Could you please help troubleshoot the issue?
>>
>> --
>> Best Regards,
>> Dmitry Simonov
>>
>
>


-- 
Best Regards,
Dmitry Simonov


"READ messages were dropped ... for internal timeout" after big amount of writes

2018-03-16 Thread Dmitry Simonov
Hello!

We are experiencing problems with Cassandra 2.2.8.
There is a cluster with 3 nodes.
The problematic keyspace has RF=3 and contains 3 tables (current table sizes:
1 GB, 700 MB, 12 KB).

Several times per day there are bursts of "READ messages were dropped ...
for internal timeout" messages in logs (on every cassandra node). Duration:
5 - 15 minutes.

During periods of drops there is always a queue of pending ReadStage tasks:

Pool Name            Active   Pending    Completed   Blocked  All time blocked
ReadStage                32        67   2976548410         0                 0
CompactionExecutor        2        62       802136         0                 0

Other Active and Pending counters in tpstats are 0.

During the drops, iostat shows no read requests to the disks, probably
because all data fits in the disk cache:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          56,53    0,94   39,84    0,01    0,00    2,68

Device:  rrqm/s  wrqm/s   r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda        0,00   11,00  0,00  26,00   0,00   9,09    715,92      0,78  30,31     0,00    30,31   2,46   6,40
sdb        0,00   11,00  0,00  33,00   0,00  10,57    655,70      0,83  26,00     0,00    26,00   2,00   6,60
sdc        0,00    1,00  0,00  30,50   0,00  10,98    737,07      0,91  30,49     0,00    30,49   2,10   6,40
sdd        0,00   31,50  0,00  35,00   0,00  11,17    653,50      0,98  28,17     0,00    28,17   1,83   6,40
sde        0,00   31,50  0,00  34,50   0,00  10,82    642,10      0,67  19,54     0,00    19,54   1,39   4,80
sdf        0,00    1,00  0,00  24,50   0,00   9,71    811,78      0,60  24,33     0,00    24,33   1,88   4,60
sdg        0,00    1,00  0,00  23,00   0,00   8,93    795,15      0,51  22,26     0,00    22,26   1,91   4,40
sdh        0,00    1,00  0,00  21,50   0,00   8,37    797,05      0,45  21,02     0,00    21,02   1,86   4,00

Disks are SSDs.

Before the drops, the "Local write count" for the problematic table increases
very fast (10k-30k/sec, while the ordinary write rate is 10-30/sec) for about
1 minute. After that, the drops start.

I tried using probabilistic tracing to determine which requests cause the
"write count" to increase, but I see no "batch_mutate" queries at all, only
reads!
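For completeness, this is roughly how the probabilistic tracing mentioned above
is enabled and inspected (0.001 is an assumed sample rate):

nodetool settraceprobability 0.001
# sampled requests are recorded in the system_traces keyspace
cqlsh -e "SELECT started_at, request, duration FROM system_traces.sessions LIMIT 20;"
nodetool settraceprobability 0   # turn sampling off again afterwards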

There are no GC warnings about long pauses.

Could you please help troubleshoot the issue?

-- 
Best Regards,
Dmitry Simonov