> I have seen logs about that. I didn't worry much, since the GC of the JVM
> was not under pressure.

When cassandra logs a ParNew event from the GCInspector, that is time during
which the server is paused / frozen. CMS events have a very small pause, but
they take a non-trivial amount of CPU time.
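(A quick way to gauge how often this is happening is to grep the log for
GCInspector lines; a minimal sketch, assuming the default package log
location:)

    # Count ParNew pauses reported by the GCInspector
    # (log path is an assumption; adjust for your install)
    grep GCInspector /var/log/cassandra/system.log | grep -c ParNew

    # Watch GC events as they are logged
    tail -F /var/log/cassandra/system.log | grep GCInspector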
If you are logging a lot of GC events you should look into it.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/01/2013, at 3:28 AM, Nicolas Lalevée <nicolas.lale...@hibnet.org> wrote:

> On 17/01/2013, at 05:00, aaron morton <aa...@thelastpickle.com> wrote:
>
>> Check the disk utilisation using iostat -x 5
>> If you are on a VM / in the cloud, check for CPU steal.
>> Check the logs for messages from the GCInspector; the ParNew events are
>> times the JVM is paused.
>
> I have seen logs about that. I didn't worry much, since the GC of the JVM
> was not under pressure. As far as I understand, unless a CF is "continuously"
> flushed, it should not be a major issue, should it?
> I don't know for sure whether there was a lot of flushing though, since my
> nodes were not properly monitored.
>
>> Look at the times dropped messages are logged and try to correlate them
>> with other server events.
>
> I tried that without much success. I have graphs in cacti, but it is quite
> hard to visualize when things happen simultaneously on several graphs.
>
>> If you have a lot of secondary indexes, or a lot of memtables flushing at
>> the same time, you may be blocking behind the global Switch Lock. If you
>> use secondary indexes, make sure memtable_flush_queue_size is set
>> correctly; see the comments in the yaml file.
>
> I have no secondary indexes.
>
>> If you have a lot of CFs flushing at the same time, and there are no
>> messages from the "MeteredFlusher", it may be that the log segment is too
>> big for the number of CFs you have. When the segment needs to be recycled,
>> all dirty CFs are flushed; if you have a lot of CFs this can result in
>> blocking around the switch lock. Try reducing commitlog_segment_size_in_mb
>> so that fewer CFs are flushed.
>
> What is "a lot"? We have 26 CFs. 9 are barely used. 15 contain time-series
> data (cassandra rocks with them), and only 3 of those see 1 to 10 reads or
> writes per second. One is quite hot (200 reads/s) and is mainly used for its
> bloom filter (whose on-disk size is about 1G). And one more is hot but used
> only for writes (it has the same big bloom filter, which I am about to
> remove since it is useless).
>
> BTW, thanks for the pointers. I have not yet tried to put our nodes under
> pressure. But when I do, I'll look at those pointers closely.
>
> Nicolas
>
>>
>> Hope that helps
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 17/01/2013, at 10:30 AM, Nicolas Lalevée <nicolas.lale...@hibnet.org>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a strange behavior I am not able to understand.
>>>
>>> I have 6 nodes with cassandra-1.0.12. Each node has 8G of RAM. I have a
>>> replication factor of 3.
>>>
>>> ---------------
>>> My story is maybe too long, so I am trying a shorter version here, while
>>> keeping what I wrote below in case someone has the patience to read my
>>> bad English ;)
>>>
>>> I got into a situation where my cluster was generating a lot of timeouts
>>> on our frontend, whereas I could not see any major trouble in the
>>> internal stats. Actually, cpu and read & write counts on the column
>>> families were quite low. It was a mess until I switched from java7 to
>>> java6 and forced the use of jamm. After the switch, cpu and read & write
>>> counts were going up again, and the timeouts were gone. I have also seen
>>> this behavior while reducing the xmx.
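(An aside on the memtable_flush_queue_size / commitlog_segment_size_in_mb
advice quoted earlier in this message: both settings live in cassandra.yaml.
A minimal sketch; the values shown are only illustrative defaults, not
recommendations for this cluster:)

    # cassandra.yaml (sketch)

    # Must be at least the largest number of secondary indexes on a single CF,
    # otherwise flushes can block behind the switch lock.
    memtable_flush_queue_size: 4

    # Smaller segments mean fewer dirty CFs have to flush at once when a
    # segment is recycled.
    commitlog_segment_size_in_mb: 32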
>>>
>>> What could be blocking cassandra from fully using the resources of the
>>> machine? Are there metrics I didn't look at which could explain this?
>>>
>>> ---------------
>>> Here is the long story.
>>>
>>> When I first set my cluster up, I blindly gave 6G of heap to the
>>> cassandra nodes, thinking that the more memory a java process has, the
>>> smoother it runs, while still keeping some RAM for the disk cache. We got
>>> a new feature deployed, and things went to hell, with some machines up to
>>> 60% wa. I give credit to cassandra because there were not that many
>>> timeouts received on the web frontend; it was kind of slow but it was
>>> kind of working. With some optimizations we reduced the pressure of the
>>> new feature, but it was still at 40% wa.
>>>
>>> At that time I didn't have much monitoring, just heap and cpu. I read
>>> some articles on tuning, and I learned that the disk cache is quite
>>> important because cassandra relies on it as its read cache. So I tried
>>> many xmx values, and 3G seemed to be about the lowest possible. On 2 of
>>> the 6 nodes I set the xmx to 3.3G. Amazingly, I saw the wa drop to 10%.
>>> Quite happy with that, I changed the xmx to 3.3G on every node. But then
>>> things really went to hell, with a lot of timeouts on the frontend. It
>>> was not working at all, so I rolled back.
>>>
>>> After some time, probably because the data of the new feature had grown
>>> to its nominal size, things went again to very high %wa, and cassandra
>>> was not able to keep up. So we kind of reverted the feature; the column
>>> family is still used, but only by one thread on the frontend. The wa
>>> dropped to 20%, but things continued to not work properly: from time to
>>> time a bunch of timeouts were raised on our frontend.
>>>
>>> In the meantime, I took the time to set up some proper monitoring of
>>> cassandra: column family read & write counts, latency, memtable size, but
>>> also the dropped messages, the pending tasks, and the timeouts between
>>> nodes. It's just a start, but it gives me a first nice view of what is
>>> actually going on.
>>>
>>> I tried again reducing the xmx on one node. Cassandra does not complain
>>> about not having enough heap, memtables are not flushed insanely every
>>> second, the number of reads and writes is reduced compared to the other
>>> nodes, the cpu is lower too, there are not many pending tasks, and no
>>> more than 1 or 2 messages are dropped from time to time. Everything
>>> indicates that there is probably room for more work, but the node doesn't
>>> take it. Even its read and write latencies are lower than on the other
>>> nodes. But if I keep this xmx long enough, timeouts start to appear on
>>> the frontends.
>>> After some individual node experiments, the cluster was starting to be
>>> quite "sick". Even with 6G, the %wa was dropping, read and write counts
>>> too, on pretty much every node. And more and more timeouts were raised on
>>> the frontend.
>>> The only worrying thing I could see was the heap climbing slowly above
>>> the 75% threshold and from time to time suddenly dropping from 95% to
>>> 70%. I looked at the full gc counter: not much pressure.
>>> Another thing was some "Timed out replaying hints to /10.0.0.56; aborting
>>> further deliveries" in the log. But it is logged as info, so I guess it
>>> is not that important.
>>>
>>> After some long, useless staring at the monitoring graphs, I gave a try
>>> to using openjdk 6b24 rather than openjdk 7u9, and forced cassandra to
>>> load jamm, since in 1.0 the init script blacklists openjdk.
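(For anyone retracing these steps: in the 1.0 packages both the heap size and
the jamm agent are set in conf/cassandra-env.sh. A rough sketch; the jar
version and paths are assumptions to check against the actual install:)

    # conf/cassandra-env.sh (sketch)

    # Pin the heap explicitly instead of letting the script auto-size it;
    # 3300M mirrors the 3.3G experiment described above, not a recommendation.
    MAX_HEAP_SIZE="3300M"
    HEAP_NEWSIZE="400M"

    # Force the jamm memory-meter agent even under OpenJDK (the stock script
    # skips it for OpenJDK; the jar name/version may differ per install).
    JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.2.5.jar"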
>>> Node after node, I saw that the heap was behaving more like what I am
>>> used to seeing on java-based apps, a nice up and down rather than a long,
>>> slow climb. But read and write counts were still low on every node, and
>>> timeouts were still bursting on our frontend.
>>> It was a continuing mess until I restarted the "first" node of the
>>> cluster. There was still one node left to switch to java6 + jamm, but as
>>> soon as I restarted my "first" node, every node started working more: %wa
>>> climbing, read & write counts climbing, no more timeouts on the frontend,
>>> the frontend then being fast as hell.
>>>
>>> I understand that my cluster is probably under capacity. But I don't
>>> understand it, since something within cassandra seems to be blocking the
>>> full use of the machine resources. It seems kind of related to the heap,
>>> but I don't know how. Any idea?
>>> I intend to start monitoring more metrics, but do you have any hint on
>>> which ones could explain this behavior?
>>>
>>> Nicolas
>>>
>>
>
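(On the closing question about which metrics to monitor: most of the numbers
discussed in this thread, such as pending tasks, dropped messages, per-CF
latencies, memtable sizes and heap usage, are exposed through nodetool / JMX.
A minimal sketch for a 1.0-era node:)

    # Thread pool backlog and dropped message counts
    nodetool -h localhost tpstats

    # Per column family read/write counts, latencies and memtable sizes
    nodetool -h localhost cfstats

    # Heap usage and load for the node
    nodetool -h localhost info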