On 17 Jan 2013, at 05:00, aaron morton <aa...@thelastpickle.com> wrote:

> Check the disk utilisation using iostat -x 5
> If you are on a VM / in the cloud check for CPU steal. 
> Check the logs for messages from the GCInspector, the ParNew events are times 
> the JVM is paused. 

I have seen log messages about that. I didn't worry much, since the JVM's GC was 
not under pressure. As far as I understand, unless a CF is being flushed 
"continuously", it should not be a major issue, should it?
I don't know for sure whether there was a lot of flushing though, since my nodes 
were not properly monitored.
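
Next time I should be able to answer that from the logs directly. A rough sketch 
of what I have in mind, assuming the default Debian log location and that the 
1.0 log wording is the one I remember (untested on my side):

  # GC pauses reported by the GCInspector (ParNew / CMS)
  grep GCInspector /var/log/cassandra/system.log | tail -n 20

  # how much memtable flushing is really going on, and for which CFs
  grep "Enqueuing flush" /var/log/cassandra/system.log | tail -n 20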

> Look at the times dropped messages are logged and try to correlate them with 
> other server events.

I tried that without much success. I do have graphs in Cacti, but it is quite 
hard to see when things happen simultaneously across several graphs.
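
Maybe rather than staring at the graphs, I should just pull the relevant lines 
out of the logs and read them in time order. Something like this is what I have 
in mind (the "dropped" wording is from memory, so to be checked):

  # dropped-message and GC lines together, already in chronological order,
  # to see whether the drops line up with long GC pauses
  grep -E "dropped|GCInspector" /var/log/cassandra/system.log | less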

> If you have a lot of secondary indexes, or a lot of memtables flushing at the 
> same time, you may be blocking behind the global Switch Lock. If you use 
> secondary indexes, make sure memtable_flush_queue_size is set correctly; see 
> the comments in the yaml file.

I have no secondary indexes.

> If you have a lot of CF's flushing at the same time, and there are no 
> messages from the "MeteredFlusher", it may be that the commit log segment is 
> too big for the number of CF's you have. When a segment needs to be recycled, 
> all dirty CF's are flushed; if you have a lot of CF's this can result in 
> blocking around the switch lock. Try reducing commitlog_segment_size_in_mb so 
> that fewer CF's are flushed.

What is "a lot" ? We have 26 CF. 9 are barely used. 15 contains time series 
data (cassandra rocks with them) in which only 3 of them have from 1 to 10 read 
or writes per sec. 1 quite hot (200read/s) which is mainly used for its bloom 
filter (which "disksize" is about 1G). And 1 also hot used only for writes 
(which has the same big bloom filter, which I am about to remove since it is 
useless).
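
When I get back to this, the first things I'll check are what those two settings 
are currently set to on our nodes and whether the MeteredFlusher shows up around 
the timeout bursts. Roughly (paths are from our Debian install, so they may 
differ elsewhere):

  # current values on a node
  grep -E "commitlog_segment_size_in_mb|memtable_flush_queue_size" \
      /etc/cassandra/cassandra.yaml

  # any MeteredFlusher activity around the timeout bursts?
  grep MeteredFlusher /var/log/cassandra/system.log | tail -n 20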

BTW, thanks for the pointers. I have not yet tried to put our nodes under 
pressure, but when I do, I'll look closely at those.

Nicolas

> 
> Hope that helps
>  
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 17/01/2013, at 10:30 AM, Nicolas Lalevée <nicolas.lale...@hibnet.org> 
> wrote:
> 
>> Hi,
>> 
>> I am seeing a strange behavior that I am not able to understand.
>> 
>> I have 6 nodes running cassandra-1.0.12. Each node has 8 GB of RAM, and I 
>> have a replication factor of 3.
>> 
>> ---------------
>> My story is maybe too long, so here is a shorter version first, while keeping 
>> what I wrote below in case someone has the patience to read my bad English ;)
>> 
>> I got into a situation where my cluster was generating a lot of timeouts on 
>> our frontend, yet I could not see any major trouble in the internal stats. 
>> Actually, CPU usage and the read & write counts on the column families were 
>> quite low. It was a mess until I switched from Java 7 to Java 6 and forced 
>> the use of jamm. After the switch, CPU usage and read & write counts went up 
>> again and the timeouts were gone. I have also seen this behavior while 
>> reducing the Xmx.
>> 
>> What could be blocking Cassandra from using the whole resources of the 
>> machine? Are there metrics I didn't look at which could explain this?
>> 
>> ---------------
>> Here is the long story.
>> 
>> When I first set my cluster up, I blindly gave 6 GB of heap to the Cassandra 
>> nodes, thinking that the more memory a Java process has, the smoother it 
>> runs, while still keeping some RAM for the disk cache. We got a new feature 
>> deployed, and things went to hell, with some machines up to 60% I/O wait. I 
>> give credit to Cassandra because there were not that many timeouts on the web 
>> frontend; it was kind of slow, but it was kind of working. With some 
>> optimizations we reduced the pressure of the new feature, but we were still 
>> at 40% I/O wait.
>> 
>> At that time I didn't have much monitoring, just heap and CPU. I read some 
>> articles on how to tune, and I learned that the disk cache is quite important 
>> because Cassandra relies on it as its read cache. So I tried many Xmx values, 
>> and 3 GB seemed to be about the lowest possible. On 2 of the 6 nodes I set 
>> the Xmx to 3.3 GB. Amazingly, I saw the I/O wait drop to 10%. Quite happy 
>> with that, I set the Xmx to 3.3 GB on every node. But then things really went 
>> to hell, with a lot of timeouts on the frontend; it was not working at all. 
>> So I rolled back.
>> 
>> After some time, probably because the data of the new feature grew to its 
>> nominal size, the I/O wait went very high again, and Cassandra was not able 
>> to keep up. So we more or less reverted the feature; the column family is 
>> still used, but only by one thread on the frontend. The I/O wait dropped to 
>> 20%, but things continued to not work properly: from time to time, a bunch of 
>> timeouts were raised on our frontend.
>> 
>> In the meantime, I took the time to set up some proper monitoring of 
>> Cassandra: column family read & write counts, latency, memtable size, but 
>> also dropped messages, pending tasks, and timeouts between nodes. It's just a 
>> start, but it gives me a first nice view of what is actually going on.
>> 
>> I tried reducing the Xmx on one node again. Cassandra does not complain 
>> about not having enough heap, memtables are not being flushed madly every 
>> second, the number of reads and writes is lower than on the other nodes, the 
>> CPU is lower too, there are not many pending tasks, and no more than 1 or 2 
>> messages are dropped from time to time. Everything indicates that there is 
>> probably room for more work, but the node doesn't take it. Even its read and 
>> write latencies are lower than on the other nodes. But if I keep this Xmx 
>> long enough, timeouts start to show up on the frontends.
>> After some individual node experiments, the cluster was starting to get quite 
>> "sick". Even with 6 GB, the I/O wait was dropping, and the read and write 
>> counts too, on pretty much every node. And more and more timeouts were raised 
>> on the frontend.
>> The only thing I could see that was worrying was the heap climbing slowly 
>> above the 75% threshold and from time to time suddenly dropping from 95% to 
>> 70%. I looked at the full GC counter: not much pressure.
>> Another thing was some "Timed out replaying hints to /10.0.0.56; aborting 
>> further deliveries" messages in the log. But they are logged at INFO, so I 
>> guess they are not that important.
>> 
>> After some long and useless staring at the monitoring graphs, I gave 
>> OpenJDK 6b24 a try instead of OpenJDK 7u9, and forced Cassandra to load jamm, 
>> since in 1.0 the init script blacklists OpenJDK. Node after node, I saw the 
>> heap behaving more like what I am used to seeing on JVM-based apps: some nice 
>> ups and downs rather than a long, slow climb. But read and write counts were 
>> still low on every node, and timeouts were still bursting on our frontend.
>> It remained a mess until I restarted the "first" node of the cluster. There 
>> was still one node left to switch to Java 6 + jamm, but as soon as I 
>> restarted my "first" node, every node started doing more work: I/O wait 
>> climbing, read & write counts climbing, no more timeouts on the frontend, and 
>> the frontend was then fast as hell.
>> 
>> I understand that my cluster is probably under capacity. But I don't 
>> understand why, since something within Cassandra seems to block the full use 
>> of the machines' resources. It seems somehow related to the heap, but I don't 
>> know how. Any idea?
>> I intend to start monitoring more metrics, but do you have any hint about 
>> which ones could explain that behavior?
>> 
>> Nicolas
>> 
> 
