> I have seen logs about that. I didn't worry much, since the GC of the JVM
> was not under pressure.

When cassandra logs a ParNew event from the GCInspector, that is time during
which the server is paused / frozen. CMS events have a very small pause, but
they take a non-trivial amount of CPU time.
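(A quick way to gauge how often this is happening is to grep the log for
GCInspector lines; a minimal sketch, assuming the default package log
location:)

    # Count ParNew pauses reported by the GCInspector
    # (log path is an assumption; adjust for your install)
    grep GCInspector /var/log/cassandra/system.log | grep -c ParNew

    # Watch GC events as they are logged
    tail -F /var/log/cassandra/system.log | grep GCInspector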
If you are logging a lot of GC events you should look into it.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/01/2013, at 3:28 AM, Nicolas Lalevée <nicolas.lale...@hibnet.org> wrote:

> On 17/01/2013, at 05:00, aaron morton <aa...@thelastpickle.com> wrote:
>
>> Check the disk utilisation using iostat -x 5
>> If you are on a VM / in the cloud, check for CPU steal.
>> Check the logs for messages from the GCInspector; the ParNew events are
>> times the JVM is paused.
>
> I have seen logs about that. I didn't worry much, since the GC of the JVM
> was not under pressure. As far as I understand, unless a CF is "continuously"
> flushed, it should not be a major issue, should it?
> I don't know for sure whether there was a lot of flushing though, since my
> nodes were not properly monitored.
>
>> Look at the times dropped messages are logged and try to correlate them
>> with other server events.
>
> I tried that without much success. I have graphs in cacti, but it is quite
> hard to visualize when things happen simultaneously on several graphs.
>
>> If you have a lot of secondary indexes, or a lot of memtables flushing at
>> the same time, you may be blocking behind the global Switch Lock. If you
>> use secondary indexes, make sure memtable_flush_queue_size is set
>> correctly; see the comments in the yaml file.
>
> I have no secondary indexes.
>
>> If you have a lot of CFs flushing at the same time, and there are no
>> messages from the "MeteredFlusher", it may be that the log segment is too
>> big for the number of CFs you have. When the segment needs to be recycled,
>> all dirty CFs are flushed; if you have a lot of CFs this can result in
>> blocking around the switch lock. Try reducing commitlog_segment_size_in_mb
>> so that fewer CFs are flushed.
>
> What is "a lot"? We have 26 CFs. 9 are barely used. 15 contain time-series
> data (cassandra rocks with them), and only 3 of those see 1 to 10 reads or
> writes per second. One is quite hot (200 reads/s) and is mainly used for its
> bloom filter (whose on-disk size is about 1G). And one more is hot but used
> only for writes (it has the same big bloom filter, which I am about to
> remove since it is useless).
>
> BTW, thanks for the pointers. I have not yet tried to put our nodes under
> pressure. But when I do, I'll look at those pointers closely.
>
> Nicolas
>
>>
>> Hope that helps
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 17/01/2013, at 10:30 AM, Nicolas Lalevée <nicolas.lale...@hibnet.org>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a strange behavior I am not able to understand.
>>>
>>> I have 6 nodes with cassandra-1.0.12. Each node has 8G of RAM. I have a
>>> replication factor of 3.
>>>
>>> ---------------
>>> My story is maybe too long, so I am trying a shorter version here, while
>>> keeping what I wrote below in case someone has the patience to read my
>>> bad English ;)
>>>
>>> I got into a situation where my cluster was generating a lot of timeouts
>>> on our frontend, whereas I could not see any major trouble in the
>>> internal stats. Actually, cpu and read & write counts on the column
>>> families were quite low. It was a mess until I switched from java7 to
>>> java6 and forced the use of jamm. After the switch, cpu and read & write
>>> counts were going up again, and the timeouts were gone. I have also seen
>>> this behavior while reducing the xmx.
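(An aside on the memtable_flush_queue_size / commitlog_segment_size_in_mb
advice quoted earlier in this message: both settings live in cassandra.yaml.
A minimal sketch; the values shown are only illustrative defaults, not
recommendations for this cluster:)

    # cassandra.yaml (sketch)

    # Must be at least the largest number of secondary indexes on a single CF,
    # otherwise flushes can block behind the switch lock.
    memtable_flush_queue_size: 4

    # Smaller segments mean fewer dirty CFs have to flush at once when a
    # segment is recycled.
    commitlog_segment_size_in_mb: 32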
>>>
>>> What could be blocking cassandra from fully using the resources of the
>>> machine? Are there metrics I didn't look at which could explain this?
>>>
>>> ---------------
>>> Here is the long story.
>>>
>>> When I first set my cluster up, I blindly gave 6G of heap to the
>>> cassandra nodes, thinking that the more memory a java process has, the
>>> smoother it runs, while still keeping some RAM for the disk cache. We got
>>> a new feature deployed, and things went to hell, with some machines up to
>>> 60% wa. I give credit to cassandra because there were not that many
>>> timeouts received on the web frontend; it was kind of slow but it was
>>> kind of working. With some optimizations we reduced the pressure of the
>>> new feature, but it was still at 40% wa.
>>>
>>> At that time I didn't have much monitoring, just heap and cpu. I read
>>> some articles on tuning, and I learned that the disk cache is quite
>>> important because cassandra relies on it as its read cache. So I tried
>>> many xmx values, and 3G seemed to be about the lowest possible. On 2 of
>>> the 6 nodes I set the xmx to 3.3G. Amazingly, I saw the wa drop to 10%.
>>> Quite happy with that, I changed the xmx to 3.3G on every node. But then
>>> things really went to hell, with a lot of timeouts on the frontend. It
>>> was not working at all, so I rolled back.
>>>
>>> After some time, probably because the data of the new feature had grown
>>> to its nominal size, things went again to very high %wa, and cassandra
>>> was not able to keep up. So we kind of reverted the feature; the column
>>> family is still used, but only by one thread on the frontend. The wa
>>> dropped to 20%, but things continued to not work properly: from time to
>>> time a bunch of timeouts were raised on our frontend.
>>>
>>> In the meantime, I took the time to set up some proper monitoring of
>>> cassandra: column family read & write counts, latency, memtable size, but
>>> also the dropped messages, the pending tasks, and the timeouts between
>>> nodes. It's just a start, but it gives me a first nice view of what is
>>> actually going on.
>>>
>>> I tried again reducing the xmx on one node. Cassandra does not complain
>>> about not having enough heap, memtables are not flushed insanely every
>>> second, the number of reads and writes is reduced compared to the other
>>> nodes, the cpu is lower too, there are not many pending tasks, and no
>>> more than 1 or 2 messages are dropped from time to time. Everything
>>> indicates that there is probably room for more work, but the node doesn't
>>> take it. Even its read and write latencies are lower than on the other
>>> nodes. But if I keep this xmx long enough, timeouts start to appear on
>>> the frontends.
>>> After some individual node experiments, the cluster was starting to be
>>> quite "sick". Even with 6G, the %wa was dropping, read and write counts
>>> too, on pretty much every node. And more and more timeouts were raised on
>>> the frontend.
>>> The only worrying thing I could see was the heap climbing slowly above
>>> the 75% threshold and from time to time suddenly dropping from 95% to
>>> 70%. I looked at the full gc counter: not much pressure.
>>> Another thing was some "Timed out replaying hints to /10.0.0.56; aborting
>>> further deliveries" in the log. But it is logged as info, so I guess it
>>> is not that important.
>>>
>>> After some long, useless staring at the monitoring graphs, I gave a try
>>> to using openjdk 6b24 rather than openjdk 7u9, and forced cassandra to
>>> load jamm, since in 1.0 the init script blacklists openjdk.
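(For anyone retracing these steps: in the 1.0 packages both the heap size and
the jamm agent are set in conf/cassandra-env.sh. A rough sketch; the jar
version and paths are assumptions to check against the actual install:)

    # conf/cassandra-env.sh (sketch)

    # Pin the heap explicitly instead of letting the script auto-size it;
    # 3300M mirrors the 3.3G experiment described above, not a recommendation.
    MAX_HEAP_SIZE="3300M"
    HEAP_NEWSIZE="400M"

    # Force the jamm memory-meter agent even under OpenJDK (the stock script
    # skips it for OpenJDK; the jar name/version may differ per install).
    JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.2.5.jar"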
>>> Node after node, I saw that the heap was behaving more like what I am
>>> used to seeing on java-based apps, a nice up and down rather than a long,
>>> slow climb. But read and write counts were still low on every node, and
>>> timeouts were still bursting on our frontend.
>>> It was a continuing mess until I restarted the "first" node of the
>>> cluster. There was still one node left to switch to java6 + jamm, but as
>>> soon as I restarted my "first" node, every node started working more: %wa
>>> climbing, read & write counts climbing, no more timeouts on the frontend,
>>> the frontend then being fast as hell.
>>>
>>> I understand that my cluster is probably under capacity. But I don't
>>> understand it, since something within cassandra seems to be blocking the
>>> full use of the machine resources. It seems kind of related to the heap,
>>> but I don't know how. Any idea?
>>> I intend to start monitoring more metrics, but do you have any hint on
>>> which ones could explain this behavior?
>>>
>>> Nicolas
>>>
>>
>
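(On the closing question about which metrics to monitor: most of the numbers
discussed in this thread, such as pending tasks, dropped messages, per-CF
latencies, memtable sizes and heap usage, are exposed through nodetool / JMX.
A minimal sketch for a 1.0-era node:)

    # Thread pool backlog and dropped message counts
    nodetool -h localhost tpstats

    # Per column family read/write counts, latencies and memtable sizes
    nodetool -h localhost cfstats

    # Heap usage and load for the node
    nodetool -h localhost info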