On 17 Jan 2013, at 05:00, aaron morton <aa...@thelastpickle.com> wrote:
> Check the disk utilisation using iostat -x 5
> If you are on a VM / in the cloud check for CPU steal.
> Check the logs for messages from the GCInspector, the ParNew events are
> times the JVM is paused.

I have seen logs about that. I didn't worry much, since the GC of the JVM was not under pressure. As far as I understand, unless a CF is flushed "continuously", it should not be a major issue, should it? I don't know for sure whether there was a lot of flushing though, since my nodes were not properly monitored at the time.
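For the record, here is roughly what I plan to watch the next time I put a node under load, following your first points (the log path assumes a Debian-style install, so it may differ elsewhere):

    # disk utilisation / await, as you suggest
    iostat -x 5
    # JVM pauses reported by cassandra itself
    grep GCInspector /var/log/cassandra/system.log | tail -20
    # how often memtables actually get flushed
    grep -E 'Enqueuing flush|Completed flushing' /var/log/cassandra/system.log | tail -20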
> Look at the times dropped messages are logged and try to correlate them
> with other server events.

I tried that without much success. I do have graphs in cacti, but it is quite hard to see what happens simultaneously across several graphs.

> If you have a lot of secondary indexes, or a lot of memtables flushing at
> the same time, you may be blocking behind the global Switch Lock. If you
> use secondary indexes make sure the memtable_flush_queue_size is set
> correctly, see the comments in the yaml file.

I have no secondary indexes.

> If you have a lot of CF's flushing at the same time, and there are no
> messages from the "MeteredFlusher", it may be the log segment is too big
> for the number of CF's you have. When the segment needs to be recycled all
> dirty CF's are flushed; if you have a lot of CF's this can result in
> blocking around the switch lock. Try reducing the
> commitlog_segment_size_in_mb so that fewer CF's are flushed.

What is "a lot"? We have 26 CFs. 9 are barely used. 15 contain time series data (cassandra rocks with them), and only 3 of those see 1 to 10 reads or writes per second. 1 is quite hot (200 reads/s) and is mainly used for its bloom filter (whose "disk size" is about 1G). And 1 more, also hot, is used only for writes; it has the same big bloom filter, which I am about to remove since it is useless there.

BTW, thanks for the pointers. I have not yet tried to put our nodes under pressure, but when I do, I will look at those pointers closely.
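In particular, I will check where those two settings currently stand on our nodes, and whether anything gets dropped or queued up under load. Something along these lines (again assuming Debian-style paths):

    # the two knobs mentioned above, as currently deployed
    grep -E 'memtable_flush_queue_size|commitlog_segment_size_in_mb' /etc/cassandra/cassandra.yaml
    # pending/blocked tasks per stage, plus dropped message counts
    nodetool -h localhost tpstats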
Nicolas

> 
> Hope that helps
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 17/01/2013, at 10:30 AM, Nicolas Lalevée <nicolas.lale...@hibnet.org>
> wrote:
> 
>> Hi,
>> 
>> I have a strange behavior I am not able to understand.
>> 
>> I have 6 nodes running cassandra-1.0.12. Each node has 8G of RAM. I have
>> a replication factor of 3.
>> 
>> ---------------
>> My story is maybe too long, so here is a short version, while keeping
>> what I wrote below in case someone has the patience to read my bad
>> english ;)
>> 
>> I got into a situation where my cluster was generating a lot of timeouts
>> on our frontend, whereas I could not see any major trouble in the
>> internal stats. CPU, and read & write counts on the column families,
>> were actually quite low. It was a mess until I switched from java7 to
>> java6 and forced the use of jamm. After the switch, cpu and read & write
>> counts went up again and the timeouts were gone. I have also seen this
>> behavior while reducing the xmx.
>> 
>> What could be blocking cassandra from using the full resources of the
>> machine? Is there a metric I missed which could explain this?
>> 
>> ---------------
>> Here is the long story.
>> 
>> When I first set my cluster up, I blindly gave 6G of heap to the
>> cassandra nodes, thinking that the more a java process has, the smoother
>> it runs, while still keeping some RAM for the disk cache. Then a new
>> feature was deployed, and things went to hell, with some machines up to
>> 60% wa. I give credit to cassandra because there were not that many
>> timeouts received on the web frontend; it was kind of slow, but it was
>> kind of working. With some optimizations we reduced the pressure of the
>> new feature, but we were still at 40% wa.
>> 
>> At that time I didn't have much monitoring, just heap and cpu. I read
>> some articles about tuning and learned that the disk cache is quite
>> important, because cassandra relies on it as its read cache. So I tried
>> many xmx values, and 3G seemed to be about the lowest possible. On 2 of
>> the 6 nodes I set the xmx to 3.3G. Amazingly, the wa went down to 10%.
>> Quite happy with that, I set the xmx to 3.3G on every node. But then
>> things really went to hell, with a lot of timeouts on the frontend. It
>> was not working at all, so I rolled back.
>> 
>> After some time, probably because the data of the new feature had grown
>> to its nominal size, the %wa went very high again and cassandra could
>> not keep up. So we mostly reverted the feature; the column family is
>> still used, but only by one thread on the frontend. The wa dropped to
>> 20%, but things kept misbehaving: from time to time a bunch of timeouts
>> were raised on our frontend.
>> 
>> In the meantime, I set up some proper monitoring of cassandra: column
>> family read & write counts, latency, memtable size, but also dropped
>> messages, pending tasks, and the timeouts between nodes. It's just a
>> start, but it gives me a first nice view of what is actually going on.
>> 
>> I tried reducing the xmx on one node again. Cassandra does not complain
>> about not having enough heap, memtables are not flushed insanely every
>> second, the number of reads and writes is lower than on the other nodes,
>> the cpu is lower too, there are not many pending tasks, and no more than
>> 1 or 2 messages are dropped from time to time. Everything indicates that
>> there is probably room for more work, but the node doesn't take it. Even
>> its read and write latencies are lower than on the other nodes. But if I
>> keep that node on this xmx long enough, timeouts start to be raised on
>> the frontends.
>> After a few of these individual node experiments, the cluster was
>> starting to be quite "sick". Even with 6G, the %wa was going down, read
>> and write counts too, on pretty much every node. And more and more
>> timeouts were raised on the frontend.
>> The only worrying thing I could see is the heap climbing slowly above
>> the 75% threshold and, from time to time, suddenly dropping from 95% to
>> 70%. I looked at the full gc counter: not much pressure.
>> Another thing was some "Timed out replaying hints to /10.0.0.56;
>> aborting further deliveries" in the log. But it is logged as info, so I
>> guess it is not very important.
>> 
>> After some long, useless staring at the monitoring graphs, I gave
>> openjdk 6b24 a try instead of openjdk 7u9, and forced cassandra to load
>> jamm, since in 1.0 the init script blacklists openjdk. Node after node,
>> I saw the heap behaving more like what I am used to seeing on jamm-based
>> apps: some nice ups and downs rather than a long, slow climb. But read
>> and write counts were still low on every node, and timeouts were still
>> bursting on our frontend.
>> It was a continuing mess until I restarted the "first" node of the
>> cluster. There was still one node left to switch to java6 + jamm, but as
>> soon as I restarted my "first" node, every node started working harder:
>> %wa climbing, read & write counts climbing, no more timeouts on the
>> frontend, and the frontend then being fast as hell.
>> 
>> I understand that my cluster is probably under capacity. But I don't
>> understand how, since there seems to be something within cassandra which
>> blocks the full use of the machine resources. It seems somehow related
>> to the heap, but I don't know how. Any idea?
>> I intend to start monitoring more metrics, but do you have any hint on
>> which ones could explain that behavior?
>> 
>> Nicolas
>> 
> 
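P.S. In case it helps someone else: to force jamm with openjdk, I essentially made sure that a line roughly like the one below is always applied to the JVM options (the jamm jar version shipped in lib/ may differ on your install):

    # load jamm unconditionally, even when the JVM is openjdk
    JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.2.5.jar"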