Yes. I don't think this was in the beta2 release notes, but it will be in for 0.5 final: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5/NEWS.txt

On Thu, Dec 17, 2009 at 6:43 PM, Ramzi Rabah <[email protected]> wrote:
> Ok I believe the problem is when I was upgrading to a newer build of
> cassandra, I was upgrading the servers one by one by restarting them.
> So at one point in time I had some nodes that were 2 days older than
> the others, and it seems to have caused the inter-node messaging to go
> haywire.
>
> I stopped all the nodes at the same time, and restarted all of them,
> and it seems like the problem is fixed.
> Cheers
> Ramzi
>
>
> On Thu, Dec 17, 2009 at 8:55 AM, Ramzi Rabah <[email protected]> wrote:
>> I added some debugging code to capture the time a read takes
>> (getColumnFamily) and the time the round trip weakRemoteRead takes.
>> The time it takes to read columns is negligible, so it doesn't seem to
>> be a problem with getColumnFamily. The time it takes for weakRemoteRead
>> however is > 5 seconds in some cases. So looking at some more
>> debugging output,
>> the log indicates that the packets are in the process of being sent by
>> weakRemoteRead to the correct target node, but for some reason, the
>> target node does not have any reference
>> in the log that it handled the get at all.
>>
>> Couple other things to note:
>> 1- I restarted the nodes one after another, while there was traffic
>> going to them. Don't know if that will throw off cassandra or that the
>> whole thing is a network congestion problem?
>> 2- Read stats on the keyspace level indicate NaN value for Read
>> latency which seems like a bug?
>>
>> Thanks
>> Ramzi
>>
>> On Wed, Dec 16, 2009 at 12:07 PM, Jonathan Ellis <[email protected]> wrote:
>>> On Wed, Dec 16, 2009 at 12:46 PM, Ramzi Rabah <[email protected]> wrote:
>>>> We are observing an increasing number of TimedOutExceptions in cassandra
>>>> 0.5 trunk although the load seems fairly low (about 400 reads/writes
>>>> per second).
>>>> cfstats reports that operations are taking less than 2 ms on average.
>>>>
>>>> 2 Things I have noticed looking at the source code.
>>>>
>>>> 1- TimedOutExceptions are silently swallowed by Cassandra and not
>>>> reported in the logs even at debug level
>>>
>>> It's reported to the client. Hardly "swallowed" :)
>>>
>>>> 2- readstats does not account for these long time running queries that
>>>> time out.
>>>
>>> Right. But the CF-level stats do.
>>>
>>>> I'm wondering, what could be causing the system to go haywire like
>>>> this?
>>>
>>> Hard to say without more information. One shot in the dark is that
>>> get_key_range is a major offender sometimes, as well as workloads that
>>> do lots of deletes + re-inserts for the same keys.
>>>
>>> -Jonathan
>>>
>>
>
