Re: TimedOutException

Ramzi Rabah Thu, 17 Dec 2009 08:55:40 -0800

I added some debugging code to capture how long a read takes
(getColumnFamily) and how long the weakRemoteRead round trip takes.
The time it takes to read columns is negligible, so the problem doesn't
seem to be with getColumnFamily. The time weakRemoteRead takes,
however, is > 5 seconds in some cases. Looking at some more debugging
output, the log indicates that weakRemoteRead is in the process of
sending the packets to the correct target node, but for some reason the
target node's log has no record of it handling the get at all.
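For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of the timing wrapper idea, assuming nothing about Cassandra's internals: `TimedCall`, `timed` and the 5-second `SLOW_THRESHOLD_MS` are hypothetical names chosen for illustration, and the operation being timed is just a `Supplier` standing in for a call like getColumnFamily or weakRemoteRead.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class TimedCall {
    // Hypothetical threshold: flag anything slower than 5 seconds,
    // matching the slow round trips described above.
    static final long SLOW_THRESHOLD_MS = 5_000;

    // Runs the operation, measures its wall-clock duration with
    // System.nanoTime(), and reports it on stderr if it exceeds the threshold.
    // The finally block ensures slow calls are reported even if they throw.
    static <T> T timed(String label, Supplier<T> op) {
        long start = System.nanoTime();
        try {
            return op.get();
        } finally {
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (elapsedMs > SLOW_THRESHOLD_MS) {
                System.err.printf("SLOW %s took %d ms%n", label, elapsedMs);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a local read; a real setup would wrap
        // the actual read and remote-read calls instead.
        String row = timed("localRead", () -> "row-data");
        System.out.println(row);
    }
}
```

Timing both the local read and the remote round trip with the same wrapper is what lets you separate "the read itself is slow" from "the message never came back", which is the distinction drawn above.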


A couple of other things to note:
1- I restarted the nodes one after another while there was traffic
going to them. I don't know whether that would throw off Cassandra, or
whether the whole thing is a network congestion problem.
2- Read stats at the keyspace level show a NaN value for read
latency, which seems like a bug?
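One plausible explanation for the NaN, not verified against the Cassandra source, is that the keyspace-level figure is a mean computed as total latency divided by operation count; in Java floating-point arithmetic that gives NaN when no reads have been recorded yet. The sketch below uses a hypothetical `LatencyStats` class to show the effect and a simple guard.

```java
public class LatencyStats {
    // Hypothetical accumulator; not Cassandra's actual stats class.
    private long totalLatencyMicros = 0;
    private long operationCount = 0;

    void record(long latencyMicros) {
        totalLatencyMicros += latencyMicros;
        operationCount++;
    }

    // Naive mean: with no samples this is 0.0 / 0, which is NaN.
    double naiveMeanMicros() {
        return (double) totalLatencyMicros / operationCount;
    }

    // Guarded mean: report 0.0 when nothing has been recorded.
    double safeMeanMicros() {
        return operationCount == 0 ? 0.0 : (double) totalLatencyMicros / operationCount;
    }

    public static void main(String[] args) {
        LatencyStats stats = new LatencyStats();
        System.out.println(stats.naiveMeanMicros()); // NaN
        System.out.println(stats.safeMeanMicros());  // 0.0
        stats.record(1500);
        stats.record(2500);
        System.out.println(stats.safeMeanMicros());  // 2000.0
    }
}
```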

Thanks
Ramzi

On Wed, Dec 16, 2009 at 12:07 PM, Jonathan Ellis wrote:
> On Wed, Dec 16, 2009 at 12:46 PM, Ramzi Rabah wrote:
>> We are observing an increasing number of TimedOutExceptions in Cassandra
>> 0.5 trunk although the load seems fairly low (about 400 reads/writes
>> per second).
>> cfstats reports that operations are taking less than 2 ms on average.
>>
>> 2 things I have noticed looking at the source code.
>>
>> 1- TimedOutExceptions are silently swallowed by Cassandra and not
>> reported in the logs even at debug level
>
> It's reported to the client.  Hardly "swallowed" :)
>
>> 2- readstats does not account for these long-running queries that
>> time out.
>
> Right.  But the CF-level stats do.
>
>> I'm wondering, what could be causing the system to go haywire like
>> this?
>
> Hard to say without more information.  One shot in the dark is that
> get_key_range is a major offender sometimes, as well as workloads that
> do lots of deletes + re-inserts for the same keys.
>
> -Jonathan
>
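The exchange above turns on the difference between propagating a timeout to the client and also recording it on the server. Below is a minimal sketch of a wait that does both, using only java.util.concurrent: a `CompletableFuture` stands in for the remote response, and `TimeoutAwareRead`, `readWithTimeout` and the counters are hypothetical names, not Cassandra's messaging code.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class TimeoutAwareRead {
    // Hypothetical counters so timed-out reads show up in stats,
    // not only completed ones.
    static final AtomicLong completed = new AtomicLong();
    static final AtomicLong timedOut = new AtomicLong();

    // Waits up to timeoutMs for the response. On timeout the failure is
    // logged and counted, then rethrown so the client still sees it.
    static String readWithTimeout(CompletableFuture<String> response, long timeoutMs)
            throws TimeoutException, InterruptedException, ExecutionException {
        long start = System.nanoTime();
        try {
            String value = response.get(timeoutMs, TimeUnit.MILLISECONDS);
            completed.incrementAndGet();
            return value;
        } catch (TimeoutException e) {
            timedOut.incrementAndGet();
            long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.err.printf("read timed out after %d ms%n", waitedMs);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // A response that is already available.
        System.out.println(readWithTimeout(CompletableFuture.completedFuture("value"), 100));
        // A response that never arrives, simulating a dropped message.
        try {
            readWithTimeout(new CompletableFuture<>(), 100);
        } catch (TimeoutException e) {
            System.out.println("client sees TimeoutException");
        }
        System.out.printf("completed=%d timedOut=%d%n", completed.get(), timedOut.get());
    }
}
```

Counting timeouts separately keeps the average latency of completed reads (the "< 2 ms" figure) from hiding requests that never finished at all.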
