i would expect read latency to increase linearly w/ the number of sstables you have around. how many are in your data directories? is your compaction lagging 1000s of tables behind again?
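A quick way to answer the "how many sstables" question is to count the data files on disk. The sketch below is only an illustration: it assumes each sstable has one file whose name ends in `-Data.db` and starts with the column family name (for example `Standard1-12-Data.db`). Check that against what your data directory actually contains. The function name `count_sstables` is made up for this example.

```python
import os
from collections import Counter


def count_sstables(data_dir, suffix="-Data.db"):
    """Count sstable data files per column family under data_dir.

    ASSUMPTION: one "<ColumnFamily>-<generation>-Data.db" file per sstable.
    Adjust `suffix` if your version names its files differently.
    """
    counts = Counter()
    for _root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith(suffix):
                # Text before the first "-" is treated as the column family name.
                counts[name.split("-", 1)[0]] += 1
    return counts
```

Calling `count_sstables(path)` with the data directory configured in storage-conf.xml returns a per-column-family count. A count in the hundreds or thousands that keeps growing during the run would be consistent with compaction falling behind.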
On Thu, Dec 3, 2009 at 12:58 PM, Freeman, Tim <[email protected]> wrote:
> I ran another test last night with the build dated 29 Nov 2009. Other than
> the Cassandra version, the setup was the same as before. I got qualitatively
> similar results as before, too -- the read latency increased fairly smoothly
> from 250ms to 1s, the GC times reported by jconsole are low, the pending
> tasks for row-mutation-stage and row-read-stage are less than 10, the pending
> tasks for the compaction pool are 1615. Last time around the read latency
> maxed out at one second. This time, it just got to one second as I'm writing
> this so I don't know yet if it will continue to increase.
>
> I have attached a fresh graph describing the present run. It's qualitatively
> similar to the previous one. The vertical units are milliseconds (for
> latency) and operations per minute (for reads or writes). The horizontal
> scale is seconds. The feature that's bothering me is the red line for the
> read latency going diagonally from lower left to the lower-middle right. The
> scale doesn't make it look dramatic, but Cassandra slowed down by a factor of
> 4.
>
> The read and write rates were stable for 45,000 seconds or so, and then the
> read latency got big enough that the application was starved for reads and it
> started writing less.
>
> If this is worth pursuing, I suppose the next step would be for me to make a
> small program that reproduces the problem. It should be easy -- we're just
> reading and writing random records. Let me know if there's interest in that.
> I could also decide to live with a 1000 ms latency here. I'm thinking of
> putting a cache in the local filesystem in front of Cassandra (or whichever
> distributed DB we decide to go with), so living with it is definitely
> possible.
>
> Tim Freeman
> Email: [email protected]
> Desk in Palo Alto: (650) 857-2581
> Home: (408) 774-1298
> Cell: (408) 348-7536 (No reception business hours Monday, Tuesday, and
> Thursday; call my desk instead.)
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:[email protected]]
> Sent: Tuesday, December 01, 2009 11:10 AM
> To: [email protected]
> Subject: Re: Persistently increasing read latency
>
> 1) use jconsole to see what is happening to jvm / cassandra internals.
> possibly you are slowly exceeding cassandra's ability to keep up with
> writes, causing the jvm to spend more and more effort GCing to find
> enough memory to keep going
>
> 2) you should be at least on 0.4.2 and preferably trunk if you are
> stress testing
>
> -Jonathan
>
> On Tue, Dec 1, 2009 at 12:11 PM, Freeman, Tim <[email protected]> wrote:
>> In an 8 hour test run, I've seen the read latency for Cassandra drift fairly
>> linearly from ~460ms to ~900ms. Eventually my application gets starved for
>> reads and starts misbehaving. I have attached graphs -- horizontal scales
>> are seconds, vertical scales are operations per minute and average
>> milliseconds per operation. The clearest feature is the light blue line in
>> the left graph drifting consistently upward during the run.
>>
>> I have a Cassandra 0.4.1 database, one node, records are 100kbytes each,
>> 350K records, 8 threads reading, around 700 reads per minute. There are
>> also 8 threads writing. This is all happening on a 4 core processor that's
>> supporting both the Cassandra node and the code that's generating load for
>> it. I'm reasonably sure that there are no page faults.
>>
>> I have attached my storage-conf.xml. Briefly, it has default values, except
>> RpcTimeoutInMillis is 30000 and the partitioner is
>> OrderPreservingPartitioner.
>> Cassandra's garbage collection parameters are:
>>
>> -Xms128m -Xmx1G -XX:SurvivorRatio=8 -XX:+AggressiveOpts -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>>
>> Is this normal behavior? Is there some change to the configuration I should
>> make to get it to stop getting slower? If it's not normal, what debugging
>> information should I gather? Should I give up on Cassandra 0.4.1 and move
>> to a newer version?
>>
>> I'll leave it running for the time being in case there's something useful to
>> extract from it.
>>
>> Tim Freeman
>> Email: [email protected]
>> Desk in Palo Alto: (650) 857-2581
>> Home: (408) 774-1298
>> Cell: (408) 348-7536 (No reception business hours Monday, Tuesday, and
>> Thursday; call my desk instead.)
>>
>>
>
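The workload described in the quoted message (8 reader threads, 8 writer threads, records of about 100 kbytes, random keys) could be approximated with a small harness like the one below. It is a rough sketch only, not the program used in the thread: `run_load`, `read_fn` and `write_fn` are hypothetical names, and the two callables stand in for whatever client calls actually read and write a record in your setup.

```python
import random
import threading
import time


def run_load(read_fn, write_fn, keys, duration_s,
             readers=8, writers=8, value_size=100_000):
    """Run reader and writer threads for duration_s seconds.

    read_fn(key) and write_fn(key, value) are supplied by the caller
    (hypothetical stand-ins for real client calls). Returns a list of
    (timestamp, read_latency_ms) tuples, one per completed read.
    """
    samples = []
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def reader():
        while time.monotonic() < deadline:
            key = random.choice(keys)
            start = time.perf_counter()
            read_fn(key)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            with lock:
                samples.append((time.monotonic(), elapsed_ms))

    def writer():
        payload = b"x" * value_size  # fixed-size record body
        while time.monotonic() < deadline:
            write_fn(random.choice(keys), payload)

    threads = [threading.Thread(target=reader) for _ in range(readers)]
    threads += [threading.Thread(target=writer) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return samples
```

Bucketing the returned samples by timestamp and averaging the latency in each bucket would give a curve comparable to the read-latency line in the attached graphs.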
