Re: Timeout Exception

Chris Were Mon, 16 Nov 2009 18:13:35 -0800

Reading more on JVM GC led me to investigate the java -server flag (
http://stackoverflow.com/questions/198577/real-differences-between-java-server-and-java-client
)


From what I can see, Cassandra's startup scripts don't invoke this mode, or
did I miss it?

Chris.
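
One way to check which mode a given JVM actually runs in, rather than reading the startup scripts, is to look at the VM name it reports. The sketch below is a minimal illustration and not part of Cassandra: it assumes a HotSpot JVM, whose `java -version` banner names either a "Server VM" or a "Client VM". The function name `vm_mode` and the sample banner are hypothetical and used only for illustration.

```python
def vm_mode(java_version_output: str) -> str:
    """Classify `java -version` output as 'server', 'client', or 'unknown'.

    Assumption: a HotSpot-style banner that contains "Server VM" or
    "Client VM". Other JVMs may report something else entirely.
    """
    for line in java_version_output.splitlines():
        if "Server VM" in line:
            return "server"
        if "Client VM" in line:
            return "client"
    return "unknown"


# Illustrative banner only; the version and build strings are made up.
SAMPLE_BANNER = """\
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
"""

print(vm_mode(SAMPLE_BANNER))
```

To use it on a real machine, capture the output of `java -version` (which is written to stderr) or of `java -server -version`, and pass the text to `vm_mode`.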

On Mon, Nov 16, 2009 at 10:33 AM, Freeman, Tim <[email protected]> wrote:

>  You'll have to stop the swapping somehow.  Maybe you can install more
> memory, maybe you can run Cassandra smaller, maybe you can get some other
> process on the machine to be smaller or on some other machine, maybe you can
> move Cassandra to some other machine with more available physical memory.
>
>
>
> I don't have experience with running Cassandra smaller than the recommended
> size, so one of those options might not work.
>
>
>
> Caching database information in swapped-out pages usually isn't a win.  To
> a first approximation, you need an I/O to fetch the swapped-out page, but
> you'd need an I/O anyway to get the information from the database.  Swapping
> on modern machines usually isn't a win in general -- memory got bigger and
> CPUs got faster in the last decade, but disks didn't get much faster.
>
>
>
> Tim Freeman
> Email: [email protected]
> Desk in Palo Alto: (650) 857-2581
> Home: (408) 774-1298
> Cell: (408) 348-7536 (No reception business hours Monday, Tuesday, and
> Thursday; call my desk instead.)
>
>
>
> *From:* Chris Were [mailto:[email protected]]
> *Sent:* Monday, November 16, 2009 10:13 AM
> *To:* [email protected]
> *Subject:* Re: Timeout Exception
>
>
>
> Hi Tim,
>
>
>
> Thanks for the great pointers.
>
>
>
> si and so are regularly in the 100-2000 range. I'll need to Google more
> about what these mean, but are you effectively saying to tell Cassandra to
> use less memory? Cassandra is the only Java app running on the server.
>
>
>
> Cheers,
>
> Chris
>
> On Mon, Nov 16, 2009 at 9:59 AM, Freeman, Tim <[email protected]> wrote:
>
> I'm running 0.4.1.  I used to get timeouts, then I changed my timeout from
> 5 seconds to 30 seconds and I get no more timeouts.  The relevant line from
> storage-conf.xml is:
>
>
>
>   <RpcTimeoutInMillis>30000</RpcTimeoutInMillis>
>
>
>
> The maximum latency is often just over 5 seconds in the worst case when I
> fetch thousands of records, so the default timeout of 5 seconds happens to
> be a little bit too low for me.  My records are ~100 KB each.  You may get
> different results if your records are much larger or much smaller.
>
>
>
> The other issue I was having a few days ago was that the machine was page
> faulting, so garbage collections were taking forever.  Some GCs took 20
> minutes in another Java process.  I didn't have -verbose:gc turned on in
> Cassandra so I'm not sure what the score was there, but there's little
> reason to expect it to be qualitatively better, since it's pretty random
> which process gets some of its pages swapped out.  On a Linux machine, run
> "vmstat 5" when your machine is loaded, and if you see numbers greater than
> 0 in the "si" and "so" columns in rows after the first, tell one of your
> Java processes to take less memory.
>
>
>
> *From:* Chris Were [mailto:[email protected]]
> *Sent:* Monday, November 16, 2009 9:47 AM
> *To:* Jonathan Ellis
> *Cc:* [email protected]
> *Subject:* Re: Timeout Exception
>
>
>
> I turned on debug logging for a few days and timeouts happened across
> pretty much all requests. I couldn't see any particular request that was
> consistently the problem.
>
>
>
> After some experimenting, it seems that shutting down Cassandra and
> restarting resolves the problem. Once it hits the JVM memory limit, however,
> the timeouts start again. I have read the page on MemTable thresholds and
> have tried thresholds of 32MB, 64MB and 128MB with no noticeable difference.
> Cassandra is set to use 7GB of memory. I have 12 CFs, however only 6 of
> those have lots of data.
>
>
>
> Cheers,
>
> Chris
>
> On Tue, Nov 10, 2009 at 11:55 AM, Jonathan Ellis <[email protected]> wrote:
>
> if you're timing out doing a slice on 10 columns w/ 10% cpu used,
> something is broken
>
> is it consistent as to which keys this happens on?  try turning on
> debug logging and seeing where the latency is coming from.
>
>
> On Tue, Nov 10, 2009 at 1:53 PM, Chris Were <[email protected]> wrote:
> >
> > On Tue, Nov 10, 2009 at 11:50 AM, Jonathan Ellis <[email protected]> wrote:
> >>
> >> On Tue, Nov 10, 2009 at 1:49 PM, Chris Were <[email protected]> wrote:
> >> > Maybe... but it's not just multigets, it also happens when retrieving
> >> > one row with get_slice.
> >>
> >> how many of the 3M columns are you trying to slice at once?
> >
> > Sorry, I must have mixed up the terminology.
> > There's ~3M keys, but less than 10 columns in each. The get_slice calls
> > are to retrieve all the columns (10) for a given key.
>
>
>
>
>
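
The swap check Tim describes in the thread (run "vmstat 5" under load and look for non-zero "si" and "so" values in rows after the first) can be sketched roughly as follows. This is a minimal illustration, not a tool from the thread: it assumes the usual Linux vmstat layout with a header row containing `si` and `so` columns, and the function name `swapping_rows` and the sample numbers are hypothetical.

```python
from typing import List, Tuple


def swapping_rows(vmstat_output: str) -> List[Tuple[int, int]]:
    """Return (si, so) for each sample after the first where either is > 0.

    Assumption: Linux-style vmstat text whose column header line contains
    the tokens "si" and "so". The first sample is skipped because it
    reports averages since boot rather than current activity.
    """
    si_idx = so_idx = None
    samples = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            # Column header (vmstat repeats it periodically).
            si_idx, so_idx = fields.index("si"), fields.index("so")
            continue
        if si_idx is None or not fields or not fields[0].isdigit():
            continue  # banner line, blank line, or data before any header
        if len(fields) <= max(si_idx, so_idx):
            continue  # truncated row
        samples.append((int(fields[si_idx]), int(fields[so_idx])))
    return [(si, so) for si, so in samples[1:] if si > 0 or so > 0]


# Illustrative output only; these numbers are made up for the example.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0 204800  51200   1024  90000   12   30   100   200  300  400  5  2 90  3
 2  1 204800  50000   1024  90000  150 1200   900  1500  350  450 10  5 60 25
 0  0 204800  52000   1024  90000    0    0    10    20  300  400  2  1 97  0
"""

for si, so in swapping_rows(SAMPLE):
    print("swapping: si=%d so=%d" % (si, so))
```

Any row this returns means the machine was swapping during that interval, which per the advice in the thread is the cue to reduce the memory given to one of the Java processes.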
