On 2017-07-10 02:07 (-0700), 张强 <tzx...@gmail.com> wrote: 
> Hi experts, I've a single cassandra 3.11.0 node working with kairosdb (a
> time series database), after running 4 days with stable workload, the
> database client start to get "request errors", but there are not a lot of
> error or warning messages in the cassandra log file, the client start to
> receive error message at about 7-7 21:03:00, and kairosdb keep retrying
> after that time, but there isn't much logs in the cassandra log file.
> I've noticed the abnormal status at about 7-8 16:00:00, then I've typed a
> "nodetool tablestats" command to get some information, the command got an
> error, and while that time, the cassandra process start to crash, and
> generated a dump file.
> After C* shutdown, I take the logs to see what happened, and I found
> something strange inside the logs.
> 
> 1. In the system.log, there are two lines shows that no logs between
> 2017-07-07 21:03:50 to 2017-07-08 16:07:33, I think that is a pretty long
> period without any logs, and in gc.log file, there are a lot of logs shows
> long time GC, that should be logged in system.log.
> INFO  [ReadStage-1] 2017-07-07 21:03:50,824 NoSpamLogger.java:91 - Maximum
> memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB

Failing to allocate during read stage is a good indication that you're out of 
memory - either the heap is too small, or it's a direct memory allocation 
failure, or something, but that log line probably shouldn't be at INFO, because 
it seems like it's probably hiding a larger problem. 

> WARN  [PERIODIC-COMMIT-LOG-SYNCER] 2017-07-08 16:07:33,347
> NoSpamLogger.java:94 - Out of 1 commit log syncs over the past 0.00s with
> average duration of 60367.73ms, 1 have exceeded the configured commit
> interval by an average of 50367.73ms

It's taking a full minute to sync your memtable to disk. This is either 
indication that your disk is broken, or your JVM is pausing for GC. 

> 
> 2. In the system.log, there is a log shows very long time GC, and then the
> C* start to close.
> WARN  [ScheduledTasks:1] 2017-07-08 16:07:46,846 NoSpamLogger.java:94 -
> Some operations timed out, details available at debug level (debug.log)
> WARN  [Service Thread] 2017-07-08 16:10:36,114 GCInspector.java:282 -
> ConcurrentMarkSweep GC in 688850ms.  CMS Old Gen: 2114938312 -> 469583832;
> Par Eden Space: 837584 -> 305319752; Par Survivor Space: 41943040 ->
> 25784008
> ......
> ERROR [Thrift:22] 2017-07-08 16:10:56,322 CassandraDaemon.java:228 -
> Exception in thread Thread[Thrift:22,5,main]
> java.lang.OutOfMemoryError: Java heap space

You ran out of heap. We try to clean up and kill things when this happens, but 
by definition, the JVM is in an undefined state, and we may not be able to shut 
things down properly. 

> 
> 3. In the debug.log, the last INFO level log is at 2017-07-07 14:43:59, the
> log is:
> INFO  [IndexSummaryManager:1] 2017-07-07 14:43:59,967
> IndexSummaryRedistribution.java:75 - Redistributing index summaries
> After that, there are DEBUG level logs until 2017-07-07 21:11:34, but no
> more INFO level or other level logs in that log file, while there are still
> many logs in the system.log after 2017-07-07 14:43:59. Why doesn't these
> two log files match?
> 
> My hardware is 4 core cpu and 12G ram, and I'm using windows server 2012
> r2.

That's a bit thin - depending on data model and data volume, you may be able to 
construct a read that fills up your 3G heap, and causes you to OOM with a 
single read. How much data is involved?  What does 'nodetool tablestats' look 
like, and finally, how many reads/seconds are you doing on this workload?


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org

Reply via email to