Any errors in the log ?

The node recovers ? 

Do you use secondary indexes ? If so check comments for  
memtable_flush_queue_size in the yaml. if this value is too low writes may back 
up. But I would not expect it to cause dropped messages. 

> nodetool info also shows we have over a gig of available memory on the JVM 
> heap of each node.

Not all memory is created equal :)
ParNew is kicking in to GC the Eden space in the New Heap. 
 
It may just be that the node is getting hammered by something and IO is getting 
overwhelmed. If you can put the logs up someone might take a look. 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/09/2012, at 3:46 PM, Michael Theroux <mthero...@yahoo.com> wrote:

> Thanks for the response.
> 
> We are on version 1.1.2.  We don't see the MutationStage back up.  The dump 
> from the messages dropped error doesn't show a backup, but also watching 
> "nodetool tpstats" doesn't show any backup there.
> 
> nodetool info also shows we have over a gig of available memory on the JVM 
> heap of each node.
> 
> The earliest GCInspector traces I see before one of the more recent incidents 
> in which messages were dropped are:
> 
>       INFO [ScheduledTasks:1] 2012-09-18 02:25:53,928 GCInspector.java (line 
> 122) GC for ParNew: 396 ms for 1 collections, 2064505088 used; max is 
> 4253024256
>  
>       NFO [ScheduledTasks:1] 2012-09-18 02:25:55,929 GCInspector.java (line 
> 122) GC for ParNew: 485 ms for 1 collections, 1961875064 used; max is 
> 4253024256
>  
>       INFO [ScheduledTasks:1] 2012-09-18 02:25:57,930 GCInspector.java (line 
> 122) GC for ParNew: 265 ms for 1 collections, 1968074096 used; max is 
> 4253024256
> 
> But this was 45 minutes before messages were dropped.
> 
> It's appreciated,
> -Mike
>  
> On Sep 17, 2012, at 11:27 PM, aaron morton wrote:
> 
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
>>> MemtablePostFlusher               1         5         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
>>> FlushWriter                       1         5         0
>> Looks suspiciously like 
>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201209.mbox/%3c9fb0e801-b1ed-41c4-9939-bafbddf15...@thelastpickle.com%3E
>> 
>> What version are you on ? 
>> 
>> Are there any ERROR log messages before this ? 
>> 
>> Are you seeing MutationStage back up ? 
>> 
>> Are you see log messages from GCInspector ?
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 18/09/2012, at 2:16 AM, Michael Theroux <mthero...@yahoo.com> wrote:
>> 
>>> Hello,
>>> 
>>> While under load, we have occasionally been seeing "messages dropped" 
>>> errors in our cassandra log.  Doing some research, I understand this is 
>>> part of Cassandra's design to shed load, and we should look at the 
>>> tpstats-like output to determine what should be done to resolve the 
>>> situation.  Typically, you will see lots of messages blocked or pending, 
>>> and that might be an indicator that a specific part of hardware needs to be 
>>> improved/tuned/upgraded.  
>>> 
>>> However, looking at the output we are getting, I'm finding it difficult to 
>>> see what needs to be tuned, as it looks to me cassandra is handling the 
>>> load within the mutation stage:
>>> 
>>> NFO [ScheduledTasks:1] 2012-09-17 06:28:03,266 MessagingService.java (line 
>>> 658) 119 MUTATION messages dropped in last 5000ms
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,645 StatusLogger.java (line 57) 
>>> Pool Name                    Active   Pending   Blocked
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,836 StatusLogger.java (line 72) 
>>> ReadStage                         3         3         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
>>> RequestResponseStage              0         0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
>>> ReadRepairStage                   0         0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
>>> MutationStage                     0         0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) 
>>> ReplicateOnWriteStage             0         0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) 
>>> GossipStage                       0         0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
>>> AntiEntropyStage                  0         0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
>>> MigrationStage                    0         0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
>>> StreamStage                       0         0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
>>> MemtablePostFlusher               1         5         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
>>> FlushWriter                       1         5         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
>>> MiscStage                         0         0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
>>> commitlog_archiver                0         0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) 
>>> InternalResponseStage             0         0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) 
>>> AntiEntropySessions               0         0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 72) 
>>> HintedHandoff                     0         0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 77) 
>>> CompactionManager                 0         0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 89) 
>>> MessagingService                n/a       0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 99) 
>>> Cache Type                     Size                 Capacity               
>>> KeysToSave                                                         Provider
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 
>>> 100) KeyCache                    2184533                  2184533           
>>>            all                                                              
>>>    
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 
>>> 106) RowCache                          0                        0           
>>>            all              
>>> org.apache.cassandra.cache.SerializingCacheProvider
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 
>>> 113) ColumnFamily                Memtable ops,data
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 
>>> 116) system.NodeIdInfo                         0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 
>>> 116) system.IndexInfo                          0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 
>>> 116) system.LocationInfo                       0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 
>>> 116) system.Versions                           0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 
>>> 116) system.schema_keyspaces                   0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 
>>> 116) system.Migrations                         0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 
>>> 116) system.schema_columnfamilies                 0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java (line 
>>> 116) system.schema_columns                     0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java (line 
>>> 116) system.HintsColumnFamily                  0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java (line 
>>> 116) system.Schema                             0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 
>>> 116) prod.comp                           0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 
>>> 116) prod.byp                      224,61863
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 
>>> 116) prod.byn                      30,6106
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 
>>> 116) prod.p                     3354,3792954
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,858 StatusLogger.java (line 
>>> 116) prod.rel                788,153294
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,858 StatusLogger.java (line 
>>> 116) prod.i                        46,271517
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 
>>> 116) prod.users                     18489,5650405
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 
>>> 116) prod.caches                       40,7512998
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 
>>> 116) prod.sessions                  8085,13063121
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 
>>> 116) prod.content                             0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,860 StatusLogger.java (line 
>>> 116) prod.enr                  436,341190
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,860 StatusLogger.java (line 
>>> 116) prod.byt                 23800,9880339
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 
>>> 116) prod.aliases                  63603,10475769
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 
>>> 116) prod.ran                    1500,329387
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 
>>> 116) prod.cmap                        0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 
>>> 116) prod.pmap                73032,61237450
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,862 StatusLogger.java (line 
>>> 116) prod.pic                   5421,4438066
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,862 StatusLogger.java (line 
>>> 116) prod.watchdog                            0,0
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,863 StatusLogger.java (line 
>>> 116) prod.s                         31,40924
>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,863 StatusLogger.java (line 
>>> 116) prod.m                 21690,64262284
>>> 
>>> 
>>> We are using a write consistency level of LOCAL_QUROUM over a six node 
>>> cluster.  Any observations are welcome,
>>> 
>>> Thanks,
>>> -Mike
>> 
> 

Reply via email to