[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960663#comment-14960663 ]

Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 1:16 PM:
---------------------------------------------------------------------

Yes, we just upgraded from 2.0. Would that explain the 50k+? I have checked, and 
that is NOT the case on the other nodes. Yes, they are balanced in terms of 
data (40-core machines with lots of memory). In this stage of our rollout to 
2.1, we have gone to 10 small clusters of 3 nodes each (30 nodes total). Only 
THREE of the thirty nodes are now exhibiting this behavior. For the first few 
days several others did as well, but they seem to have self-corrected; these 
three still have not. I will go back and check for large sstable counts to see 
if that explains all of them. After this first stage we'll be rolling out to 
the larger 24-node clusters, but we are pausing on the small clusters until we 
figure this out.
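
For reference, a quick way to spot-check per-table sstable counts on each node 
(a minimal sketch, assuming the default data directory; the attached 
cassandra.yaml has the actual paths for these hosts):

{code}
# Per-table sstable counts as reported by Cassandra (2.1 prints "Table:" per column family)
nodetool cfstats | grep -E 'Keyspace:|Table:|SSTable count:'

# Cross-check by counting Data.db files on disk (default data directory assumed)
find /var/lib/cassandra/data -name '*-Data.db' | wc -l
{code}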





> Commit logs back up with move to 2.1.10
> ---------------------------------------
>
>                 Key: CASSANDRA-10515
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: redhat 6.5, cassandra 2.1.10
>            Reporter: Jeff Griffith
>            Assignee: Branimir Lambov
>            Priority: Critical
>              Labels: commitlog, triage
>         Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, 
> RUN3tpstats.jpg, cassandra.yaml, cfstats-clean.txt, stacktrace.txt, 
> system.log.clean
>
>
> After upgrading from Cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes exceed the 12G commit log maximum we configured and grow as 
> high as 65G or more before Cassandra restarts. Once the commit log files 
> exceed 12G, "nodetool compactionstats" hangs. Eventually C* restarts without 
> errors (not sure yet whether it is crashing, but I'm checking into it), the 
> cleanup occurs, and the commit logs shrink back down again. Here is the 
> nodetool compactionstats output immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>    compaction type   keyspace   table          completed          total    unit   progress
>         Compaction   SyncCore   *cf1*        61251208033   170643574558   bytes     35.89%
>         Compaction   SyncCore   *cf2*        19262483904    19266079916   bytes     99.98%
>         Compaction   SyncCore   *cf3*         6592197093     6592316682   bytes    100.00%
>         Compaction   SyncCore   *cf4*         3411039555     3411039557   bytes    100.00%
>         Compaction   SyncCore   *cf5*         2879241009     2879487621   bytes     99.99%
>         Compaction   SyncCore   *cf6*        21252493623    21252635196   bytes    100.00%
>         Compaction   SyncCore   *cf7*        81009853587    81009854438   bytes    100.00%
>         Compaction   SyncCore   *cf8*         3005734580     3005768582   bytes    100.00%
> Active compaction remaining time :        n/a
> {code}
> I was also running periodic "nodetool tpstats" checks, which worked, but the 
> StatusLogger thread stopped writing them to system.log until after compaction 
> started working again.
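
For context on the 12G cap mentioned above: that presumably corresponds to 
commitlog_total_space_in_mb in the attached cassandra.yaml. A minimal sketch of 
how to compare the commit log directory against that cap, assuming the default 
RHEL package paths (which may differ on these hosts):

{code}
# Configured cap (a 12G cap would appear as 12288); path assumes the RHEL package layout
grep commitlog_total_space_in_mb /etc/cassandra/conf/cassandra.yaml

# Actual on-disk size and segment count of the commit log directory (default location assumed)
du -sh /var/lib/cassandra/commitlog
ls /var/lib/cassandra/commitlog | wc -l
{code}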



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
