[jira] [Created] (CASSANDRA-10138) Millions of compaction tasks on empty DB

A Markov (JIRA) Thu, 20 Aug 2015 05:35:24 -0700

A Markov created CASSANDRA-10138:
------------------------------------

             Summary: Millions of compaction tasks on empty DB
                 Key: CASSANDRA-10138
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10138
             Project: Cassandra
          Issue Type: Bug
         Environment: CentOS 6.5 and Cassandra 2.1.8
            Reporter: A Markov



A fresh installation of Cassandra 2.1.8 with no data in the database except 
the system tables becomes unresponsive about 5-10 minutes after startup.

The problem was initially discovered on an empty 12-node cluster when schema 
creation failed: the script exited with a timeout error.

Analysis of the log files showed that nodes were repeatedly reported as DOWN 
and then, after some time, as UP again. This happened for multiple nodes.

Inspection of the system.log file showed that the nodes were constantly 
performing GC, and while they did so all cores of the system were 100% busy, 
which caused the nodes to disconnect after some time.

Further analysis with nodetool (the tpstats option) showed that just 10 
minutes after a clean node restart the node had completed more than 47M 
compaction tasks and had more than 12M pending. Here is an example of the 
output:

nodetool tpstats

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
CounterMutationStage              0         0              0         0                 0
ReadStage                         0         0              0         0                 0
RequestResponseStage              0         0              0         0                 0
MutationStage                     0         0            257         0                 0
ReadRepairStage                   0         0              0         0                 0
GossipStage                       0         0              0         0                 0
CacheCleanupExecutor              0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
ValidationExecutor                0         0              0         0                 0
Sampler                           0         0              0         0                 0
MemtableReclaimMemory             0         0              8         0                 0
InternalResponseStage             0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MiscStage                         0         0              0         0                 0
CommitLogArchiver                 0         0              0         0                 0
MemtableFlushWriter               0         0              8         0                 0
PendingRangeCalculator            0         0              1         0                 0
MemtablePostFlush                 0         0             44         0                 0
CompactionExecutor                0  12996398       47578625         0                 0
AntiEntropySessions               0         0              0         0                 0
HintedHandoff                     0         1              2         0                 0
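
For anyone who wants to check this output programmatically, here is a minimal 
Python sketch. It is not part of the tooling used for this report: the 
function names parse_tpstats and busy_pools and the threshold of 1000 pending 
tasks are illustrative assumptions. It reads only the first four numeric 
columns, so it also works when the last column has wrapped onto its own line. 
The final line estimates the average completion rate, assuming roughly 10 
minutes (600 seconds) of uptime.

```python
import re

# Matches a pool row: name followed by Active, Pending, Completed, Blocked.
# The trailing "All time blocked" column is ignored because it may be
# wrapped onto the next line in pasted output.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_tpstats(text):
    """Return {pool_name: {"active", "pending", "completed", "blocked"}}."""
    pools = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            name, active, pending, completed, blocked = m.groups()
            pools[name] = {
                "active": int(active),
                "pending": int(pending),
                "completed": int(completed),
                "blocked": int(blocked),
            }
    return pools


def busy_pools(pools, max_pending=1000):
    """Names of pools whose pending count exceeds max_pending (hypothetical threshold)."""
    return [name for name, s in pools.items() if s["pending"] > max_pending]


SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0            257         0                 0
CompactionExecutor                0  12996398       47578625         0                 0
HintedHandoff                     0         1              2         0                 0
"""

pools = parse_tpstats(SAMPLE)
print(busy_pools(pools))
# Average completed compaction tasks per second, assuming 600 s of uptime.
print(pools["CompactionExecutor"]["completed"] // 600)
```

With the figures above this works out to roughly 79,000 completed compaction 
tasks per second on a node that holds no user data.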

To repeat: this was on a TOTALLY EMPTY DB, 10 minutes after Cassandra was 
started.

I was able to repeatedly reproduce the same issue and behaviour with a single 
Cassandra instance. The issue persisted after I did a full Cassandra wipe 
and reinstall from the repository.

I discovered that the issue disappears if I execute
nodetool disableautocompaction

In that case the system quickly (in a matter of 20-30 seconds) goes through 
all pending tasks and becomes idle. If I enable autocompaction again, within 
about 1 minute it jumps back to millions of pending tasks.
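
The workaround above can be scripted to watch the queue drain. This is only a 
sketch under stated assumptions: it assumes nodetool is on the PATH, and the 
helper names (nodetool, pending_compactions, drain_compactions) and the 
polling defaults are hypothetical, not something taken from this 
installation. It uses only the two nodetool subcommands mentioned in this 
report.

```python
import re
import subprocess
import time


def nodetool(*args):
    """Run a nodetool subcommand and return its stdout (assumes nodetool is on PATH)."""
    return subprocess.run(
        ["nodetool", *args], check=True, capture_output=True, text=True
    ).stdout


def pending_compactions(tpstats_text):
    """Pending count from the CompactionExecutor row, or 0 if the row is absent."""
    m = re.search(r"^CompactionExecutor\s+\d+\s+(\d+)", tpstats_text, re.MULTILINE)
    return int(m.group(1)) if m else 0


def drain_compactions(run=nodetool, poll_seconds=5, max_polls=12, sleep=time.sleep):
    """Disable autocompaction, then poll tpstats until the pending count is zero.

    Returns the list of pending counts observed. poll_seconds and max_polls
    are illustrative defaults, not values from the report.
    """
    run("disableautocompaction")
    seen = []
    for _ in range(max_polls):
        seen.append(pending_compactions(run("tpstats")))
        if seen[-1] == 0:
            break
        sleep(poll_seconds)
    return seen
```

Calling drain_compactions() on an affected node should return a list of 
pending counts that falls to 0 if the node behaves as described here. The run 
and sleep parameters exist so the function can be exercised without a live 
node.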

I verified on the same server with Cassandra 2.1.6 that the issue was not 
present.

The log files do not show any ERROR messages. There were only warnings about 
GC events that were taking too long.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
