WeiFan created CASSANDRA-9661:
---------------------------------

             Summary: Endless compaction to a tiny, tombstoned SStable
                 Key: CASSANDRA-9661
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9661
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: WeiFan


We deployed a 3-node cluster (on 2.1.5) handling a steady write load (about 2k writes 
per second) to a CF with DTCS, a default TTL of 43200s and gc_grace of 21600s. The CF 
contained insert-only, complete time series data. We found Cassandra would 
occasionally keep writing logs like this:

INFO  [CompactionExecutor:30551] 2015-06-26 18:10:06,195 CompactionTask.java:270 - Compacted 1 sstables to [/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516270,].  449 bytes to 449 (~100% of original) in 12ms = 0.035683MB/s.  4 total partitions merged to 4.  Partition merge counts were {1:4, }
INFO  [CompactionExecutor:30551] 2015-06-26 18:10:06,241 CompactionTask.java:140 - Compacting [SSTableReader(path='/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516270-Data.db')]
INFO  [CompactionExecutor:30551] 2015-06-26 18:10:06,253 CompactionTask.java:270 - Compacted 1 sstables to [/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516271,].  449 bytes to 449 (~100% of original) in 12ms = 0.035683MB/s.  4 total partitions merged to 4.  Partition merge counts were {1:4, }
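
For reference, the CF in question (sen_vaas_test.nodestatus, per the data paths in 
the log) would be defined roughly as in the sketch below. The column layout is only 
an assumption on our side; the compaction strategy, default_time_to_live and 
gc_grace_seconds correspond to the settings described above:

CREATE TABLE sen_vaas_test.nodestatus (
    node_id text,
    ts timestamp,
    status text,
    PRIMARY KEY (node_id, ts)
) WITH compaction = {'class': 'DateTieredCompactionStrategy'}
  AND default_time_to_live = 43200
  AND gc_grace_seconds = 21600;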

It seems that Cassandra kept compacting a single SSTable, several times per second, 
and this lasted for many hours. Tons of logs were produced and one CPU core was 
exhausted during this time. The endless compacting finally ended when another 
compaction started with a group of SSTables (including the previous one). All 3 of 
our nodes have been hit by this problem, but at different times.

We could not figure out how the problematic SSTable came about because the relevant 
log entries had already wrapped around.

We dumped the records in the SSTable and found it holds the oldest data in our CF 
(again, our data is time series), and all of the records in this SSTable have been 
expired for more than 18 hours (12 hrs TTL + 6 hrs gc_grace), so they should be 
dropped. However, C* does nothing with this SSTable except compact it again and 
again, until enough other SSTables become old enough for DTCS to consider compacting 
them together with this one.
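
For what it's worth, a dump like the one mentioned above can be produced with the 
sstable2json tool shipped with 2.1, e.g. against one of the files named in the log 
(the exact invocation here is only illustrative):

sstable2json /home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516271-Data.db

In the JSON output, expiring cells carry their TTL and local expiration time, which 
makes it easy to check whether everything in the file has already expired.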


