WeiFan created CASSANDRA-9661:
---------------------------------
Summary: Endless compaction of a tiny, tombstoned SSTable
Key: CASSANDRA-9661
URL: https://issues.apache.org/jira/browse/CASSANDRA-9661
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: WeiFan
We deployed a 3-node cluster (Cassandra 2.1.5) under a stable write load (about
2k writes per second) to a CF using DTCS, with a default TTL of 43200s and
gc_grace_seconds of 21600s. The CF contained insert-only, complete time-series data.
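For context, the table looked roughly like the sketch below; the column layout
is hypothetical, but the compaction strategy, TTL, and gc_grace settings are the
ones just described:

CREATE TABLE sen_vaas_test.nodestatus (
    node_id text,        -- hypothetical partition key
    ts      timestamp,   -- hypothetical clustering column (time-series order)
    status  text,        -- hypothetical payload
    PRIMARY KEY (node_id, ts)
) WITH compaction = {'class': 'DateTieredCompactionStrategy'}
  AND default_time_to_live = 43200   -- 12 hours
  AND gc_grace_seconds = 21600;      -- 6 hours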
We found that Cassandra would occasionally keep writing log entries like these:
INFO [CompactionExecutor:30551] 2015-06-26 18:10:06,195
CompactionTask.java:270 - Compacted 1 sstables to
[/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516270,].
449 bytes to 449 (~100% of original) in 12ms = 0.035683MB/s. 4 total
partitions merged to 4. Partition merge counts were {1:4, }
INFO [CompactionExecutor:30551] 2015-06-26 18:10:06,241
CompactionTask.java:140 - Compacting
[SSTableReader(path='/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516270-Data.db')]
INFO [CompactionExecutor:30551] 2015-06-26 18:10:06,253
CompactionTask.java:270 - Compacted 1 sstables to
[/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516271,].
449 bytes to 449 (~100% of original) in 12ms = 0.035683MB/s. 4 total
partitions merged to 4. Partition merge counts were {1:4, }
It seems that Cassandra kept compacting a single SSTable by itself, several
times per second, for many hours. Tons of log lines were written and one CPU
core was exhausted during this time. The endless compaction finally ended when
another compaction started on a group of SSTables (including the previous one).
All three of our nodes have been hit by this problem, but at different times.
We could not figure out how the problematic SSTable came about, because the log
had already wrapped around.
We dumped the records in the SSTable and found that it held the oldest data in
our CF (again, our data is time series), and all of the records in this SSTable
had been expired for more than 18 hours (12 hrs TTL + 6 hrs gc_grace), so they
should have been dropped. However, Cassandra did nothing with this SSTable
except compact it again and again, until more SSTables became old enough to be
considered by DTCS for compaction together with this one.
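For anyone reproducing this, a minimal sketch of how the expired state can be
checked with the standard 2.1 tooling (the sstable path is the one from the log
above):

# Dump the rows to confirm every cell carries an expired TTL:
sstable2json /home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516270-Data.db

# Print sstable metadata, including the estimated droppable tombstones:
sstablemetadata /home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516270-Data.db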