[jira] [Commented] (CASSANDRA-8340) Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions

Etienne Adam (JIRA) Thu, 14 May 2015 20:52:39 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544872#comment-14544872
 ]


Etienne Adam commented on CASSANDRA-8340:
-----------------------------------------

All timestamps are in microseconds; we have never used milliseconds. We do not 
use USING TIMESTAMP, we let Cassandra assign the default current timestamp.

Metadata from the big sstable:
Minimum timestamp: 1418249984079000 (Wed Dec 10 23:19:44 CET 2014)
Maximum timestamp: 1431479047084000 (Wed May 13 03:04:07 CEST 2015)
Metadata from 2 of the smaller, recent sstables:
Minimum timestamp: 1431487800237000 (Wed May 13 05:30:00 CEST 2015)
Maximum timestamp: 1431615168636000 (Thu May 14 16:52:48 CEST 2015)
Minimum timestamp: 1427621080007000 (Sun Mar 29 11:24:40 CEST 2015)
Maximum timestamp: 1431616482164000 (Thu May 14 17:14:42 CEST 2015)
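
The microsecond values above can be cross-checked against the printed dates 
with a short Python sketch. The helper name micros_to_utc is made up for this 
illustration; it only assumes the values are microseconds since the Unix 
epoch, and it prints UTC rather than CET/CEST.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_utc(ts_micros: int) -> datetime:
    """Convert microseconds since the Unix epoch to an aware UTC datetime.

    Hypothetical helper for illustration only, not part of Cassandra.
    """
    return EPOCH + timedelta(microseconds=ts_micros)


# Timestamps copied from the sstable metadata listed above.
samples = [
    ("big sstable min", 1418249984079000),
    ("recent sstable min", 1427621080007000),
    ("recent sstable max", 1431616482164000),
]

for label, ts in samples:
    print(f"{label}: {micros_to_utc(ts).isoformat()}")
# big sstable min: 2014-12-10T22:19:44.079000+00:00
# recent sstable min: 2015-03-29T09:24:40.007000+00:00
# recent sstable max: 2015-05-14T15:14:42.164000+00:00
```

The UTC results match the local times listed above once the CET (+1h) and 
CEST (+2h) offsets are applied.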

I do not understand how data with timestamp 1427621080007000 (March 29) could 
have been flushed just now. We never set timestamps explicitly, and all nodes 
are synced with NTP. The only difference in the logs is the thread name 
"RMI TCP Connection(42243)-192.168.96.31" instead of the usual 
"NativePoolCleaner" when flushing:

INFO  [RMI TCP Connection(42243)-192.168.96.x] 2015-05-14 17:14:42,320 
ColumnFamilyStore.java:877 - Enqueuing flush of xxx: 14940262 (1%) on-heap, 
18910037 (1%) off-heap
INFO  [MemtableFlushWriter:14583] 2015-05-14 17:14:42,803 Memtable.java:378 - 
Completed flushing 
/var/lib/cassandra/data/xxx/xxx-b7d729907fbf11e4ab6615203fafe427/xxx-xxx-ka-101593-Data.db
 (3640771 bytes) for commitlog position ReplayPosition(segmentId=1430810039198, 
position=18118938)

Our flush rate is about 1 sstable every 2 hours, which does not seem high 
given the insertion rate.

> Use sstable min timestamp when deciding if an sstable should be included in 
> DTCS compactions
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8340
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Priority: Minor
>
> Currently we check how old the newest data (max timestamp) in an sstable is 
> when we check if it should be compacted.
> If we instead switch to using min timestamp for this we have a pretty clean 
> migration path from STCS/LCS to DTCS. 
> My thinking is that before migrating, the user does a major compaction, which 
> creates a huge sstable containing all data, with min timestamp very far back 
> in time, then switching to DTCS, we will have a big sstable that we never 
> compact (ie, min timestamp of this big sstable is before 
> max_sstable_age_days), and all newer data will be after that, and that new 
> data will be properly compacted
> WDYT [~Bj0rn] ?
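
To make the difference between the two rules concrete, here is a toy Python 
sketch of the age check described in the issue. It is not Cassandra's actual 
code: the function names, the 10-day max_sstable_age_days value and the 
choice of "now" (the latest maximum timestamp from the comment above) are all 
assumptions made for illustration.

```python
MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000


def age_days(ts_micros: int, now_micros: int) -> float:
    """Age of a timestamp in days, both values in microseconds since epoch."""
    return (now_micros - ts_micros) / MICROS_PER_DAY


def past_max_age(ts_micros: int, now_micros: int, max_age_days: float) -> bool:
    """True if the timestamp is older than max_age_days (toy model of the check)."""
    return age_days(ts_micros, now_micros) > max_age_days


# Assumed values, for illustration only.
NOW = 1431616482164000        # latest max timestamp seen in the comment
MAX_SSTABLE_AGE_DAYS = 10     # hypothetical setting, not taken from the thread

big_min, big_max = 1418249984079000, 1431479047084000
small_min = 1427621080007000  # recent sstable with the March 29 min timestamp

# Rule based on max timestamp: the big sstable still looks "young".
print(round(age_days(big_max, NOW), 1), past_max_age(big_max, NOW, MAX_SSTABLE_AGE_DAYS))
# 1.6 False

# Rule based on min timestamp: the big sstable is past the age limit.
print(round(age_days(big_min, NOW), 1), past_max_age(big_min, NOW, MAX_SSTABLE_AGE_DAYS))
# 154.7 True

# Under the min-timestamp rule, a single old cell also ages out a recent sstable.
print(round(age_days(small_min, NOW), 1), past_max_age(small_min, NOW, MAX_SSTABLE_AGE_DAYS))
# 46.2 True
```

Under these assumed values, the min-timestamp rule leaves the big sstable 
alone as intended, but it would also exclude the recently flushed sstable 
whose minimum timestamp is March 29.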



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
