The getFullyExpiredSSTables behavior and the read-repair data mixing apply to 
TWCS as well, though TWCS does a major compaction per window to try to reduce 
the interwoven graph of expiration blockers that can build up across DTCS 
windows. (And because DTCS windowing uses minTimestamp to choose a bucket 
instead of maxTimestamp, it's likely “worse” in the sense that you'll have 
more new data in old windows with DTCS than with TWCS.)
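To make the bucketing difference concrete, a minimal sketch (assumed 24 hour 
windows and made-up names, not the actual strategy code):

import java.util.concurrent.TimeUnit;

class WindowChoice {
    static final long WINDOW_MICROS = TimeUnit.HOURS.toMicros(24);

    // DTCS-style: bucket by the OLDEST cell, so one read-repaired old cell
    // drags an otherwise-new sstable (and all its new data) into an old window.
    static long dtcsWindow(long minTimestampMicros) {
        return minTimestampMicros / WINDOW_MICROS;
    }

    // TWCS-style: bucket by the NEWEST cell, so old stragglers never pull
    // fresh data backwards into old windows.
    static long twcsWindow(long maxTimestampMicros) {
        return maxTimestampMicros / WINDOW_MICROS;
    }
}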

 

Something like https://issues.apache.org/jira/browse/CASSANDRA-10496 may be the 
right fix eventually. 

 

(Alternatively, if https://issues.apache.org/jira/browse/CASSANDRA-9779 gets 
implemented, ‘APPEND ONLY’ tables could be more aggressive in 
getFullyExpiredSSTables().)

From: "Jason J. W. Williams" <jasonjwwilli...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, June 17, 2016 at 12:29 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Understanding when Cassandra drops expired time series data

 

Hey Jeff, 

 

Do most of those behaviors apply to TWCS too?

 

-J

 

On Fri, Jun 17, 2016 at 1:25 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

First, DTCS in 2.0.15 has some weird behaviors - 
https://issues.apache.org/jira/browse/CASSANDRA-9572.

 

That said, some other general notes:


Data deleted by TTL isn’t the same as issuing a delete – each expiring cell 
internally carries a ttl/timestamp at which it will be converted into a 
tombstone. No tombstone is added to the memtable or flushed to disk – 
Cassandra just treats the expired cells as tombstones once they’re past that 
timestamp.
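Sketched out, that check looks something like this (hypothetical names, not 
the real cell classes):

class ExpiringCell {
    final long timestampMicros;   // write timestamp of the cell
    final int localExpirationSec; // write time (seconds) + TTL, fixed at write

    ExpiringCell(int writeTimeSec, int ttlSeconds, long timestampMicros) {
        this.timestampMicros = timestampMicros;
        this.localExpirationSec = writeTimeSec + ttlSeconds;
    }

    // Nothing extra is ever written: once the clock passes
    // localExpirationSec, reads and compactions simply treat
    // this cell as a tombstone.
    boolean isLive(int nowInSec) {
        return nowInSec < localExpirationSec;
    }

    boolean actsAsTombstone(int nowInSec) {
        return !isLive(nowInSec);
    }
}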

Cassandra’s getFullyExpiredSSTables() will consider an sstable fully expired 
if (and only if) all cells within that sstable are expired (current time > max 
timestamp) AND the sstable’s timestamps don’t overlap with others that aren’t 
fully expired. Björn talks about this in 
https://issues.apache.org/jira/browse/CASSANDRA-8243 - the intent here is that 
explicit deletes (which do create tombstones) won’t be GC’d from an otherwise 
fully expired sstable if they’re covering data in a more recent sstable – 
without this check, we could accidentally bring dead data back to life. In an 
append-only time series workload this would be unusual, but not impossible.
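Condensed into a sketch (assumed names and fields, not the actual method):

import java.util.ArrayList;
import java.util.List;

class FullyExpired {
    static class SSTable {
        long minTimestamp, maxTimestamp; // cell write timestamps (micros)
        int maxLocalDeletionTime;        // latest expiry second of any cell
    }

    static List<SSTable> getFullyExpired(List<SSTable> all, int nowInSec) {
        List<SSTable> result = new ArrayList<>();
        for (SSTable candidate : all) {
            if (candidate.maxLocalDeletionTime >= nowInSec)
                continue; // some cell in this sstable is still live
            boolean blocked = false;
            for (SSTable other : all) {
                if (other == candidate || other.maxLocalDeletionTime < nowInSec)
                    continue;
                // a delete in 'candidate' with a timestamp at or above
                // other's oldest data might still be shadowing live data,
                // so keep the candidate around
                if (candidate.maxTimestamp >= other.minTimestamp)
                    blocked = true;
            }
            if (!blocked)
                result.add(candidate);
        }
        return result;
    }
}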

Unfortunately, read repairs (foreground/blocking, if you write with CL < ALL 
and read with CL > ONE) will cause cells written with old timestamps to be 
written into the newly flushed sstables, which creates sstables with wide gaps 
between minTimestamp and maxTimestamp (you could have a read repair pull data 
that is 23 hours old into a new sstable; now that one sstable spans 23 hours 
and isn’t fully expired until the oldest data is 47 hours old). There’s an 
open ticket (https://issues.apache.org/jira/browse/CASSANDRA-10496) meant to 
make this behavior ‘better’ in the future by splitting those old read-repaired 
cells away from the newly flushed sstables.
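Working through the arithmetic of that example, assuming a 24 hour TTL:

class WideSpanExample {
    public static void main(String[] args) {
        int ttlHours = 24;
        int repairedCellAgeHours = 23; // old cell pulled in by read repair
        // The sstable can only be dropped once its NEWEST cell expires,
        // i.e. 24h after flush; by then the read-repaired cell is
        // 23 + 24 = 47 hours old.
        int fullyExpiredAtOldestAgeHours = repairedCellAgeHours + ttlHours;
        System.out.println("droppable when oldest data is "
                + fullyExpiredAtOldestAgeHours + "h old"); // prints 47
    }
}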

I gave a talk on a lot of this behavior last year at Summit 
(http://www.slideshare.net/JeffJirsa1/cassandra-summit-2015-real-world-dtcs-for-operators) 
- if you’re running time series in production on DTCS, it’s worth a glance.

From: jerome <jeromefroel...@hotmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, June 17, 2016 at 11:52 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Understanding when Cassandra drops expired time series data

 

Hello! Recently I have been trying to familiarize myself with Cassandra but 
don't quite understand when data is removed from disk after it has been 
deleted. The use case I'm particularly interested in is expiring time series 
data with DTCS. As an example, I created the following table:
CREATE TABLE metrics (
  metric_id text,
  time timestamp,
  value double,
  PRIMARY KEY (metric_id, time)
) WITH CLUSTERING ORDER BY (time DESC) AND 
     default_time_to_live = 86400 AND
     gc_grace_seconds = 3600 AND
     compaction = {
      'class': 'DateTieredCompactionStrategy',
      'timestamp_resolution':'MICROSECONDS',
      'base_time_seconds':'3600',
      'max_sstable_age_days':'365',
      'min_threshold':'4'
     };
I understand that Cassandra will create a tombstone for all rows inserted into 
this table 24 hours after they are inserted (86400 seconds). These tombstones 
will first be written to an in-memory Memtable and then flushed to disk as an 
SSTable when the Memtable reaches a certain size.

My question is when will the data that is now expired be removed from disk? Is 
it the next time the SSTable which contains the data gets compacted? So, with 
DTCS and min_threshold set to four, we would wait until at least three other 
SSTables are in the same time window as the expired data, and then those 
SSTables will be compacted into an SSTable without the expired data. Is it 
only during this compaction that the data will be removed? It seems to me that 
this would require Cassandra to maintain some metadata on which rows have been 
deleted, since the newer tombstones would likely not be in the older SSTables 
that are being compacted.

Also, I'm aware that Cassandra can drop entire SSTables if they contain only 
expired data, but I'm unsure of what qualifies as expired data (is it just 
SSTables whose maximum timestamp is past the default TTL for the table?) and 
when such SSTables are dropped.
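For what it's worth, my mental model is that a few per-SSTable summary stats 
would be enough, something like the following (field names are just my guess, 
not the actual metadata API):

class SSTableStats {
    long minTimestamp;        // oldest cell write timestamp in the file
    long maxTimestamp;        // newest cell write timestamp in the file
    int maxLocalDeletionTime; // latest TTL-expiry second of any cell

    // every cell has expired, so the whole file could be a drop candidate
    boolean allCellsExpired(int nowInSec) {
        return maxLocalDeletionTime < nowInSec;
    }
}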

Alternatively, do the SSTables which contain the tombstones have to be 
compacted with the SSTables which contain the expired data for the data to be 
removed? It seems to me that this could result in Cassandra holding the 
expired data long after it has expired, since it's waiting for the new 
tombstones to be compacted with the older expired data.

Finally, I was also unsure when the tombstones themselves are removed. I know 
Cassandra does not delete them until after gc_grace_seconds, but it can't 
delete the tombstones until it's sure the expired data has been deleted, 
right? Otherwise it would see the expired data as being valid. Consequently, 
it seems to me that the question of when tombstones are deleted is intimately 
tied to the questions above.
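My rough picture of the gating condition, again with guessed names:

class Tombstone {
    int localDeletionTimeSec; // when the delete (or TTL expiry) happened

    // my understanding: a tombstone is only purgeable once gc_grace has
    // passed AND the compaction can prove no older data remains elsewhere
    // that this tombstone is still covering
    boolean purgeable(int nowInSec, int gcGraceSeconds,
                      boolean coversOlderData) {
        return nowInSec > localDeletionTimeSec + gcGraceSeconds
                && !coversOlderData;
    }
}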

Thanks in advance! If it helps I've been experimenting with version 2.0.15 
myself.

 
