Re: Coordination of expired TTLs compared to tombstones

2015-05-29 Thread Tyler Hobbs
On Fri, May 29, 2015 at 1:31 PM, Robert Wille rwi...@fold3.com wrote:


 I was wondering how that compares to cells with expired TTLs. Does the
 node get to skip sending data back to the coordinator for an expired TTL?


No, it has to send expired cells.



 Suppose you wrote a cell with no TTL, and then updated it with a TTL.
 Suppose that node 1 got both writes, but node 2 only got the first one. If
 you asked for the cell after it expired, and node 1 did not send anything
 to the coordinator, it seems to me that that could violate consistency
 levels. Also, read repair could never fix node 2. So, how does that work?


That's precisely why they have to be sent to the coordinator.
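
To make the scenario concrete, here is a minimal sketch of last-write-wins reconciliation at the coordinator. It is a toy model, not Cassandra's actual code: the `Cell` class, the `reconcile` function, and the integer timestamps are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for one replica's copy of a cell."""
    value: str
    write_ts: int                      # write timestamp, used for last-write-wins
    expires_at: Optional[int] = None   # None means the cell was written without a TTL

    def is_live(self, now: int) -> bool:
        return self.expires_at is None or now < self.expires_at


def reconcile(responses: List[Optional[Cell]], now: int) -> Optional[str]:
    """Pick the newest cell among the replica responses (None = replica sent nothing).

    An expired cell still takes part in the comparison, so it can shadow
    older live data returned by another replica.
    """
    cells = [c for c in responses if c is not None]
    if not cells:
        return None
    newest = max(cells, key=lambda c: c.write_ts)
    return newest.value if newest.is_live(now) else None


original = Cell("v1", write_ts=1)                 # both nodes received this write
updated = Cell("v2", write_ts=2, expires_at=100)  # only node 1 received this write
now = 200                                         # the TTL has already expired

# Node 1 sends its expired cell: the newest write wins and nothing is returned.
with_expired = reconcile([updated, original], now)

# Node 1 skips the expired cell: node 2's stale value is returned instead.
without_expired = reconcile([None, original], now)
```

In this model `with_expired` is `None` and `without_expired` is `"v1"`, which is the consistency violation described in the question.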



 On a related note, do cells with expired TTLs have to wait
 gc_grace_seconds before they can be compacted out?


Yes.


 It seems to me that if they could get compacted out immediately after
 expiration, you could get zombie data, just like you can with tombstones.
 For example, write a cell with no TTL to all replicas, shut down one
 replica, update the cell with a TTL, compact after the TTL has expired,
 then bring the other node back up. Voila, the formerly down node has a
 value that will replicate to the other nodes.


Correct, that's why they can't be purged immediately.
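
As a rough illustration of that rule, the sketch below treats an expired cell like a tombstone created at its expiration time. This is a simplified model of the answer given here, not Cassandra's compaction code; `can_purge` and its parameters are hypothetical names, and real purge decisions may involve details not covered in this thread.

```python
from typing import Optional


def can_purge(expires_at: Optional[int], now: int, gc_grace_seconds: int) -> bool:
    """Return True if a cell may be dropped by compaction in this toy model.

    expires_at is the expiration time in seconds, or None for a cell
    written without a TTL (such a cell is never purged by this rule).
    """
    if expires_at is None:
        return False
    return now >= expires_at + gc_grace_seconds


GC_GRACE = 864000   # 10 days, the default gc_grace_seconds
expires_at = 1000   # illustrative expiration time

# Just after expiration the cell must be kept, so that a replica that
# missed the TTL update can still be repaired.
just_expired = can_purge(expires_at, now=1001, gc_grace_seconds=GC_GRACE)

# Once gc_grace_seconds have passed since expiration, it may be dropped.
after_grace = can_purge(expires_at, now=expires_at + GC_GRACE, gc_grace_seconds=GC_GRACE)
```

Under this model `just_expired` is `False` and `after_grace` is `True`: the expired cell survives compaction long enough for the formerly down replica to be repaired before the shadowing data disappears.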


-- 
Tyler Hobbs
DataStax http://datastax.com/


Coordination of expired TTLs compared to tombstones

2015-05-29 Thread Robert Wille
I was wondering something about Cassandra’s internals.

Suppose I have CL > 1 and I read a partition with a bunch of tombstones. Those 
tombstones have to be sent to the coordinator for consistency reasons so that 
if another replica produces non-tombstone data that is older than the 
tombstone, it can know that the data has been deleted.

I was wondering how that compares to cells with expired TTLs. Does the node get 
to skip sending data back to the coordinator for an expired TTL? I am under the 
impression that expired data doesn’t have to be sent to the coordinator, but as 
I think about it, it seems like that might not be true. 

Suppose you wrote a cell with no TTL, and then updated it with a TTL. Suppose 
that node 1 got both writes, but node 2 only got the first one. If you asked 
for the cell after it expired, and node 1 did not send anything to the 
coordinator, it seems to me that that could violate consistency levels. Also, 
read repair could never fix node 2. So, how does that work?

On a related note, do cells with expired TTLs have to wait gc_grace_seconds 
before they can be compacted out? It seems to me that if they could get 
compacted out immediately after expiration, you could get zombie data, just 
like you can with tombstones. For example, write a cell with no TTL to all 
replicas, shut down one replica, update the cell with a TTL, compact after the 
TTL has expired, then bring the other node back up. Voila, the formerly down 
node has a value that will replicate to the other nodes.

Thanks in advance

Robert