[ https://issues.apache.org/jira/browse/CASSANDRA-20014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17897303#comment-17897303 ]

Matt Byrd commented on CASSANDRA-20014:
---------------------------------------

For this latest problem you're proposing, I'm wondering what the timestamps of 
the write and tombstone would be.
Assuming these are all server-side timestamps: if the clocks are out of sync 
such that the local deletion time is earlier than the initial mutation timestamp,
then the delete would presumably also have a lower cell timestamp and hence 
immediately not be visible, since the write would be seen as latest under LWW.
I think that's also what you're alluding to with "writes occurring out of order 
kinda"; please correct me if I've misinterpreted.
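To make the LWW point concrete, here's a minimal Python sketch (not Cassandra's actual reconciliation code) of picking a winner by cell timestamp; a tombstone stamped lower than the write it was meant to shadow simply loses:

```python
# Hypothetical sketch of last-write-wins (LWW) reconciliation by cell
# timestamp: the cell with the higher timestamp wins, so a tombstone
# written with a *lower* timestamp than the live write it was meant to
# shadow is immediately invisible.

def reconcile(cell_a, cell_b):
    """Return the winning cell; on a timestamp tie, a deletion wins."""
    ts_a, ts_b = cell_a["timestamp"], cell_b["timestamp"]
    if ts_a != ts_b:
        return cell_a if ts_a > ts_b else cell_b
    return cell_a if cell_a.get("tombstone") else cell_b

write = {"timestamp": 100, "value": "x"}
# Clock skew: the delete was stamped earlier than the write it follows.
delete = {"timestamp": 90, "tombstone": True}
assert reconcile(write, delete) == write  # the delete never becomes visible
```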

Or is this a scenario where the client is supplying their own timestamps to 
try and ensure the delete is visible in a LWW sense, i.e. they read the current 
write timestamp and increment it, or supply both (but without supplying a 
custom local deletion time)?
I suppose that then comes back to the problem of the local deletion time being 
independent of the cell timestamps.
I'm not sure it's a good idea for someone setting custom timestamps to also set 
a custom local deletion time via the now_in_seconds option, 
but it does seem that doing so would avoid the problem (assuming the patch 
is present).
I.e. the local deletion time on the later write would be shifted forward, so 
the delete would last until hint delivery.
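A rough Python sketch of the purge timing involved (the function and numbers are illustrative, not Cassandra internals): a tombstone is purgeable once gc_grace_seconds have elapsed past its local deletion time, so stamping the delete with a later now_in_seconds pushes the purge point past hint delivery:

```python
# Hypothetical sketch of the tombstone purge rule: a tombstone becomes
# purgeable once gc_grace_seconds have elapsed past its local deletion
# time. Supplying a later now_in_seconds on the delete shifts the local
# deletion time (and hence the purge point) forward, keeping the
# tombstone alive long enough to shadow a delayed hint.

GC_GRACE_SECONDS = 864_000  # default: 10 days

def purgeable(local_deletion_time, now, gc_grace=GC_GRACE_SECONDS):
    return local_deletion_time + gc_grace < now

hint_delivery_time = 2_000_000
# A skewed-back local deletion time lets the tombstone purge too early:
assert purgeable(local_deletion_time=1_000_000, now=hint_delivery_time)
# Shifting it forward via now_in_seconds keeps it until hint delivery:
assert not purgeable(local_deletion_time=1_200_000, now=hint_delivery_time)
```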

Apart from that, I'm not sure there is an easy way to stop the problem if 
timestamps and deletion times can be set arbitrarily differently from each 
other, whether via clock skew or by users.
I did previously consider adding a configurable buffer to the hint TTL, which 
could shorten the hint's liveness by, say, a default of one minute.
This could also, I suppose, protect against the case where the deletion time 
differs from the timestamps due to some (ideally bounded) clock skew.
The tricky bit there is the trade-off between expiring hints more 
aggressively (maybe a problem if folks set a very short hint TTL for some 
reason?) and having a buffer large enough to prevent the problem; 
maybe 10s or a minute could suffice?
It does add a fair bit of complexity/configuration for what is in all 
likelihood an even more obscure problem 
(given the need for custom timestamps, long-delayed hint delivery and precisely 
timed compaction).
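The buffer idea above can be sketched in a few lines of Python (names and the default are illustrative, not Cassandra configuration): a hint stops being deliverable `buffer` seconds before its normal TTL, absorbing bounded skew between timestamps and deletion times:

```python
# Hypothetical sketch of a configurable buffer on hint liveness:
# expire hints `buffer` seconds earlier than the normal TTL. The
# default of 60s is the "say, one minute" figure from the discussion.

DEFAULT_HINT_BUFFER_SECONDS = 60

def hint_is_live(creation_time, hint_ttl, now,
                 buffer=DEFAULT_HINT_BUFFER_SECONDS):
    """A hint is deliverable only while now < creation + ttl - buffer."""
    return now < creation_time + hint_ttl - buffer

# With a 3h TTL and a 60s buffer, the hint expires one minute early:
assert hint_is_live(creation_time=0, hint_ttl=10_800, now=10_739)
assert not hint_is_live(creation_time=0, hint_ttl=10_800, now=10_740)
```

This is where the trade-off shows up: with a very short user-configured hint TTL, a fixed 60s buffer eats a large fraction of the hint's lifetime.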

WDYT [~bdeggleston] [~clohfink]?     

Also thanks for the feedback on the PR, I've pushed some more changes which 
hopefully address them all.

> Discard hints based on write time, not timeout time
> ---------------------------------------------------
>
>                 Key: CASSANDRA-20014
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20014
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Hints
>            Reporter: Blake Eggleston
>            Assignee: Matt Byrd
>            Priority: Normal
>
> Hints created after a write timeout use the timeout time as the hint 
> creation time. In the case of slow hint delivery, this can create a 
> window of time where a write is applied after gcgs would have elapsed for 
> tombstones written after the original write, and the tombstone has been 
> purged, causing data resurrection. We should use the time the client request 
> thread started working on the request as the hint creation time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
