[
https://issues.apache.org/jira/browse/CASSANDRA-20014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17897303#comment-17897303
]
Matt Byrd commented on CASSANDRA-20014:
---------------------------------------
For this latest problem you're proposing, I'm wondering what the timestamps of
the write and tombstone would be.
Assuming these are all server-side timestamps: if the clocks are out of sync so
that the local deletion time is earlier than the initial mutation timestamp,
then the delete would presumably also have a lower cell timestamp, and hence
would immediately not be visible, since the write would be seen as latest under
LWW. I think that's also what you're alluding to with "writes occurring out of
order kinda"; please correct me if I've misinterpreted.
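To make the LWW point concrete, here is a toy model (not Cassandra's actual reconcile code; the `Cell` type and field names are invented for illustration) showing why a tombstone whose cell timestamp is lower than the write's timestamp never shadows the write:

```python
# Toy model of last-write-wins cell reconciliation: the cell with the higher
# timestamp wins, and on a timestamp tie the tombstone is preferred.
from dataclasses import dataclass

@dataclass
class Cell:
    timestamp: int      # write timestamp used for LWW comparison
    is_tombstone: bool

def reconcile(a: Cell, b: Cell) -> Cell:
    # Higher timestamp wins outright.
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # On a tie, prefer the deletion.
    return a if a.is_tombstone else b

write = Cell(timestamp=200, is_tombstone=False)
# Skewed clock on the deleting node: the delete gets a *lower* cell timestamp.
late_delete = Cell(timestamp=100, is_tombstone=True)
print(reconcile(write, late_delete).is_tombstone)  # prints False
```

In this model the delete simply loses reconciliation and is never visible, which matches the "immediately not visible" case above rather than the resurrection scenario.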
Or is this a scenario where the client is supplying their own timestamps to
try and ensure the delete is visible in a LWW sense, i.e. they read the current
write's timestamp and increment it, or supply both (but without supplying a
custom local deletion time)?
I suppose that then comes back to the problem of the local deletion time being
independent of the cell timestamps.
I'm not sure it's a good idea, if someone is setting custom timestamps, to
then also set a custom local deletion time via the now_in_seconds option,
but it does seem that doing so would avoid the problem (assuming the patch
is present): the local deletion time on the later delete would be shifted
forward, so the tombstone would last until hint delivery.
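The arithmetic behind that can be sketched as follows (an illustrative model only, not the compaction code; variable names are invented, and the 10-day value is just the gc_grace_seconds default):

```python
# A tombstone becomes purgeable by compaction once gc_grace_seconds have
# elapsed past its local deletion time. Shifting the local deletion time
# forward (e.g. via a client-supplied now_in_seconds) delays purging.
GC_GRACE_SECONDS = 864_000  # 10-day default

def purgeable(local_deletion_time: int, now: int) -> bool:
    return now > local_deletion_time + GC_GRACE_SECONDS

skewed_ldt = 1_000_000            # deletion time taken from a lagging clock
shifted_ldt = skewed_ldt + 3600   # deletion time shifted forward by an hour
now = skewed_ldt + GC_GRACE_SECONDS + 60

print(purgeable(skewed_ldt, now))   # prints True: tombstone already purged
print(purgeable(shifted_ldt, now))  # prints False: shifted time still guards
```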
Apart from that, I'm not sure there is an easy way to stop the problem if
timestamps and deletion times can be set arbitrarily differently from each
other, either via clock skew or users.
I did previously consider adding a configurable buffer to the hint TTL, which
would shorten hint liveness by, say, a default of one minute.
This could also, I suppose, protect in the case where the deletion time
differs from the cell timestamps due to some (ideally bounded) clock skew.
The tricky bit there is the trade-off between expiring hints more aggressively
(maybe a problem if folks set a very short hint TTL for some reason?) and
making the buffer large enough to prevent the problem; maybe 10s or a minute
could suffice?
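A minimal sketch of that buffer idea (the names `HINT_TTL` and `EXPIRY_BUFFER` are hypothetical, not actual Cassandra options):

```python
# Sketch of the proposed buffer: a hint is only considered deliverable while
# its age is under the effective TTL, i.e. the configured hint TTL minus a
# safety buffer, so hints expire before tombstones become purgeable even
# under bounded clock skew.
HINT_TTL = 10_800      # configured hint TTL in seconds (hypothetical value)
EXPIRY_BUFFER = 60     # proposed configurable buffer, default one minute

def hint_live(created_at: int, now: int) -> bool:
    return now - created_at < HINT_TTL - EXPIRY_BUFFER

print(hint_live(0, HINT_TTL - EXPIRY_BUFFER - 1))  # prints True
print(hint_live(0, HINT_TTL - 30))  # prints False: inside the buffer window
```

The risk this illustrates is the one noted above: with a very short configured TTL, the buffer eats a proportionally large slice of the delivery window.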
It does add a fair bit of complexity/configuration for what is in all
likelihood an even more obscure problem (given the need for custom timestamps,
long-delayed hint delivery, and precisely timed compaction).
WDYT [~bdeggleston] [~clohfink]?
Also thanks for the feedback on the PR, I've pushed some more changes which
hopefully address them all.
> Discard hints based on write time, not timeout time
> ---------------------------------------------------
>
> Key: CASSANDRA-20014
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20014
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Hints
> Reporter: Blake Eggleston
> Assignee: Matt Byrd
> Priority: Normal
>
> Hints created after a write timeout are created with the timeout time as
> the hint creation time. In the case of slow hint delivery, this can create a
> window of time where a write is applied after gcgs would have elapsed for
> tombstones written after the original write, and the tombstone has been
> purged, causing data resurrection. We should use the time the client request
> thread started working on the request as the hint creation time.