[ 
https://issues.apache.org/jira/browse/CASSANDRA-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735069#comment-14735069
 ] 

Sylvain Lebresne commented on CASSANDRA-10261:
----------------------------------------------

That view tombstone idea sounds like it'd work. So let me suggest some ideas 
regarding how it could be implemented.

First, I'd prefer not calling them "view tombstones". It's imo better to call 
things by what they are rather than what they are for, since you never know 
whether you'll end up using them for something else someday. And what they are 
is row tombstones that don't reach inside the row unless their timestamp 
is greater than the row's. The two suggestions I would have for a name would 
be either "weak" or "shallow" row tombstones. I'll use "weak tombstone" in the 
rest but I'm not strongly attached to it if there is a better idea.

The second thing is that as far as I can tell, there is no reason to keep both 
a weak and a normal tombstone on a given row. A simple rule like "a normal 
tombstone wins over a weak one if they have the same timestamp" is likely fine. 
So I would implement it more as 2 variations of row tombstones: weak and 
normal/strong. More concretely, we could simply add an "isDeletionWeak" boolean 
to each {{Row}} and update the reconciliation rules to take it into account. 
It shouldn't require a whole lot of code in fact.
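
To make the reconciliation rule concrete, here is a minimal sketch. The class {{RowDeletion}} and the methods {{merge}} and {{shadowsCell}} are hypothetical names for illustration, not existing Cassandra code. It also assumes that, outside of ties, the deletion with the higher timestamp wins regardless of weakness, and that a deletion shadows cells whose timestamp is lower than or equal to its own; only the tie-breaking rule and the "greater than the row timestamp" condition come from the description above.

```java
final class WeakTombstoneSketch {
    /** Hypothetical stand-in for a row deletion; not Cassandra's actual class. */
    static final class RowDeletion {
        final long timestamp;
        final boolean isWeak;

        RowDeletion(long timestamp, boolean isWeak) {
            this.timestamp = timestamp;
            this.isWeak = isWeak;
        }
    }

    /** Keeps a single deletion per row instead of tracking weak and normal ones separately. */
    static RowDeletion merge(RowDeletion a, RowDeletion b) {
        // Assumption: with different timestamps, the most recent deletion wins.
        if (a.timestamp != b.timestamp)
            return a.timestamp > b.timestamp ? a : b;
        // Same timestamp: the normal (strong) tombstone wins over the weak one.
        return a.isWeak ? b : a;
    }

    /** Whether the deletion removes a cell, given the row timestamp and the cell timestamp. */
    static boolean shadowsCell(RowDeletion deletion, long rowTimestamp, long cellTimestamp) {
        // A weak tombstone only reaches inside the row if it is newer than the row itself.
        if (deletion.isWeak && deletion.timestamp <= rowTimestamp)
            return false;
        // Assumption: otherwise it behaves like a normal row tombstone.
        return cellTimestamp <= deletion.timestamp;
    }

    public static void main(String[] args) {
        RowDeletion weak = new RowDeletion(10, true);
        RowDeletion strong = new RowDeletion(10, false);
        System.out.println("tie resolved to weak? " + merge(weak, strong).isWeak);
        System.out.println("weak shadows older cell of newer row? " + shadowsCell(weak, 10, 5));
    }
}
```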

We'd also have to modify the serialization format to handle that new 
information, and sadly we don't have any room for a new flag in the current 
per-row flag byte (see {{UnfilteredSerializer}}). Right off the bat, I see 2 
options:
# we change one of the existing flags (say {{HAS_COMPLEX_DELETION}}) to be an 
"extension" flag. If that flag is set, then we read an additional byte of flags, 
and we use the additional room for this. If that flag isn't set, we assume some 
meaningful default for the flags in the "extension".  The advantage is that 
it's a somewhat general solution that could come in handy if we need more flags 
later. The slight disadvantage is that we'd use an additional byte in some 
cases where we currently do not, though that should hopefully be uncommon. 
# we don't use the flags at all for this: for instance, we could write 
{{-localDeletionTime}} for the row deletion if that's a "weak" deletion (after 
all, that should never be negative for a normal tombstone). It's a bit of a 
hack however, and it doesn't play very nice with the delta encoding we do on 
the {{localDeletionTime}}, so it's probably not the best of ideas.
I'd probably go with option 1.
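
As a rough illustration of option 1, the sketch below writes a second flag byte only when needed. The constant names and bit values ({{EXTENSION_FLAG}}, {{IS_DELETION_WEAK}}) and the helper methods are made up for the example and do not reflect the actual layout in {{UnfilteredSerializer}}; the assumed default when the extension byte is absent is "no extended flag set", i.e. a normal deletion.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ExtendedFlagsSketch {
    // Hypothetical bit in the first flag byte signalling that a second flag byte follows.
    static final int EXTENSION_FLAG = 0x80;
    // Hypothetical bit in the second (extension) flag byte.
    static final int IS_DELETION_WEAK = 0x01;
    // Assumed default when no extension byte is written: the deletion is a normal one.
    static final int DEFAULT_EXTENDED_FLAGS = 0;

    /** Writes one byte of flags, plus the extension byte only if it differs from the default. */
    static byte[] serialize(int flags, boolean isDeletionWeak) {
        int extended = isDeletionWeak ? IS_DELETION_WEAK : DEFAULT_EXTENDED_FLAGS;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (extended == DEFAULT_EXTENDED_FLAGS) {
                out.writeByte(flags & ~EXTENSION_FLAG);
            } else {
                out.writeByte(flags | EXTENSION_FLAG);
                out.writeByte(extended);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the flags back and reports whether the row deletion is weak. */
    static boolean isDeletionWeak(byte[] serialized) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
            int flags = in.readUnsignedByte();
            int extended = (flags & EXTENSION_FLAG) != 0
                         ? in.readUnsignedByte()
                         : DEFAULT_EXTENDED_FLAGS;
            return (extended & IS_DELETION_WEAK) != 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes for normal deletion: " + serialize(0x04, false).length);
        System.out.println("bytes for weak deletion: " + serialize(0x04, true).length);
    }
}
```

The extra byte is only paid for rows that actually carry a weak deletion, which matches the "hopefully uncommon" cost mentioned for this option.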


> Materialized Views Timestamp issues
> -----------------------------------
>
>                 Key: CASSANDRA-10261
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10261
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>             Fix For: 3.0.0 rc1
>
>
> As [~thobbs] 
> [mentioned|https://issues.apache.org/jira/browse/CASSANDRA-9664?focusedCommentId=14724150&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14724150]
>  in CASSANDRA-9664 there are issues dealing with updates to individual cells 
> which can mask data from the base table in the view when trying to filter 
> data correctly in the view.  
> Unfortunately, this same issue exists for all MV tables with regular columns.
> In the earlier versions of MV we did have a fix for this which I now can see 
> is not effective in all situations.
> I've pushed some unit tests to show the issue (similar to Tyler's) and a fix.  
> The idea is we keep the base table's timestamps per cell as is so we can 
> *always* tell (per replica) which version of the record is the latest.  Since 
> the base table *always* writes the entire record to the view (part of our 
> earlier partial fix) we can ensure the view record contains *at least* the 
> view's primary key timestamp.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
