[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327149#comment-17327149
 ] 

Branimir Lambov commented on CASSANDRA-16619:
---------------------------------------------

{quote}what do we do with other things such as repair, ancestry, level, etc?
{quote}
With this ticket, we _have_ the originating host id, so we have the means to 
ignore non-relevant information, whether it is in the commit log, in 
compaction, or anywhere else.
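
To make that concrete, here is a minimal sketch of how a host id check could 
gate the commit log intervals used on replay. The class and method names 
(ReplayFilterSketch, Stats, Interval, isLocal, persistedIntervals) are 
hypothetical stand-ins, not Cassandra's actual classes, and treating an unknown 
(null) origin as non-local is an assumption based on the ticket description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-ins for illustration; not Cassandra's real classes.
final class ReplayFilterSketch {
    /** A commit log range recorded in an sstable's metadata. */
    static final class Interval {
        final long start, end;
        Interval(long start, long end) { this.start = start; this.end = end; }
    }

    /** Minimal view of per-sstable stats: origin host (null if unknown) plus intervals. */
    static final class Stats {
        final UUID originatingHostId;
        final List<Interval> commitLogIntervals;
        Stats(UUID originatingHostId, List<Interval> commitLogIntervals) {
            this.originatingHostId = originatingHostId;
            this.commitLogIntervals = commitLogIntervals;
        }
    }

    /** True only when the sstable is known to have been written by this node. */
    static boolean isLocal(UUID localHostId, Stats stats) {
        // UUID.equals(null) is false, so an unknown origin never matches (assumption).
        return localHostId.equals(stats.originatingHostId);
    }

    /** Collects the intervals that replay may treat as already persisted. */
    static List<Interval> persistedIntervals(UUID localHostId, List<Stats> sstables) {
        List<Interval> result = new ArrayList<>();
        for (Stats s : sstables)
            if (isLocal(localHostId, s))
                result.addAll(s.commitLogIntervals);
        return result;
    }
}
```

The same isLocal predicate could be reused wherever else node-local metadata 
is consumed, which is the point of carrying the host id in the sstable.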

There's some room to make the interface more generic, i.e. to have a mechanism 
that marks fields as local so that they can be properly combined during 
compaction (which can easily be done in a separate ticket). Still, this 
approach is IMHO a better solution to the problem: it handles all manners of 
transfer, and it also allows correcting errors caused by tables that were 
already transferred by the time a bug with local metadata is uncovered.

> Loss of commit log data possible after sstable ingest
> -----------------------------------------------------
>
>                 Key: CASSANDRA-16619
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0.x
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains the commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, these positions may cover regions that 
> the receiving node has not yet flushed, and valid data can be lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata field (in StatsMetadata), 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId is set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables are preserved in the resulting sstable.
> Discovered by [~jakubzytka]
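
The compaction rule in the description can be sketched roughly as follows. 
This is only an illustration under stated assumptions: CompactionMergeSketch, 
Stats, Interval and merge are hypothetical names, not the patch's actual code, 
and real interval handling (e.g. coalescing overlapping ranges) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-ins for illustration; not Cassandra's real classes.
final class CompactionMergeSketch {
    /** A commit log range recorded in an sstable's metadata. */
    static final class Interval {
        final long start, end;
        Interval(long start, long end) { this.start = start; this.end = end; }
    }

    /** Minimal view of per-sstable stats: origin host (null if unknown) plus intervals. */
    static final class Stats {
        final UUID originatingHostId;
        final List<Interval> commitLogIntervals;
        Stats(UUID originatingHostId, List<Interval> commitLogIntervals) {
            this.originatingHostId = originatingHostId;
            this.commitLogIntervals = commitLogIntervals;
        }
    }

    /**
     * Builds the stats of a compaction result: the output is stamped with the
     * local host id, and only intervals from locally originated inputs survive.
     */
    static Stats merge(UUID localHostId, List<Stats> inputs) {
        List<Interval> kept = new ArrayList<>();
        for (Stats in : inputs)
            if (localHostId.equals(in.originatingHostId)) // foreign or unknown origin: drop
                kept.addAll(in.commitLogIntervals);
        return new Stats(localHostId, kept);
    }
}
```

Dropping the foreign intervals at this point matters because the output is 
stamped as local; if they were kept, they would be trusted on the next replay.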



