[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326943#comment-17326943
 ] 

David Capwell commented on CASSANDRA-16619:
-------------------------------------------

A backup/restore process that bypasses nodetool import and drops the files 
directly into the CF directory would plausibly hit this. If you go through 
import, though, I would hope we strip out all the metadata that is no longer 
relevant (which is what we are trying to do in import, as commit log position 
isn't the only thing we need to deal with).  If we special-case the commit log 
position, what do we do with other things such as repair, ancestry, level, etc.?

Since the cases that load SSTables from external writers are few and well 
known, I feel it makes the most sense to make sure each one strips the metadata 
the same way. Adding a method to MetadataSerializer such as 
resetCommitLogPosition and calling it in the places that import files would 
handle this without requiring a format change. Import allows more flexibility 
in what we strip out, which backup/restore processes can use, so it is nicer to 
have this targeted method than a catch-all resetNonLocalMetadata method.
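
To make the suggestion concrete, here is a minimal sketch of per-field 
stripping on import. Everything in it is hypothetical: the Stats class is a 
stand-in rather than Cassandra's StatsMetadata, and the LocalField enum, the 
NO_POSITION sentinel and the reset method are names invented for illustration. 
Only resetCommitLogPosition comes from the comment above, and its real 
signature on MetadataSerializer is not specified there.

```java
import java.util.EnumSet;
import java.util.Set;

class ImportMetadataSketch {
    // Hypothetical names for the node-local metadata kinds mentioned above.
    enum LocalField { COMMIT_LOG_POSITION, REPAIR, LEVEL }

    // Hypothetical stand-in for sstable stats; NOT Cassandra's StatsMetadata.
    static final class Stats {
        static final long NO_POSITION = -1L; // assumed sentinel: no commit log position
        final long commitLogUpperBound;
        final long repairedAt; // 0 means unrepaired in this sketch
        final int level;

        Stats(long commitLogUpperBound, long repairedAt, int level) {
            this.commitLogUpperBound = commitLogUpperBound;
            this.repairedAt = repairedAt;
            this.level = level;
        }
    }

    // Returns a copy with only the requested fields reset; all others are kept.
    static Stats reset(Stats in, Set<LocalField> fields) {
        long position = fields.contains(LocalField.COMMIT_LOG_POSITION)
                ? Stats.NO_POSITION : in.commitLogUpperBound;
        long repairedAt = fields.contains(LocalField.REPAIR) ? 0L : in.repairedAt;
        int level = fields.contains(LocalField.LEVEL) ? 0 : in.level;
        return new Stats(position, repairedAt, level);
    }

    // The single-purpose helper suggested in the comment, expressed via reset().
    static Stats resetCommitLogPosition(Stats in) {
        return reset(in, EnumSet.of(LocalField.COMMIT_LOG_POSITION));
    }
}
```

Under these assumptions, each import path picks the set of fields it wants to 
drop, while a backup/restore tool could call resetCommitLogPosition alone and 
keep repair and level information intact.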

> Loss of commit log data possible after sstable ingest
> -----------------------------------------------------
>
>                 Key: CASSANDRA-16619
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0.x
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, its positions may cover regions that 
> the receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId is set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables are preserved in the resulting sstable.
> discovered by [~jakubzytka]
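
The replay rule in the quoted description can be sketched as follows. This is 
an illustration under stated assumptions, not the patch itself: Interval, 
SSTableInfo and intervalsForReplay are hypothetical names, and treating a null 
originatingHostId as "does not match" is this sketch's reading of "only when 
the originatingHostId of the sstable matches".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

class ReplayFilterSketch {
    // Hypothetical commit log interval, reduced to two positions.
    static final class Interval {
        final long start;
        final long end;

        Interval(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical view of the sstable metadata relevant to replay.
    static final class SSTableInfo {
        final UUID originatingHostId; // null if the origin is not known
        final List<Interval> intervals;

        SSTableInfo(UUID originatingHostId, List<Interval> intervals) {
            this.originatingHostId = originatingHostId;
            this.intervals = intervals;
        }
    }

    // Collects intervals only from sstables written by the local node.
    // Sstables with a foreign or unknown (null) origin contribute nothing,
    // so their positions cannot filter out mutations during replay.
    static List<Interval> intervalsForReplay(UUID localHostId, List<SSTableInfo> sstables) {
        List<Interval> result = new ArrayList<>();
        for (SSTableInfo sstable : sstables) {
            if (localHostId.equals(sstable.originatingHostId)) {
                result.addAll(sstable.intervals);
            }
        }
        return result;
    }
}
```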



