[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326907#comment-17326907
 ] 

Yifan Cai edited comment on CASSANDRA-16619 at 4/21/21, 9:41 PM:
-----------------------------------------------------------------

ZCS (zero-copy streaming) streams a file as-is, without loading it into 
memory, which is why it is fast. To remove a metadata field, the receiving 
node would need to load the file into memory as it arrives from the remote node. 

I think this is expected behavior with ZCS. 

To tell local sstables apart from streamed ones, adding the originating hostID 
to the metadata sounds valid.

-- edit --

Talked with [~dcapwell] on Slack. In the case of ZCS, the sstable metadata is 
updated after the bytes are flushed. See 
[here.|https://github.com/apache/cassandra/blob/0fd8f0a52fbd69c47d073373abfe7d2437bbd9ca/src/java/org/apache/cassandra/db/streaming/CassandraEntireSSTableStreamReader.java#L142]
 Currently, that step does not reset the commitLogInterval. But it is possible 
to just add a reset there, which avoids changing the sstable format, i.e. 
adding a new field. 
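
A rough sketch of what such a reset step could look like, to make the idea concrete. This is not the Cassandra code: SketchStats and ZcsReceiveSketch are hypothetical stand-ins, intervals are modelled as plain {start, end} pairs, and repairedAt is only there to show that other fields are left alone.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the stats metadata of an sstable (not Cassandra's StatsMetadata).
final class SketchStats {
    final long repairedAt;                 // example of a field that must survive the reset
    final List<long[]> commitLogIntervals; // each entry: {startPosition, endPosition}

    SketchStats(long repairedAt, List<long[]> commitLogIntervals) {
        this.repairedAt = repairedAt;
        this.commitLogIntervals = commitLogIntervals;
    }

    // Returns a copy with the commit log intervals cleared; other fields are kept.
    SketchStats withoutCommitLogIntervals() {
        return new SketchStats(repairedAt, Collections.<long[]>emptyList());
    }
}

// Hypothetical hook that runs once the streamed bytes are on disk.
final class ZcsReceiveSketch {
    // Only the small metadata component is rewritten; the data file is never loaded.
    static SketchStats onStreamComplete(SketchStats received) {
        // The intervals describe the sender's commit log, so they are dropped here.
        return received.withoutCommitLogIntervals();
    }
}
```

The point of the sketch is that the reset touches only metadata after the transfer, so the streaming path itself stays zero-copy and no new field is needed in the sstable format.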


> Loss of commit log data possible after sstable ingest
> -----------------------------------------------------
>
>                 Key: CASSANDRA-16619
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0.x
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, its commit log positions may cover 
> regions that the receiving node has not yet flushed, so valid data can be 
> lost should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata field (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId is set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables are preserved in the resulting sstable.
> Discovered by [~jakubzytka]
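
The replay rule described in the issue can be sketched roughly as follows. This is only an illustration under assumptions: SSTableInfoSketch and ReplayFilterSketch are hypothetical names, intervals are modelled as plain {start, end} pairs, and none of this is taken from the actual patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical view of the two metadata fields that matter for replay.
final class SSTableInfoSketch {
    final UUID originatingHostId;          // null if the origin is not known
    final List<long[]> commitLogIntervals; // each entry: {startPosition, endPosition}

    SSTableInfoSketch(UUID originatingHostId, List<long[]> commitLogIntervals) {
        this.originatingHostId = originatingHostId;
        this.commitLogIntervals = commitLogIntervals;
    }
}

final class ReplayFilterSketch {
    // Collects the intervals that replay may treat as already persisted:
    // only those from sstables whose originatingHostId equals the local host id.
    static List<long[]> intervalsToSkip(UUID localHostId, List<SSTableInfoSketch> sstables) {
        List<long[]> result = new ArrayList<>();
        for (SSTableInfoSketch sstable : sstables) {
            // equals(null) is false, so sstables of unknown origin are ignored too.
            if (localHostId.equals(sstable.originatingHostId)) {
                result.addAll(sstable.commitLogIntervals);
            }
        }
        return result;
    }
}
```

With this rule, an sstable streamed or copied from another node contributes no intervals, so the matching sections of the local commit log are still replayed.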



