[ https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503704#comment-17503704 ]
Duo Zhang commented on HBASE-26791:
-----------------------------------

I've talked with [~elserj] on Slack about this. If we always overwrite the same set of track files, I do not think there is any way to fix this problem. So I propose we solve it this way:

1. Include a timestamp/sequenceid in the track file name. This means that when opening a region, we need to list the track file directory (sad) to find the newest track file and load it.
2. To avoid generating too many track files, we only bump the timestamp/sequenceid when opening a region.

So the open region steps will be:

a. List the track file directory and load the newest track file. If there are two files with the same timestamp/sequenceid, compare the timestamps stored in the file content, just as we have done before.
b. Bump the timestamp/sequenceid to a value greater than the loaded one, and use this timestamp/sequenceid in the new track file names.

In this way, the old rs will only overwrite the track files with the old timestamp/sequenceid, so it will not affect the new track files, and the problem is solved.

Note that the track file names will be something like f1-12345.filelist and f2-12345.filelist.

> Memstore flush fencing issue for SFT
> ------------------------------------
>
> Key: HBASE-26791
> URL: https://issues.apache.org/jira/browse/HBASE-26791
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.6.0, 3.0.0-alpha-3
> Reporter: Szabolcs Bukros
> Priority: Major
>
> The scenario is the following:
> # rs1 is flushing a file to S3 for region1
> # rs1 loses its ZK lock
> # region1 gets assigned to rs2
> # rs2 opens region1
> # rs1 completes the flush and updates the SFT file for region1
> # rs2 now has a different “version” of the SFT file for region1
> The flush should fail at the end, but the SFT file gets overwritten before that, resulting in potential data loss.
>
> Potential solutions include:
> * Adding a timestamp to the tracker file names.
> This, together with creating a new tracker file when an rs opens the region, would allow us to list the available tracker files before an update and compare the timestamps found with the one stored in memory, to verify that the store still owns the latest tracker file.
> * Using the existing timestamp in the tracker file content. This would also require us to create a new tracker file when a new rs opens the region, but instead of listing the available tracker files, we could load and deserialize the last tracker file and compare the timestamp found in it with the one stored in memory.
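The open-region steps proposed in the comment (which also cover the first option in the quoted description) can be sketched as a small in-memory Java model. This is a hypothetical illustration, not HBase code: the classes `TrackerDir` and `StoreTracker`, the map standing in for the tracker file directory, and sequence ids starting at 1 are all assumptions; only the `f1-<seqid>.filelist` / `f2-<seqid>.filelist` naming comes from the comment. It shows the key property: a stale region server keeps writing under its old sequence id, so it never touches the files that the new owner's listing selects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for a store's tracker file directory.
// Key: file name such as "f1-12345.filelist"; value: the timestamp stored in the file content.
final class TrackerDir {
  private final Map<String, Long> files = new HashMap<>();

  void write(String name, long contentTimestamp) {
    files.put(name, contentTimestamp);
  }

  // Extracts the sequence id from "fN-<seqId>.filelist"; returns -1 for other names.
  static long seqIdOf(String name) {
    int dash = name.indexOf('-');
    int suffix = name.lastIndexOf(".filelist");
    if (dash < 0 || suffix <= dash + 1) {
      return -1;
    }
    return Long.parseLong(name.substring(dash + 1, suffix));
  }

  // Step a: "list" the directory and return the newest file. A higher sequence id wins;
  // between files with the same sequence id, the larger content timestamp wins.
  Optional<String> newest() {
    String best = null;
    for (Map.Entry<String, Long> e : files.entrySet()) {
      if (best == null) {
        best = e.getKey();
        continue;
      }
      long seq = seqIdOf(e.getKey());
      long bestSeq = seqIdOf(best);
      if (seq > bestSeq || (seq == bestSeq && e.getValue() > files.get(best))) {
        best = e.getKey();
      }
    }
    return Optional.ofNullable(best);
  }
}

// Hypothetical per-store tracker owned by one region server after it opens the region.
final class StoreTracker {
  private final TrackerDir dir;
  private final long seqId; // fixed while this server holds the region; bumped only on open
  private int slot = 1;     // alternate between f1 and f2, as with the existing two files

  private StoreTracker(TrackerDir dir, long seqId) {
    this.dir = dir;
    this.seqId = seqId;
  }

  // Steps a and b: find the newest tracker file, then pick a strictly larger sequence id.
  // (Loading the store file list from that file is omitted in this model.)
  static StoreTracker open(TrackerDir dir) {
    long loaded = dir.newest().map(TrackerDir::seqIdOf).orElse(0L);
    return new StoreTracker(dir, loaded + 1);
  }

  long seqId() {
    return seqId;
  }

  // Writes the next tracker file, always under this server's own sequence id, so a
  // stale server can only overwrite files that carry its old sequence id.
  String update(long nowMillis) {
    String name = "f" + slot + "-" + seqId + ".filelist";
    dir.write(name, nowMillis);
    slot = (slot == 1) ? 2 : 1;
    return name;
  }
}
```

For example, if rs1 opens the region with sequence id 1 and rs2 later opens it with sequence id 2, a late flush from rs1 lands in `f2-1.filelist`, and the next listing still selects rs2's newest file because sequence id 2 beats 1 regardless of content timestamps. The cost, as the comment notes, is a directory listing on every region open.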
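The second option in the quoted description, fencing on the timestamp stored in the tracker file content, can be sketched the same way. This is a hypothetical model, not HBase's API: the class `ContentFencedTracker` is invented for illustration, and an `AtomicLong` stands in for "the last tracker file" on shared storage, holding the timestamp that would be serialized in it (0 meaning no file yet). On open, the new owner writes a fresh tracker file; before a flush updates the tracker, the server reloads the last file and compares its timestamp with the one it keeps in memory.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model: the AtomicLong stands in for the last tracker file on shared
// storage; its value is the timestamp serialized in that file (0 = no file yet).
final class ContentFencedTracker {
  private final AtomicLong lastTrackerFile;
  private long lastWritten; // timestamp this server itself wrote most recently

  private ContentFencedTracker(AtomicLong lastTrackerFile, long lastWritten) {
    this.lastTrackerFile = lastTrackerFile;
    this.lastWritten = lastWritten;
  }

  // On region open, write a new tracker file so a previous owner can detect the takeover.
  static ContentFencedTracker open(AtomicLong lastTrackerFile, long nowMillis) {
    lastTrackerFile.set(nowMillis);
    return new ContentFencedTracker(lastTrackerFile, nowMillis);
  }

  // Before updating the tracker file at the end of a flush, load the last tracker file
  // and compare its timestamp with the one kept in memory. A mismatch means another
  // server has opened the region since, so this flush fails instead of overwriting.
  void commitFlush(long nowMillis) {
    long onStorage = lastTrackerFile.get();
    if (onStorage != lastWritten) {
      throw new IllegalStateException(
          "tracker file timestamp " + onStorage + " != expected " + lastWritten);
    }
    // The check above and this write are two separate steps, not one atomic operation.
    lastTrackerFile.set(nowMillis);
    lastWritten = nowMillis;
  }
}
```

In this sketch the read-and-compare and the overwrite are separate steps, so a stale server that passes the check just before the new owner writes its tracker file could still overwrite it; the model only illustrates the comparison itself, not a complete fencing guarantee.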