[
https://issues.apache.org/jira/browse/HBASE-26836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540899#comment-17540899
]
Wellington Chevreuil commented on HBASE-26836:
----------------------------------------------
Yes, we are looking into this while running some test sets with pre-existing
snapshotted data. The options I see here are:
1) We leave it as is, documenting this limitation and noting that MIGRATION
should then be used to convert the table to FILE, if that's desired. The risk
here is that if the cluster doesn't have HBOSS enabled and this step is
skipped, there might be problems for the table.
2) We add extra checks for the global SFT config, and if no SFT is defined in
the snapshot, we set MIGRATION accordingly. The question then is when we
should change the SFT config from MIGRATION to FILE: add an extra stage to
the restore snapshot procedure, or just leave it as a manual change?
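
To make option 2 concrete, here is a rough sketch of the check, not the actual
patch. It uses plain maps as stand-ins for the snapshot's table descriptor
values and the global configuration, so the class and method names are
hypothetical. The three property names are the SFT keys as I understand them.
It also assumes the global tracker is either DEFAULT or a concrete
implementation such as FILE, and that pre-SFT snapshot data was written with
DEFAULT.

```java
import java.util.HashMap;
import java.util.Map;

public class SftOnCloneSketch {
  // SFT property names as I understand them; everything else here is hypothetical.
  static final String TRACKER_IMPL = "hbase.store.file-tracker.impl";
  static final String MIGRATION_SRC = "hbase.store.file-tracker.migration.src.impl";
  static final String MIGRATION_DST = "hbase.store.file-tracker.migration.dst.impl";

  /**
   * Returns the table-level properties a cloned table would get.
   * snapshotTableProps: values carried by the snapshot's table descriptor.
   * globalConf: cluster-wide configuration.
   */
  static Map<String, String> resolveSftForClone(Map<String, String> snapshotTableProps,
      Map<String, String> globalConf) {
    Map<String, String> result = new HashMap<>(snapshotTableProps);
    if (result.containsKey(TRACKER_IMPL)) {
      // The snapshot already records an SFT implementation, keep it untouched.
      return result;
    }
    String global = globalConf.getOrDefault(TRACKER_IMPL, "DEFAULT");
    if ("DEFAULT".equalsIgnoreCase(global)) {
      // Nothing to migrate, just make the implicit default explicit.
      result.put(TRACKER_IMPL, "DEFAULT");
    } else {
      // Snapshot predates SFT: read with DEFAULT, write with the global tracker.
      result.put(TRACKER_IMPL, "MIGRATION");
      result.put(MIGRATION_SRC, "DEFAULT");
      result.put(MIGRATION_DST, global);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> globalConf = new HashMap<>();
    globalConf.put(TRACKER_IMPL, "FILE");
    // A snapshot taken before SFT existed carries no tracker setting at all.
    Map<String, String> cloned = resolveSftForClone(new HashMap<>(), globalConf);
    System.out.println(cloned.get(TRACKER_IMPL) + " src=" + cloned.get(MIGRATION_SRC)
        + " dst=" + cloned.get(MIGRATION_DST));
  }
}
```

This only covers setting MIGRATION at clone time. The open question above,
when and how the table moves from MIGRATION to FILE, is not addressed by the
sketch.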
> Should always set SFT implementation when cloning snapshot
> ----------------------------------------------------------
>
> Key: HBASE-26836
> URL: https://issues.apache.org/jira/browse/HBASE-26836
> Project: HBase
> Issue Type: Sub-task
> Components: HFile, snapshots
> Reporter: Duo Zhang
> Priority: Major
>
> Saw the TestCloneSnapshotProcedureFileBasedSFT failing several times
> {noformat}
> 2022-03-14T11:23:13,782 INFO [PEWorker-1]
> procedure2.ProcedureExecutor(1432): Finished pid=99, state=SUCCESS,
> hasLock=false; CloneSnapshotProcedure (table=testRecoverWithRestoreAclFlag
> snapshot=name: "snapshot-1647256973399"
> table: "testCloneSnapshot"
> creation_time: 1647256982366
> type: FLUSH
> version: 2
> owner: ""
> ttl: 0
> max_file_size: 0
> ) in 6.9090 sec
> 2022-03-14T11:23:13,794 WARN [PEWorker-1]
> procedure2.ProcedureExecutor$Testing(127): Toggle KILL before store update
> to: false
> 2022-03-14T11:23:13,794 DEBUG [PEWorker-1]
> procedure2.ProcedureExecutor(1777): TESTING: Kill BEFORE store update:
> pid=112, state=RUNNABLE:MODIFY_TABLE_DESCRIPTOR_UPDATE, hasLock=true;
> InitializeStoreFileTrackerProcedure table=testRecoverWithRestoreAclFlag
> 2022-03-14T11:23:13,794 INFO [PEWorker-1] procedure2.ProcedureExecutor(635):
> Stopping
> 2022-03-14T11:23:13,795 WARN [PEWorker-1]
> procedure2.ProcedureExecutor$WorkerThread(1997): Worker terminating
> UNNATURALLY null
> java.lang.RuntimeException: TESTING: Kill BEFORE store update: pid=112,
> state=RUNNABLE:MODIFY_TABLE_DESCRIPTOR_UPDATE, hasLock=true;
> InitializeStoreFileTrackerProcedure table=testRecoverWithRestoreAclFlag
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.kill(ProcedureExecutor.java:1779)
> ~[classes/:?]
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1723)
> ~[classes/:?]
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
> ~[classes/:?]
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
> ~[classes/:?]
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> ~[classes/:?]
> 2022-03-14T11:23:14,012 WARN [Time-limited test]
> procedure2.ProcedureTestingUtility(193): Set Kill before store update to:
> false
> {noformat}
> The CloneSnapshotProcedure is finished but then we get an
> InitializeStoreFileTrackerProcedure which messes up the test.
> The InitializeStoreFileTrackerProcedure will be scheduled during a rolling
> upgrade, where we do not have SFT set for a table. So typically it should not
> be scheduled. Not sure how this could happen in the UT, need to dig more.
> But anyway, when we clone a snapshot which was taken before we had SFT, it
> is possible the TableDescriptor does not have an SFT implementation set, so
> we should set one for it.