[
https://issues.apache.org/jira/browse/HBASE-26999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534536#comment-17534536
]
Hudson commented on HBASE-26999:
--------------------------------
Results for branch master
[build #583 on
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/583/]:
(x) *{color:red}-1 overall{color}*
----
details (if available):
(x) {color:red}-1 general checks{color}
-- For more information [see general
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/583/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3)
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/583/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/583/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}
> HStore should try write WAL compaction marker before replacing compacted
> files in StoreEngine
> ---------------------------------------------------------------------------------------------
>
> Key: HBASE-26999
> URL: https://issues.apache.org/jira/browse/HBASE-26999
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.5.0, 3.0.0-alpha-2, 2.6.0
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3
>
>
> On HBASE-26064, it seems we altered the order we update different places with
> the results of a compaction:
> {noformat}
> @@ -1510,14 +1149,13 @@ public class HStore implements Store, HeapSize,
> StoreConfigInformation,
> List<Path> newFiles) throws IOException {
> // Do the steps necessary to complete the compaction.
> setStoragePolicyFromFileName(newFiles);
> - List<HStoreFile> sfs = commitStoreFiles(newFiles, true);
> + List<HStoreFile> sfs = storeEngine.commitStoreFiles(newFiles, true);
> if (this.getCoprocessorHost() != null) {
> for (HStoreFile sf : sfs) {
> getCoprocessorHost().postCompact(this, sf, cr.getTracker(), cr,
> user);
> }
> }
> - writeCompactionWalRecord(filesToCompact, sfs);
> - replaceStoreFiles(filesToCompact, sfs);
> + replaceStoreFiles(filesToCompact, sfs, true);
> ...
> @@ -1581,25 +1219,24 @@ public class HStore implements Store, HeapSize,
> StoreConfigInformation,
> this.region.getRegionInfo(), compactionDescriptor,
> this.region.getMVCC());
> }
>
> - void replaceStoreFiles(Collection<HStoreFile> compactedFiles,
> Collection<HStoreFile> result)
> - throws IOException {
> - this.lock.writeLock().lock();
> - try {
> -
> this.storeEngine.getStoreFileManager().addCompactionResults(compactedFiles,
> result);
> - synchronized (filesCompacting) {
> - filesCompacting.removeAll(compactedFiles);
> - }
> -
> - // These may be null when the RS is shutting down. The space quota
> Chores will fix the Region
> - // sizes later so it's not super-critical if we miss these.
> - RegionServerServices rsServices = region.getRegionServerServices();
> - if (rsServices != null &&
> rsServices.getRegionServerSpaceQuotaManager() != null) {
> - updateSpaceQuotaAfterFileReplacement(
> -
> rsServices.getRegionServerSpaceQuotaManager().getRegionSizeStore(),
> getRegionInfo(),
> - compactedFiles, result);
> - }
> - } finally {
> - this.lock.writeLock().unlock();
> + @RestrictedApi(explanation = "Should only be called in TestHStore", link =
> "",
> + allowedOnPath = ".*/(HStore|TestHStore).java")
> + void replaceStoreFiles(Collection<HStoreFile> compactedFiles,
> Collection<HStoreFile> result,
> + boolean writeCompactionMarker) throws IOException {
> + storeEngine.replaceStoreFiles(compactedFiles, result);
> + if (writeCompactionMarker) {
> + writeCompactionWalRecord(compactedFiles, result);
> + }
> + synchronized (filesCompacting) {
> + filesCompacting.removeAll(compactedFiles);
> + }
> + // These may be null when the RS is shutting down. The space quota
> Chores will fix the Region
> + // sizes later so it's not super-critical if we miss these.
> + RegionServerServices rsServices = region.getRegionServerServices();
> + if (rsServices != null && rsServices.getRegionServerSpaceQuotaManager()
> != null) {
> + updateSpaceQuotaAfterFileReplacement(
> + rsServices.getRegionServerSpaceQuotaManager().getRegionSizeStore(),
> getRegionInfo(),
> + compactedFiles, result); {noformat}
> While running some large scale load test, we run into File SFT metafiles
> inconsistency that we believe could have been avoided if the original order
> was in place. Here the scenario we had:
> 1) Region R with one CF f was open on RS1. At this time, the given store had
> some files, let's say these were file1, file2 and file3;
> 2) Compaction started on RS1;
> 3) RS1 entered a long GC pause, lost ZK lock. Compaction is still running,
> though.
> 4) RS2 opens R. The related File SFT instance for this store then creates a
> new meta file with file1, file2 and file3.
> 5) Compaction on RS1 successfully completes the
> *storeEngine.replaceStoreFiles* call. This updates the in memory cache of
> valid files (StoreFileManager) and the SFT metafile for the store engine on
> RS1 with the compaction resulting file, say file4, removing file1, file2 and
> file3. Note that the SFT meta file used by RS1 here is different (older) than
> the one used by RS2.
> 6) Compaction on RS1 tries to update WAL marker, but fails to do so, as the
> WAL already got closed when the RS1 ZK lock expired. This triggers a store
> close in RS1. As part of the store close process, it removes all files it
> sees as completed compacted, in this case, file1, file2 and file3.
> 7) RS2 still references file1, file2 and file3. It then gets
> FileNotFoundException when trying to open any of these files.
> This situation would had been avoided if the original order of a) write WAL
> marker, then b) replace store files was kept.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)