[
https://issues.apache.org/jira/browse/HBASE-26999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532269#comment-17532269
]
Wellington Chevreuil commented on HBASE-26999:
----------------------------------------------
{quote}
The reason to change the order is because when writing WAL, we need to make
sure that the compaction is succeeded. And with SFT enabled, since it will have
extra IOs, we need to make sure we have successfully update the SFT record,
then we can write the compaction marker to WAL. This is very very important,
otherwise, when recovering, we may find a compaction marker in the WAL to
indicate that the compaction is succeeded, but while loading store file list,
we will not load the newly generated files. This may cause serious bugs too,
now or future.
{quote}
Makes sense. I guess both ways are problematic. But maybe we are more likely to
fail at wal marker writing time?
{quote}
I still stand my point that, the actual problem here, is we still allow the
dead RS to change the hfiles on HDFS.
{quote}
Agree it would be the ideal. It's very challenging, though, to guarantee that
all threads changing file system state would be interrupted upon a detection
that RS is aborting. The store may be closing separately to the compaction.
There's also the compacted files discharger chore running in the background.
As an alternative for this immediate problem, we may keep the current order:
update SFT, update SFM, write wal marker, but rollback SFM update in case of
errors on write wal marker?
> HStore should try write WAL compaction marker before replacing compacted
> files in StoreEngine
> ---------------------------------------------------------------------------------------------
>
> Key: HBASE-26999
> URL: https://issues.apache.org/jira/browse/HBASE-26999
> Project: HBase
> Issue Type: Sub-task
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major
>
> On HBASE-26064, it seems we altered the order we update different places with
> the results of a compaction:
> {noformat}
> @@ -1510,14 +1149,13 @@ public class HStore implements Store, HeapSize,
> StoreConfigInformation,
> List<Path> newFiles) throws IOException {
> // Do the steps necessary to complete the compaction.
> setStoragePolicyFromFileName(newFiles);
> - List<HStoreFile> sfs = commitStoreFiles(newFiles, true);
> + List<HStoreFile> sfs = storeEngine.commitStoreFiles(newFiles, true);
> if (this.getCoprocessorHost() != null) {
> for (HStoreFile sf : sfs) {
> getCoprocessorHost().postCompact(this, sf, cr.getTracker(), cr,
> user);
> }
> }
> - writeCompactionWalRecord(filesToCompact, sfs);
> - replaceStoreFiles(filesToCompact, sfs);
> + replaceStoreFiles(filesToCompact, sfs, true);
> ...
> @@ -1581,25 +1219,24 @@ public class HStore implements Store, HeapSize,
> StoreConfigInformation,
> this.region.getRegionInfo(), compactionDescriptor,
> this.region.getMVCC());
> }
>
> - void replaceStoreFiles(Collection<HStoreFile> compactedFiles,
> Collection<HStoreFile> result)
> - throws IOException {
> - this.lock.writeLock().lock();
> - try {
> -
> this.storeEngine.getStoreFileManager().addCompactionResults(compactedFiles,
> result);
> - synchronized (filesCompacting) {
> - filesCompacting.removeAll(compactedFiles);
> - }
> -
> - // These may be null when the RS is shutting down. The space quota
> Chores will fix the Region
> - // sizes later so it's not super-critical if we miss these.
> - RegionServerServices rsServices = region.getRegionServerServices();
> - if (rsServices != null &&
> rsServices.getRegionServerSpaceQuotaManager() != null) {
> - updateSpaceQuotaAfterFileReplacement(
> -
> rsServices.getRegionServerSpaceQuotaManager().getRegionSizeStore(),
> getRegionInfo(),
> - compactedFiles, result);
> - }
> - } finally {
> - this.lock.writeLock().unlock();
> + @RestrictedApi(explanation = "Should only be called in TestHStore", link =
> "",
> + allowedOnPath = ".*/(HStore|TestHStore).java")
> + void replaceStoreFiles(Collection<HStoreFile> compactedFiles,
> Collection<HStoreFile> result,
> + boolean writeCompactionMarker) throws IOException {
> + storeEngine.replaceStoreFiles(compactedFiles, result);
> + if (writeCompactionMarker) {
> + writeCompactionWalRecord(compactedFiles, result);
> + }
> + synchronized (filesCompacting) {
> + filesCompacting.removeAll(compactedFiles);
> + }
> + // These may be null when the RS is shutting down. The space quota
> Chores will fix the Region
> + // sizes later so it's not super-critical if we miss these.
> + RegionServerServices rsServices = region.getRegionServerServices();
> + if (rsServices != null && rsServices.getRegionServerSpaceQuotaManager()
> != null) {
> + updateSpaceQuotaAfterFileReplacement(
> + rsServices.getRegionServerSpaceQuotaManager().getRegionSizeStore(),
> getRegionInfo(),
> + compactedFiles, result); {noformat}
> While running some large scale load test, we run into File SFT metafiles
> inconsistency that we believe could have been avoided if the original order
> was in place. Here the scenario we had:
> 1) Region R with one CF f was open on RS1. At this time, the given store had
> some files, let's say these were file1, file2 and file3;
> 2) Compaction started on RS1;
> 3) RS1 entered a long GC pause, lost ZK lock. Compaction is still running,
> though.
> 4) RS2 opens R. The related File SFT instance for this store then creates a
> new meta file with file1, file2 and file3.
> 5) Compaction on RS1 successfully completes the
> *storeEngine.replaceStoreFiles* call. This updates the in memory cache of
> valid files (StoreFileManager) and the SFT metafile for the store engine on
> RS1 with the compaction resulting file, say file4, removing file1, file2 and
> file3. Note that the SFT meta file used by RS1 here is different (older) than
> the one used by RS2.
> 6) Compaction on RS1 tries to update WAL marker, but fails to do so, as the
> WAL already got closed when the RS1 ZK lock expired. This triggers a store
> close in RS1. As part of the store close process, it removes all files it
> sees as completed compacted, in this case, file1, file2 and file3.
> 7) RS2 still references file1, file2 and file3. It then gets
> FileNotFoundException when trying to open any of these files.
> This situation would had been avoided if the original order of a) write WAL
> marker, then b) replace store files was kept.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)