[
https://issues.apache.org/jira/browse/HBASE-26999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533742#comment-17533742
]
Wellington Chevreuil commented on HBASE-26999:
----------------------------------------------
Thanks for reviewing this, [~apurtell] [~zhangduo]! I have now merged this into
master and branch-2.
Since this is a bug fix, I believe it should go into branch-2.5 as well, but I
want to confirm that this is fine and won't sink your work related to the
2.5.0 RC0, [~apurtell]?
As for the reported issue above:
{quote}eventually all files in a small subset of stores were (erroneously)
compacted away.
{quote}
And it didn't produce any valid compaction result file?
{quote}
I'm going to retest with this change applied before debugging further because
it seems to be a race/timing issue related to compaction. Notably this was a
test performed without active chaos. There were no server failures.
{quote}
Right, I'm not optimistic this fix would work for your case, [~apurtell]. I
believe it only prevents the SFM "in-memory" cache from being updated in case
the RS is already aborting. One thing I noticed whilst troubleshooting this is
that SFT has its own in-memory cache of "active" files, and it gets updated in
its own locking context, separately from the SFM "in-memory" cache. I didn't
think it could create further problems given the way the updates of SFT and
SFM are called, but I might have missed something.
Another interesting behaviour I noticed is that in the case of two StoreEngine
instances of the same store, say, when we have the region re-opened on a new RS
while the old RS still thinks it's active and the region is open, there will be
two SFT meta files being updated. The current patch would prevent that
situation from progressing to a point where the rogue RS could cause problems,
but could we have other scenarios where two StoreEngine instances of the same
store are up?
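For context, the essence of the guard this patch adds is roughly the following
(a paraphrased sketch, not the committed code; the names follow the diff in
the issue description below, and isAborted() is the standard Abortable method):
{noformat}
// Paraphrased sketch: bail out before touching the in-memory store file
// state (and hence the SFT meta file) once the RS is already aborting.
void replaceStoreFiles(Collection<HStoreFile> compactedFiles,
    Collection<HStoreFile> result, boolean writeCompactionMarker) throws IOException {
  RegionServerServices rsServices = region.getRegionServerServices();
  if (rsServices != null && rsServices.isAborted()) {
    // The region may already be reassigned; a newer StoreEngine on another
    // RS could own the SFT meta file now, so refuse to update ours.
    throw new IOException("RS is aborting, refusing to replace compacted files");
  }
  storeEngine.replaceStoreFiles(compactedFiles, result);
  if (writeCompactionMarker) {
    writeCompactionWalRecord(compactedFiles, result);
  }
}
{noformat}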
> HStore should try write WAL compaction marker before replacing compacted
> files in StoreEngine
> ---------------------------------------------------------------------------------------------
>
> Key: HBASE-26999
> URL: https://issues.apache.org/jira/browse/HBASE-26999
> Project: HBase
> Issue Type: Sub-task
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3
>
>
> On HBASE-26064, it seems we altered the order in which we update the
> different places with the results of a compaction:
> {noformat}
> @@ -1510,14 +1149,13 @@ public class HStore implements Store, HeapSize, StoreConfigInformation,
>        List<Path> newFiles) throws IOException {
>      // Do the steps necessary to complete the compaction.
>      setStoragePolicyFromFileName(newFiles);
> -    List<HStoreFile> sfs = commitStoreFiles(newFiles, true);
> +    List<HStoreFile> sfs = storeEngine.commitStoreFiles(newFiles, true);
>      if (this.getCoprocessorHost() != null) {
>        for (HStoreFile sf : sfs) {
>          getCoprocessorHost().postCompact(this, sf, cr.getTracker(), cr, user);
>        }
>      }
> -    writeCompactionWalRecord(filesToCompact, sfs);
> -    replaceStoreFiles(filesToCompact, sfs);
> +    replaceStoreFiles(filesToCompact, sfs, true);
> ...
> @@ -1581,25 +1219,24 @@ public class HStore implements Store, HeapSize, StoreConfigInformation,
>        this.region.getRegionInfo(), compactionDescriptor, this.region.getMVCC());
>    }
>
> -  void replaceStoreFiles(Collection<HStoreFile> compactedFiles, Collection<HStoreFile> result)
> -    throws IOException {
> -    this.lock.writeLock().lock();
> -    try {
> -      this.storeEngine.getStoreFileManager().addCompactionResults(compactedFiles, result);
> -      synchronized (filesCompacting) {
> -        filesCompacting.removeAll(compactedFiles);
> -      }
> -
> -      // These may be null when the RS is shutting down. The space quota Chores will fix the Region
> -      // sizes later so it's not super-critical if we miss these.
> -      RegionServerServices rsServices = region.getRegionServerServices();
> -      if (rsServices != null && rsServices.getRegionServerSpaceQuotaManager() != null) {
> -        updateSpaceQuotaAfterFileReplacement(
> -          rsServices.getRegionServerSpaceQuotaManager().getRegionSizeStore(), getRegionInfo(),
> -          compactedFiles, result);
> -      }
> -    } finally {
> -      this.lock.writeLock().unlock();
> +  @RestrictedApi(explanation = "Should only be called in TestHStore", link = "",
> +    allowedOnPath = ".*/(HStore|TestHStore).java")
> +  void replaceStoreFiles(Collection<HStoreFile> compactedFiles, Collection<HStoreFile> result,
> +    boolean writeCompactionMarker) throws IOException {
> +    storeEngine.replaceStoreFiles(compactedFiles, result);
> +    if (writeCompactionMarker) {
> +      writeCompactionWalRecord(compactedFiles, result);
> +    }
> +    synchronized (filesCompacting) {
> +      filesCompacting.removeAll(compactedFiles);
> +    }
> +    // These may be null when the RS is shutting down. The space quota Chores will fix the Region
> +    // sizes later so it's not super-critical if we miss these.
> +    RegionServerServices rsServices = region.getRegionServerServices();
> +    if (rsServices != null && rsServices.getRegionServerSpaceQuotaManager() != null) {
> +      updateSpaceQuotaAfterFileReplacement(
> +        rsServices.getRegionServerSpaceQuotaManager().getRegionSizeStore(), getRegionInfo(),
> +        compactedFiles, result);
> {noformat}
> While running some large scale load tests, we ran into a File SFT meta file
> inconsistency that we believe could have been avoided if the original order
> had been in place. Here is the scenario we had:
> 1) Region R with one CF f was open on RS1. At this time, the given store had
> some files, let's say these were file1, file2 and file3;
> 2) Compaction started on RS1;
> 3) RS1 entered a long GC pause and lost its ZK lock. Compaction is still
> running, though.
> 4) RS2 opens R. The related File SFT instance for this store then creates a
> new meta file with file1, file2 and file3.
> 5) Compaction on RS1 successfully completes the
> *storeEngine.replaceStoreFiles* call. This updates the in-memory cache of
> valid files (StoreFileManager) and the SFT meta file for the store engine on
> RS1 with the resulting compaction file, say file4, removing file1, file2 and
> file3. Note that the SFT meta file used by RS1 here is a different (older)
> one than that used by RS2.
> 6) Compaction on RS1 tries to write the WAL marker, but fails to do so, as
> the WAL already got closed when RS1's ZK lock expired. This triggers a store
> close on RS1. As part of the store close process, it removes all files it
> sees as compacted away, in this case file1, file2 and file3.
> 7) RS2 still references file1, file2 and file3. It then gets a
> FileNotFoundException when trying to open any of these files.
> This situation would have been avoided if the original order of a) write the
> WAL marker, then b) replace the store files had been kept.
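> In code terms, the safe ordering is the one the removed lines in the diff
> above had (a sketch of just the two calls, annotated):
> {noformat}
> // a) Persist the compaction marker to the WAL first. If this fails (e.g.
> //    the WAL was already closed because the RS lost its ZK lock), we stop
> //    before mutating any store file state, so file1..file3 stay in place.
> writeCompactionWalRecord(filesToCompact, sfs);
> // b) Only once the marker is durable, swap the files in the StoreEngine,
> //    which also rewrites the SFT meta file.
> replaceStoreFiles(filesToCompact, sfs);
> {noformat}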