[ https://issues.apache.org/jira/browse/HBASE-27579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679480#comment-17679480 ]
Hudson commented on HBASE-27579:
--------------------------------
Results for branch master
[build #760 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/760/]:
(/) *{color:green}+1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/760/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/760/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/760/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}
> CatalogJanitor can cause data loss due to errors during cleanMergeRegion
> ------------------------------------------------------------------------
>
> Key: HBASE-27579
> URL: https://issues.apache.org/jira/browse/HBASE-27579
> Project: HBase
> Issue Type: Bug
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Blocker
> Fix For: 2.6.0, 3.0.0-alpha-4, 2.4.16, 2.5.3
>
>
> In CatalogJanitor.cleanMergeRegion, there is the following check:
> {code:java}
> HRegionFileSystem regionFs = null;
> try {
>   regionFs = HRegionFileSystem.openRegionFromFileSystem(this.services.getConfiguration(),
>     fs, tabledir, mergedRegion, true);
> } catch (IOException e) {
>   LOG.warn("Merged region does not exist: " + mergedRegion.getEncodedName());
> }
> if (regionFs == null || !regionFs.hasReferences(htd)) {
>   // ... do the cleanup ...
> }
> {code}
>
> I think the assumption here is that an IOException would only be thrown if a
> region doesn't exist? We had a very poorly timed NameNode failover during a
> CatalogJanitor run, shortly after a merge. The failover caused the
> openRegionFromFileSystem call to fail, which logged:
> {code:java}
> WARN org.apache.hadoop.hbase.master.janitor.CatalogJanitor: Merged region does not exist: 32c71224852c5a4b94a3ba271b4fcb15
> {code}
> This region did in fact exist and had not been fully compacted, so some
> reference files were still lingering.
> The cleanup process moves the parent regions to the archive directory, but
> the default TTL for files in the archive directory is only 5 minutes.
> After that the files are deleted and the data becomes unrecoverable.
> This resulted in FileNotFoundExceptions when trying to read or open this region.
> Our only course of action was to move the lingering reference files aside, so
> the data they pointed to is unrecoverable.
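The check quoted above treats every IOException from openRegionFromFileSystem as "the merged region does not exist". One safer pattern is to separate a positive absence signal from a transient filesystem error. The Java sketch below is hypothetical: RegionFsProbe, decide() and the Decision enum are invented for illustration, are not HBase APIs, and are not necessarily how HBASE-27579 was fixed. It allows the merge parents to be archived only when the merged region has no references, or when the filesystem reports it missing with a FileNotFoundException, and it skips the cleanup (leaving it for a later CatalogJanitor run) on any other IOException, such as one caused by a NameNode failover.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/**
 * Hypothetical sketch: decide whether it is safe to archive the parents of a merged region.
 * RegionFsProbe is an invented stand-in for "open the merged region's directory and check
 * for reference files"; it is NOT an HBase API.
 */
public class MergeCleanupDecision {

  /** Possible outcomes of the cleanup check. */
  public enum Decision { CLEANUP, SKIP_HAS_REFERENCES, SKIP_RETRY_LATER }

  /** Hypothetical probe of the merged region's directory on the filesystem. */
  @FunctionalInterface
  public interface RegionFsProbe {
    /** Returns true if the merged region still holds reference files to its parents. */
    boolean hasReferences() throws IOException;
  }

  public static Decision decide(RegionFsProbe probe) {
    try {
      return probe.hasReferences() ? Decision.SKIP_HAS_REFERENCES : Decision.CLEANUP;
    } catch (FileNotFoundException e) {
      // The filesystem positively reported that the merged region is absent,
      // which is the case the original check intended to handle.
      return Decision.CLEANUP;
    } catch (IOException e) {
      // Any other I/O failure (e.g. during a NameNode failover) says nothing about
      // whether references still exist, so keep the parents and retry later.
      return Decision.SKIP_RETRY_LATER;
    }
  }

  public static void main(String[] args) {
    System.out.println(decide(() -> false)); // CLEANUP
    System.out.println(decide(() -> true)); // SKIP_HAS_REFERENCES
    System.out.println(decide(() -> { throw new FileNotFoundException("gone"); })); // CLEANUP
    System.out.println(decide(() -> { throw new IOException("failover"); })); // SKIP_RETRY_LATER
  }
}
```

A more conservative variant would skip the cleanup on every IOException, including FileNotFoundException: deferring a cleanup to the next janitor run is cheap, whereas archiving parents whose files are still referenced can lose data once the archive TTL expires.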
--
This message was sent by Atlassian Jira
(v8.20.10#820010)