[jira] [Commented] (HBASE-29149) WAL files can be archived during incremental backup process

David (Jira) Wed, 03 Dec 2025 20:32:07 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-29149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042703#comment-18042703
 ]


David commented on HBASE-29149:
-------------------------------

I took a closer look at the issue and found several code locations that may be 
relevant.
 
The root cause is the lack of retry/archive-lookup logic in the backup client's 
WAL-to-HFile conversion ({{convertWALsToHFiles()}} invoked at line 290), 
combined with uncoordinated archiving by {{ProcedureWALFile.removeFile}} (lines 
164-165).
 
*Relevant area:*
 # Function {{removeFile}} in 
{{hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/store/wal/ProcedureWALFile.java}}
 ([lines 
160-174|https://github.com/apache/hbase/blob/6d342cc2e0ca0a6f468aad635435cc835bdae7dc/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/store/wal/ProcedureWALFile.java#L160]):
 Lines 164-165 move the WAL out of its original directory, racing with any 
readers.
 # Function {{execute}} in 
{{hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java}}
 ([lines 
306-318|https://github.com/apache/hbase/blob/6d342cc2e0ca0a6f468aad635435cc835bdae7dc/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java#L311]):
 Line 311 opens WALs from their original path. If a file has already been 
archived, this throws a FileNotFoundException (FNFE), which is caught at lines 
314-315 and aborts the backup. There is no retry or archive fallback logic.

*Suggested approach:*
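{code:java}
try {
  convertWALsToHFiles();
} catch (FileNotFoundException fnfe) {
  // The WAL may have been archived while we were reading it; retry from the archive.
  if (backupInfo.getLogArchiveDir() != null) {
    LOG.warn("WAL not found, retrying from archive", fnfe);
    // Proposed helper, to be added: same conversion, but resolving WALs under the archive dir.
    convertWALsToHFilesUsingArchive(backupInfo.getLogArchiveDir());
  } else {
    throw fnfe;
  }
}{code}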
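For the archive lookup itself, here is a rough, self-contained sketch of how a live WAL path could be mapped to its archived location and retried. The {{<root>/WALs/<server>/<file>}} to {{<root>/oldWALs/<file>}} layout is inferred from the "Archiving ... to ..." log line in the issue description. The names {{WalArchiveFallback}}, {{toArchivePath}}, {{WalOpener}} and {{openWithArchiveFallback}} are hypothetical and not existing HBase APIs. It uses plain strings so it runs without Hadoop; a real patch would presumably work with Hadoop {{Path}}/{{FileSystem}} instead.

```java
import java.io.FileNotFoundException;

/** Illustrative only: none of these names exist in HBase. */
final class WalArchiveFallback {

  /**
   * Maps <root>/WALs/<server>/<file> to <root>/oldWALs/<file>.
   * The layout is assumed from the archiving log line in the issue description.
   */
  static String toArchivePath(String walPath) {
    int walsIdx = walPath.indexOf("/WALs/");
    if (walsIdx < 0) {
      throw new IllegalArgumentException("Not a live WAL path: " + walPath);
    }
    String root = walPath.substring(0, walsIdx);
    String fileName = walPath.substring(walPath.lastIndexOf('/') + 1);
    return root + "/oldWALs/" + fileName;
  }

  /** Stand-in for whatever actually opens a WAL (hypothetical abstraction). */
  interface WalOpener<T> {
    T open(String path) throws FileNotFoundException;
  }

  /**
   * Tries the original location first. If the file is gone, retries once
   * from the archive location; a second FNFE propagates to the caller.
   */
  static <T> T openWithArchiveFallback(String walPath, WalOpener<T> opener)
      throws FileNotFoundException {
    try {
      return opener.open(walPath);
    } catch (FileNotFoundException fnfe) {
      return opener.open(toArchivePath(walPath));
    }
  }
}
```

Doing the fallback per file, rather than around the whole conversion, would avoid redoing work for WALs that were still in place, but that is a design choice I have not validated against the existing code.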
Happy to adjust this if I missed anything.

> WAL files can be archived during incremental backup process
> -----------------------------------------------------------
>
>                 Key: HBASE-29149
>                 URL: https://issues.apache.org/jira/browse/HBASE-29149
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Hernan Gelaf-Romer
>            Assignee: Hernan Gelaf-Romer
>            Priority: Major
>
> At my job, we've run into FNFE issues when WAL files are archived as they are 
> being loaded to be converted into HFiles. When looking at the failure logs, 
> we can see that the WAL was loaded just after the archive had occurred 
> server-side. 
>  
> {quote}2025-02-24 17:10:34.333  [pool-124-thread-1] ERROR 
> o.a.h.h.b.impl.TableBackupClient - Unexpected exception in 
> incremental-backup: incremental copy backup_1740417014671File 
> hdfs://nestor-hb2-a-qa:8020/hbase/WALs/na1-purple-dizzy-antelope.iad03.hubinternal.net,60020,1739996267893/na1-purple-dizzy-antelope.iad03.hubinternal.net%2C60020%2C1739996267893.1740412909549
>  does not exist.
> java.io.FileNotFoundException: File 
> hdfs://nestor-hb2-a-qa:8020/hbase/WALs/na1-purple-dizzy-antelope.iad03.hubinternal.net,60020,1739996267893/na1-purple-dizzy-antelope.iad03.hubinternal.net%2C60020%2C1739996267893.1740412909549
>  does not exist.
> {quote}
>  
> {quote}2025-02-24 17:10:17.787 Archiving 
> hdfs://nestor-hb2-a-qa:8020/hbase/WALs/na1-purple-dizzy-antelope.iad03.hubinternal.net,60020,1739996267893/na1-purple-dizzy-antelope.iad03.hubinternal.net%2C60020%2C1739996267893.1740412909549
>  to 
> hdfs://nestor-hb2-a-qa:8020/hbase/oldWALs/na1-purple-dizzy-antelope.iad03.hubinternal.net%2C60020%2C1739996267893.1740412909549
> {quote}
>  
> We already handle a similar situation when loading bulkloads, and add a 
> re-try mechanism that checks the archive directory. We should probably do a 
> similar thing here



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
