[
https://issues.apache.org/jira/browse/HBASE-30218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Catherine Turner resolved HBASE-30218.
--------------------------------------
Resolution: Not A Problem
Resolving as not a problem. The issue was in our own code: we were using the
client with an incomplete configuration (specifically, hbase.rootdir was
unset). Ideally, the lock would still have been released despite the exception
thrown during cleanup, but it is easier to work around this on our end.
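
For illustration only, here is a minimal standalone sketch of the
"release the lock even if cleanup throws" pattern described above. It is not
HBase code: ExclusiveLock, cleanupExportLogs and cleanupAndRelease are
hypothetical stand-ins, and the null check only imitates the
IllegalArgumentException raised when a Path is built from an unset
hbase.rootdir.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Standalone sketch; none of these types are HBase classes. */
final class LockReleaseSketch {

    /** Hypothetical stand-in for the backup exclusive lock. */
    static final class ExclusiveLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        void acquire() {
            if (!held.compareAndSet(false, true)) {
                throw new IllegalStateException("There is an active session already running");
            }
        }

        void release() {
            held.set(false);
        }

        boolean isHeld() {
            return held.get();
        }
    }

    /** Imitates cleanup that builds a path from a possibly unset root directory. */
    static void cleanupExportLogs(String rootDir) {
        if (rootDir == null) {
            // Assumption: mirrors the failure seen in the reported stack trace.
            throw new IllegalArgumentException("Can not create a Path from a null string");
        }
        // A real implementation would remove leftover log directories here.
    }

    /** Runs cleanup and releases the lock even when cleanup throws. */
    static void cleanupAndRelease(ExclusiveLock lock, String rootDir) {
        try {
            cleanupExportLogs(rootDir);
        } finally {
            lock.release();
        }
    }

    public static void main(String[] args) {
        ExclusiveLock lock = new ExclusiveLock();
        lock.acquire();
        try {
            cleanupAndRelease(lock, null); // simulates hbase.rootdir being unset
        } catch (IllegalArgumentException e) {
            System.out.println("cleanup failed: " + e.getMessage());
        }
        System.out.println("lock held after failed cleanup: " + lock.isHeld());
        lock.acquire(); // a later session can take the lock again
        System.out.println("lock re-acquired: " + lock.isHeld());
    }
}
```

In this sketch the exception still propagates to the caller, so the failure
stays visible, but the next session is not blocked by a stale lock.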
> Backup repair permanently holds lock when FULL backup fails in REQUEST phase
> ----------------------------------------------------------------------------
>
> Key: HBASE-30218
> URL: https://issues.apache.org/jira/browse/HBASE-30218
> Project: HBase
> Issue Type: Bug
> Components: backup&restore
> Reporter: Catherine Turner
> Assignee: Catherine Turner
> Priority: Major
> Fix For: 3.0.0, 2.6.5
>
>
> When a FULL backup fails while still in REQUEST phase (before ExportSnapshot
> ever runs), the repair path throws an exception and aborts before releasing
> the backup exclusive lock. The cluster is then permanently wedged: every
> subsequent backup attempt fails with "There is an active session already
> running", which triggers repair, which throws again. The loop cannot be
> broken without manual intervention (clearing the lock by hand).
> The root cause is that TableBackupClient.cleanupExportSnapshotLog attempts to
> construct a FileSystem from the hbase.rootdir property. When that property is
> unset, constructing new Path(null) within the getCurrentFileSystem method
> throws an IllegalArgumentException.
> For a backup that never progressed past REQUEST phase, ExportSnapshot was
> never invoked and there are no MapReduce log directories to clean up; the
> call to cleanupExportSnapshotLog should be a no-op. Instead, the unchecked
> exception escapes cleanupAndRestoreBackupSystem, which means the exclusive
> lock is never released.
> ----
> +Steps to reproduce+
> A full backup that stalls or is killed before the export phase leaves the
> session in REQUEST phase with Progress=0%:
> {noformat}
> hbase backup history
> {ID=backup_1780142338094, Type=FULL, Tables={...}, State=RUNNING,
> Start time=..., Phase=REQUEST, Progress=0%}
> {noformat}
> The exclusive lock is still held. Every subsequent backup run fails and the
> end-of-run repair throws:
> {noformat}
> ERROR o.a.h.h.backup.impl.BackupAdminImpl There is an active session already running
> ...
> ERROR ...BackupRepair Failed to run backup repair
> java.lang.IllegalArgumentException: Can not create a Path from a null string
> at org.apache.hadoop.hbase.backup.impl.TableBackupClient.cleanupExportSnapshotLog(TableBackupClient.java:169)
> at org.apache.hadoop.hbase.backup.impl.TableBackupClient.cleanupAndRestoreBackupSystem(TableBackupClient.java:270)
> at ...
> {noformat}
> ----
> +To resolve this issue...+
> * cleanupAndRestoreBackupSystem should ensure the backup exclusive lock is
> released in a finally block so that even an unexpected exception in cleanup
> cannot leave the lock permanently held.