Dieter De Paepe created HBASE-28103:
---------------------------------------
Summary: HBase backup repair stuck after failed delete due to
missing S3 credentials
Key: HBASE-28103
URL: https://issues.apache.org/jira/browse/HBASE-28103
Project: HBase
Issue Type: Bug
Reporter: Dieter De Paepe
I was experimenting with what happens if a user were to execute `hbase backup
delete` without providing S3 credentials.
I started with a backup present in an S3 bucket.
{noformat}
hbase backup history
{ID=backup_1695226626227,Type=FULL,Tables={foo:bar},State=COMPLETE,Start time=Wed Sep 20 16:17:09 UTC 2023,End time=Wed Sep 20 16:17:42 UTC 2023,Progress=100%}
{noformat}
I tried to delete this backup without providing S3 credentials; it failed (as
expected).
{noformat}
hbase backup delete -l backup_1695226626227
23/09/20 16:18:46 ERROR org.apache.hadoop.hbase.backup.impl.BackupAdminImpl: Delete operation failed, please run backup repair utility to restore backup system integrity
java.nio.file.AccessDeniedException: s3a://backuprestore-experiments/hbase/backup_1695226626227: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:215)
at org.apache.hadoop.fs.s3a.Invoker.onceInTheFuture(Invoker.java:190)
at org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.next(Listing.java:651)
at org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator.requestNextBatch(Listing.java:430)
at org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator.<init>(Listing.java:372)
at org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Listing.java:143)
at org.apache.hadoop.fs.s3a.Listing.getFileStatusesAssumingNonEmptyDir(Listing.java:264)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:3369)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$null$22(S3AFileSystem.java:3346)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$23(S3AFileSystem.java:3345)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2480)
at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2499)
at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:3344)
at org.apache.hadoop.hbase.backup.util.BackupUtils.listStatus(BackupUtils.java:522)
at org.apache.hadoop.hbase.backup.util.BackupUtils.cleanupHLogDir(BackupUtils.java:430)
at org.apache.hadoop.hbase.backup.util.BackupUtils.cleanupBackupData(BackupUtils.java:411)
at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.deleteBackup(BackupAdminImpl.java:229)
at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.deleteBackups(BackupAdminImpl.java:142)
at org.apache.hadoop.hbase.backup.impl.BackupCommands$DeleteCommand.executeDeleteListOfBackups(BackupCommands.java:627)
at org.apache.hadoop.hbase.backup.impl.BackupCommands$DeleteCommand.execute(BackupCommands.java:578)
at org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:134)
at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:169)
at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:199)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:177)
Caused by: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:216)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1269)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:845)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:794)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:6432)
at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:6404)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5441)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5397)
at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:971)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$12(S3AFileSystem.java:2715)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:431)
at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:2706)
at org.apache.hadoop.fs.s3a.S3AFileSystem$ListingOperationCallbacksImpl.lambda$listObjectsAsync$0(S3AFileSystem.java:2342)
at org.apache.hadoop.fs.s3a.impl.CallableSupplier.get(CallableSupplier.java:87)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
at com.amazonaws.auth.EnvironmentVariableCredentialsProvider.getCredentials(EnvironmentVariableCredentialsProvider.java:49)
at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
... 28 more
Delete command FAILED. Please run backup repair tool to restore backup system integrity
23/09/20 16:18:46 ERROR org.apache.hadoop.hbase.backup.BackupDriver: Error running command-line tool
java.nio.file.AccessDeniedException: s3a://backuprestore-experiments/hbase/backup_1695226626227: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:215)
at org.apache.hadoop.fs.s3a.Invoker.onceInTheFuture(Invoker.java:190)
at org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.next(Listing.java:651)
at org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator.requestNextBatch(Listing.java:430)
at org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator.<init>(Listing.java:372)
at org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Listing.java:143)
at org.apache.hadoop.fs.s3a.Listing.getFileStatusesAssumingNonEmptyDir(Listing.java:264)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:3369)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$null$22(S3AFileSystem.java:3346)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$23(S3AFileSystem.java:3345)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2480)
at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2499)
at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:3344)
at org.apache.hadoop.hbase.backup.util.BackupUtils.listStatus(BackupUtils.java:522)
at org.apache.hadoop.hbase.backup.util.BackupUtils.cleanupHLogDir(BackupUtils.java:430)
at org.apache.hadoop.hbase.backup.util.BackupUtils.cleanupBackupData(BackupUtils.java:411)
at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.deleteBackup(BackupAdminImpl.java:229)
at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.deleteBackups(BackupAdminImpl.java:142)
at org.apache.hadoop.hbase.backup.impl.BackupCommands$DeleteCommand.executeDeleteListOfBackups(BackupCommands.java:627)
at org.apache.hadoop.hbase.backup.impl.BackupCommands$DeleteCommand.execute(BackupCommands.java:578)
at org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:134)
at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:169)
at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:199)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:177)
Caused by: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:216)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1269)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:845)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:794)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:6432)
at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:6404)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5441)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5397)
at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:971)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$12(S3AFileSystem.java:2715)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:431)
at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:2706)
at org.apache.hadoop.fs.s3a.S3AFileSystem$ListingOperationCallbacksImpl.lambda$listObjectsAsync$0(S3AFileSystem.java:2342)
at org.apache.hadoop.fs.s3a.impl.CallableSupplier.get(CallableSupplier.java:87)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
at com.amazonaws.auth.EnvironmentVariableCredentialsProvider.getCredentials(EnvironmentVariableCredentialsProvider.java:49)
at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
... 28 more
{noformat}
At this point, I cannot start a new backup because a failed delete command is
present:
{noformat}
hbase backup \
-libjars /opt/hadoop/share/hadoop/tools/lib/hadoop-aws-3.3.6-1-lily.jar,/opt/hadoop/share/hadoop/tools/lib/aws-java-sdk-bundle-1.12.367.jar \
-Dfs.s3a.access.key=... \
-Dfs.s3a.secret.key=... \
-Dfs.s3a.session.token=... \
create incremental s3a://backuprestore-experiments/hbase -t foo:bar
Found failed backup DELETE coommand.
Backup system recovery is required.
23/09/20 16:31:16 ERROR org.apache.hadoop.hbase.backup.BackupDriver: Error running command-line tool
java.io.IOException: Failed backup DELETE found, aborted command execution
at org.apache.hadoop.hbase.backup.impl.BackupCommands$Command.execute(BackupCommands.java:167)
at org.apache.hadoop.hbase.backup.impl.BackupCommands$CreateCommand.execute(BackupCommands.java:309)
at org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:134)
at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:169)
at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:199)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:177)
{noformat}
However, the backup repair command is unable to complete:
{noformat}
hbase backup repair
REPAIR status: no failed sessions found. Checking failed delete backup operation ...
Found failed DELETE operation for: backup_1695226626227
Running DELETE again ...
23/09/20 16:34:13 WARN org.apache.hadoop.hbase.backup.impl.BackupSystemTable: Could not restore backup system table. Snapshot snapshot_backup_system does not exists.
23/09/20 16:34:13 ERROR org.apache.hadoop.hbase.backup.BackupDriver: Error running command-line tool
java.io.IOException: There is no active backup exclusive operation
at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.finishBackupExclusiveOperation(BackupSystemTable.java:645)
at org.apache.hadoop.hbase.backup.impl.BackupCommands$RepairCommand.repairFailedBackupDeletionIfAny(BackupCommands.java:721)
at org.apache.hadoop.hbase.backup.impl.BackupCommands$RepairCommand.execute(BackupCommands.java:681)
at org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:134)
at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:169)
at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:199)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:177)
{noformat}
The core issue seems to be the assumption that there is a "backup exclusive
operation" for each failed delete command.
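To make the suspected assumption concrete, here is a toy Java model of the
stuck state. It is not the HBase code: RepairSketch, ToyBackupState and all
method names are made up for illustration, and only the two exception messages
are taken from the logs above. It also skips re-running the delete itself, so
it says nothing about whether partially deleted data ends up consistent.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes, not the HBase implementation.
final class RepairSketch {

    /** Hypothetical stand-in for the state kept in the backup system table. */
    static final class ToyBackupState {
        boolean exclusiveOperationActive = false;
        final List<String> failedDeletes = new ArrayList<>();

        void finishExclusiveOperation() throws IOException {
            if (!exclusiveOperationActive) {
                throw new IOException("There is no active backup exclusive operation");
            }
            exclusiveOperationActive = false;
        }
    }

    /** Guard as observed in the report: commands refuse to run after a failed delete. */
    static void createBackup(ToyBackupState state) throws IOException {
        if (!state.failedDeletes.isEmpty()) {
            throw new IOException("Failed backup DELETE found, aborted command execution");
        }
    }

    /** Repair that assumes an exclusive operation is always still active. */
    static void repairStrict(ToyBackupState state) throws IOException {
        state.finishExclusiveOperation(); // throws when nothing is active
        state.failedDeletes.clear();      // never reached in the stuck state
    }

    /** Repair that only finishes the exclusive operation when one exists. */
    static void repairTolerant(ToyBackupState state) throws IOException {
        if (state.exclusiveOperationActive) {
            state.finishExclusiveOperation();
        }
        state.failedDeletes.clear();
    }

    public static void main(String[] args) throws IOException {
        ToyBackupState state = new ToyBackupState();
        // A failed delete is recorded, but no exclusive operation is active.
        state.failedDeletes.add("backup_1695226626227");

        try {
            repairStrict(state);
        } catch (IOException e) {
            System.out.println("strict repair failed: " + e.getMessage());
        }

        repairTolerant(state);
        createBackup(state); // no longer blocked by the failed delete
        System.out.println("pending failed deletes: " + state.failedDeletes.size());
    }
}
```

In this model the strict variant leaves the failed delete in place, so every
later command stays blocked, which matches the behaviour I am seeing.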
It would also be useful if the repair command could discard the pending
delete, though I guess that in some cases this may not result in a reliable
state if data was already partially deleted.
I guess the workaround in this case would be to remove the failed delete
entries from the backup system table by hand?