[ https://issues.apache.org/jira/browse/HBASE-29197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hernan Gelaf-Romer updated HBASE-29197:
---------------------------------------
    Description: 
At my company, we're experimenting with the new incremental backup system. We've run into failures when deleting a large number of bulk-loaded rows from the backup system table. When a single delete batch exceeds the region server's batch size limit, the server rejects it:

{quote}
2025-03-18 13:03:01.208 [htable-pool-6] WARN o.a.h.h.c.AsyncRequestFutureImpl - id=10, table=backup:system_bulk, attempt=15/13, failureCount=2048ops, last exception=java.io.IOException: java.io.IOException: Rejecting large batch operation for current batch with firstRegionName: backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested Number of Rows: 2048 , Size Threshold: 1500
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:511)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
	at org.apache.hadoop.hbase.ipc.CallRunnerWithContext.run(CallRunnerWithContext.java:103)
	at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:105)
	at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:85)
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: Rejecting large batch operation for current batch with firstRegionName: backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested Number of Rows: 2048 , Size Threshold: 1500
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkBatchSizeAndLogLargeSize(RSRpcServices.java:2721)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2757)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:43520)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443)
	... 4 more
 on na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259, tracking started Tue Mar 18 13:01:12 UTC 2025; NOT retrying, failed=2048 – final attempt!
2025-03-18 13:03:01.275 [pool-116-thread-1] ERROR o.a.h.h.b.impl.TableBackupClient - Unexpected BackupException : Failed 75776 actions: IOException: 75776 times, servers with issues: na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177, na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 75776 actions: IOException: 75776 times, servers with issues: na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177, na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
	at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
	at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
	at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:209)
	at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:431)
	at org.apache.hadoop.hbase.backup.impl.BackupManager.deleteBulkLoadedRows(BackupManager.java:362)
	at org.apache.hadoop.hbase.backup.impl.FullTableBackupClient.execute(FullTableBackupClient.java:201)
	at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:594)
	at com.hubspot.hbase.recovery.core.factories.HBaseBackupAdminFactory$HBaseBackupAdmin.backupTables(HBaseBackupAdminFactory.java:92)
	at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.lambda$runTableBackup$2(BackupManager.java:524)
	at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.lambda$doAs$9(HadoopAuthHelper.java:590)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
	at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.doAs(HadoopAuthHelper.java:603)
	at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.runTableBackup(BackupManager.java:521)
	at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.run(BackupManager.java:449)
	at com.hubspot.hbase.recovery.core.backup.BackupManager.runBackups(BackupManager.java:103)
	at com.hubspot.hbase.recovery.jobs.BackupJob.takeBackups(BackupJob.java:166)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
	Suppressed: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 6144 actions: IOException: 6144 times, servers with issues: na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
		at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
		at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
		at org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:246)
		at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:424)
{quote}

We should split these deletes into smaller chunks so that no single batch exceeds the server's threshold and gets rejected.


> Deleting bulk loaded rows from the backup system table can result in large batch rejections failures
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-29197
>                 URL: https://issues.apache.org/jira/browse/HBASE-29197
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore
>            Reporter: Hernan Gelaf-Romer
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
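The chunking proposed in the description could look roughly like the following sketch. It is a hypothetical, self-contained helper, not an HBase API and not the actual fix for this issue. The names `ChunkedDeletes`, `partition`, and `deleteInChunks` are illustrative. The helper splits the row keys into consecutive chunks of at most `maxBatchRows` and hands each chunk to a caller-supplied submitter.

In `BackupSystemTable`, the submitter would presumably build the `Delete` objects for one chunk and send them with something like `Table.delete(List<Delete>)`. That wiring is an assumption. The chunk size of 1000 is also an assumption, chosen only to stay below the 1500-row threshold shown in the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of HBase): splits a large list of row keys
 * into chunks no larger than maxBatchRows and hands each chunk to a submitter.
 */
public final class ChunkedDeletes {

  private ChunkedDeletes() {}

  /** Splits items into consecutive sublists of at most chunkSize elements. */
  public static <T> List<List<T>> partition(List<T> items, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive: " + chunkSize);
    }
    List<List<T>> chunks = new ArrayList<>();
    for (int start = 0; start < items.size(); start += chunkSize) {
      int end = Math.min(start + chunkSize, items.size());
      // Copy the view so each chunk is independent of the backing list.
      chunks.add(new ArrayList<>(items.subList(start, end)));
    }
    return chunks;
  }

  /**
   * Submits row keys one chunk at a time and returns the number of chunks.
   * Assumption: in real code the submitter would turn a chunk into a
   * List<Delete> and send it in a single call, e.g. Table.delete(...).
   */
  public static <T> int deleteInChunks(
      List<T> rowKeys, int maxBatchRows, Consumer<List<T>> submitter) {
    List<List<T>> chunks = partition(rowKeys, maxBatchRows);
    for (List<T> chunk : chunks) {
      submitter.accept(chunk);
    }
    return chunks.size();
  }

  public static void main(String[] args) {
    // 75776 mirrors the failed action count in the log; integers stand in for row keys.
    List<Integer> rowKeys = new ArrayList<>();
    for (int i = 0; i < 75776; i++) {
      rowKeys.add(i);
    }
    List<Integer> chunkSizes = new ArrayList<>();
    int submitted = deleteInChunks(rowKeys, 1000, chunk -> chunkSizes.add(chunk.size()));
    int largest = chunkSizes.stream().mapToInt(Integer::intValue).max().orElse(0);
    // prints "76 chunks, largest=1000"
    System.out.println(submitted + " chunks, largest=" + largest);
  }
}
```

Keeping the chunk size well below the server threshold leaves some headroom. Whether it should be a fixed constant or read from configuration is left open here.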