[ 
https://issues.apache.org/jira/browse/HBASE-29197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hernan Gelaf-Romer updated HBASE-29197:
---------------------------------------
    Description: 
At my company, we're experimenting with the new incremental backup system. 
We've experienced issues deleting a large number of bulk-loaded rows from the 
backup system table when a single batch exceeds the batch size limit:
{quote} 
2025-03-18 13:03:01.208 [htable-pool-6] WARN o.a.h.h.c.AsyncRequestFutureImpl - 
id=10, table=backup:system_bulk, attempt=15/13, failureCount=2048ops, last 
exception=java.io.IOException: java.io.IOException: Rejecting large batch 
operation for current batch with firstRegionName: 
backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested 
Number of Rows: 2048 , Size Threshold: 1500
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:511)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.CallRunnerWithContext.run(CallRunnerWithContext.java:103)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:105)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:85)
 Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: 
Rejecting large batch operation for current batch with firstRegionName: 
backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested 
Number of Rows: 2048 , Size Threshold: 1500
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkBatchSizeAndLogLargeSize(RSRpcServices.java:2721)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2757)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:43520)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443)
    ... 4 more
 on na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259, 
tracking started Tue Mar 18 13:01:12 UTC 2025; NOT retrying, failed=2048 – 
final attempt!
 2025-03-18 13:03:01.275 [pool-116-thread-1] ERROR 
o.a.h.h.b.impl.TableBackupClient - Unexpected BackupException : Failed 75776 
actions: IOException: 75776 times, servers with issues: 
na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177, 
na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
 org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 
75776 actions: IOException: 75776 times, servers with issues: 
na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177, 
na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:209)
    at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:431)
    at org.apache.hadoop.hbase.backup.impl.BackupManager.deleteBulkLoadedRows(BackupManager.java:362)
    at org.apache.hadoop.hbase.backup.impl.FullTableBackupClient.execute(FullTableBackupClient.java:201)
    at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:594)
    at com.hubspot.hbase.recovery.core.factories.HBaseBackupAdminFactory$HBaseBackupAdmin.backupTables(HBaseBackupAdminFactory.java:92)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.lambda$runTableBackup$2(BackupManager.java:524)
    at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.lambda$doAs$9(HadoopAuthHelper.java:590)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.doAs(HadoopAuthHelper.java:603)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.runTableBackup(BackupManager.java:521)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.run(BackupManager.java:449)
    at com.hubspot.hbase.recovery.core.backup.BackupManager.runBackups(BackupManager.java:103)
    at com.hubspot.hbase.recovery.jobs.BackupJob.takeBackups(BackupJob.java:166)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
    Suppressed: 
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 
6144 actions: IOException: 6144 times, servers with issues: 
na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:246)
        at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:424)
 
We should split these deletes into chunks that stay under the batch size 
threshold, so that the region server does not reject them.
 
{quote}
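
As a rough illustration of the chunking idea (not the actual patch), here is a 
minimal Java sketch. BatchChunker, chunks and forEachChunk are hypothetical 
names made up for this example. The sink stands in for whatever call sends one 
batch of deletes to the backup system table, which is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only; not part of HBase.
final class BatchChunker {
  private BatchChunker() {}

  /** Splits items into consecutive lists of at most maxBatchSize elements. */
  static <T> List<List<T>> chunks(List<T> items, int maxBatchSize) {
    if (maxBatchSize <= 0) {
      throw new IllegalArgumentException("maxBatchSize must be positive");
    }
    List<List<T>> result = new ArrayList<>();
    for (int start = 0; start < items.size(); start += maxBatchSize) {
      int end = Math.min(start + maxBatchSize, items.size());
      // Copy the view so each chunk is independent of the source list.
      result.add(new ArrayList<>(items.subList(start, end)));
    }
    return result;
  }

  /**
   * Hands each chunk to the sink in order. In the backup code the sink would
   * presumably send one batch of deletes per chunk (assumption).
   */
  static <T> void forEachChunk(List<T> items, int maxBatchSize, Consumer<List<T>> sink) {
    for (List<T> chunk : chunks(items, maxBatchSize)) {
      sink.accept(chunk);
    }
  }
}
```

With the 2048-row batch from the log and a limit of 1500 (the Size Threshold 
reported above), this yields two chunks of 1500 and 548 rows, each under the 
threshold.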

  was:
At my company, we're experimenting with the new incremental backup system. 
We've experienced issues deleting large number of bulkloaded rows from the 
system table if when exceeding the batch limit

 
2025-03-18 13:03:01.208 [htable-pool-6] WARN o.a.h.h.c.AsyncRequestFutureImpl - 
id=10, table=backup:system_bulk, attempt=15/13, failureCount=2048ops, last 
exception=java.io.IOException: java.io.IOException: Rejecting large batch 
operation for current batch with firstRegionName: 
backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested 
Number of Rows: 2048 , Size Threshold: 1500
 ?? at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:511)??
 ?? at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)??
 ?? at 
org.apache.hadoop.hbase.ipc.CallRunnerWithContext.run(CallRunnerWithContext.java:103)??
 ?? at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:105)??
 ?? at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:85)??
 Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: 
Rejecting large batch operation for current batch with firstRegionName: 
backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested 
Number of Rows: 2048 , Size Threshold: 1500
 ?? at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.checkBatchSizeAndLogLargeSize(RSRpcServices.java:2721)??
 ?? at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2757)??
 ?? at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:43520)??
 ?? at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443)??
 ?? ... 4 more??
 ?? on na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259, 
tracking started Tue Mar 18 13:01:12 UTC 2025; NOT retrying, failed=2048 – 
final attempt!??
 2025-03-18 13:03:01.275 [pool-116-thread-1] ERROR 
o.a.h.h.b.impl.TableBackupClient - Unexpected BackupException : Failed 75776 
actions: IOException: 75776 times, servers with issues: 
na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177, 
na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
 org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 
75776 actions: IOException: 75776 times, servers with issues: 
na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177, 
na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
 ?? at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)??
 ?? at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)??
 ?? at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:209)??
 ?? at 
org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:431)??
 ?? at 
org.apache.hadoop.hbase.backup.impl.BackupManager.deleteBulkLoadedRows(BackupManager.java:362)??
 ?? at 
org.apache.hadoop.hbase.backup.impl.FullTableBackupClient.execute(FullTableBackupClient.java:201)??
 ?? at 
org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:594)??
 ?? at 
com.hubspot.hbase.recovery.core.factories.HBaseBackupAdminFactory$HBaseBackupAdmin.backupTables(HBaseBackupAdminFactory.java:92)??
 ?? at 
com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.lambda$runTableBackup$2(BackupManager.java:524)??
 ?? at 
com.hubspot.hadoop.auth.utils.HadoopAuthHelper.lambda$doAs$9(HadoopAuthHelper.java:590)??
 ?? at 
java.base/java.security.AccessController.doPrivileged(AccessController.java:714)??
 ?? at java.base/javax.security.auth.Subject.doAs(Subject.java:525)??
 ?? at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)??
 ?? at 
com.hubspot.hadoop.auth.utils.HadoopAuthHelper.doAs(HadoopAuthHelper.java:603)??
 ?? at 
com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.runTableBackup(BackupManager.java:521)??
 ?? at 
com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.run(BackupManager.java:449)??
 ?? at 
com.hubspot.hbase.recovery.core.backup.BackupManager.runBackups(BackupManager.java:103)??
 ?? at 
com.hubspot.hbase.recovery.jobs.BackupJob.takeBackups(BackupJob.java:166)??
 ?? at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)??
 ?? at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)??
 ?? at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)??
 ?? at java.base/java.lang.Thread.run(Thread.java:1583)??
 ?? Suppressed: 
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 
6144 actions: IOException: 6144 times, servers with issues: 
na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259??
 ?? at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)??
 ?? at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)??
 ?? at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:246)??
 ?? at 
org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:424)??
 
We should split these batches up into chunks so they don't cause issues
 


> Deleting bulk loaded rows from the backup system table can result in large 
> batch rejections failures
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-29197
>                 URL: https://issues.apache.org/jira/browse/HBASE-29197
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore
>            Reporter: Hernan Gelaf-Romer
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
