[ https://issues.apache.org/jira/browse/HBASE-28643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Dimiduk resolved HBASE-28643.
----------------------------------
    Resolution: Fixed

Pushed to branch-2.6+. Thanks [~rmdmattingly]!

> An unbounded backup failure message can cause an irrecoverable state for the given backup
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-28643
>                 URL: https://issues.apache.org/jira/browse/HBASE-28643
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Ray Mattingly
>            Assignee: Ray Mattingly
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1
>
>
> The BackupInfo class has a failedMsg field which is a string of unbounded
> length. When a DistCp job fails then its failure message contains all of its
> source paths, and its failure message gets propagated to this failedMsg field
> on the given BackupInfo.
> If a DistCp job has enough source paths, then this will result in backup
> status updates being rejected:
> {noformat}
> java.lang.IllegalArgumentException: KeyValue size too large
>     at org.apache.hadoop.hbase.client.ConnectionUtils.validatePut(ConnectionUtils.java:513)
>     at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:1095)
>     at org.apache.hadoop.hbase.client.HTable.lambda$put$3(HTable.java:564)
>     at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187)
>     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:563)
>     at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.updateBackupInfo(BackupSystemTable.java:292)
>     at org.apache.hadoop.hbase.backup.impl.BackupManager.updateBackupInfo(BackupManager.java:376)
>     at org.apache.hadoop.hbase.backup.impl.TableBackupClient.failBackup(TableBackupClient.java:243)
>     at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:317)
>     at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:603)
>     at com.hubspot.hbase.recovery.core.backup.BackupManager.lambda$runBackups$2(BackupManager.java:145)
> {noformat}
> Without the ability to update the backup's state, it will never be returned
> as a failed backup by the client. This means that any mechanisms designed for
> repairing or cleaning failed backups won't work properly.
> I think that a simple fix here would be fine: we should truncate the
> failedMsg field to a reasonable maximum size.
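
For illustration only, the truncation proposed in the description could look roughly like the sketch below. This is not the patch that was merged: the class name FailedMsgTruncator, the method truncate, and the 1024-character cap are all hypothetical, chosen just to show the idea of bounding the message before it is stored on the BackupInfo.

```java
// Hypothetical helper, NOT the actual HBase change for HBASE-28643.
final class FailedMsgTruncator {

  // Assumed cap, picked for illustration; the issue only asks for
  // "a reasonable maximum size".
  static final int MAX_FAILED_MSG_LENGTH = 1024;

  private FailedMsgTruncator() {
  }

  // Returns failedMsg unchanged when it is null or already within the cap,
  // otherwise only its first maxLength characters.
  static String truncate(String failedMsg, int maxLength) {
    if (maxLength < 0) {
      throw new IllegalArgumentException("maxLength must be >= 0");
    }
    if (failedMsg == null || failedMsg.length() <= maxLength) {
      return failedMsg;
    }
    return failedMsg.substring(0, maxLength);
  }

  public static void main(String[] args) {
    // Simulate a failure message that lists a very large number of source paths.
    StringBuilder sb = new StringBuilder("DistCp failed for sources: ");
    for (int i = 0; i < 10_000; i++) {
      sb.append("/hypothetical/path/").append(i).append(',');
    }
    String bounded = truncate(sb.toString(), MAX_FAILED_MSG_LENGTH);
    System.out.println(bounded.length()); // prints 1024
  }
}
```

Note that this cap counts Java chars rather than encoded bytes, while the KeyValue size check in the stack trace presumably applies to serialized bytes, so a real fix would want a limit comfortably below that threshold.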