[
https://issues.apache.org/jira/browse/HBASE-28643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853158#comment-17853158
]
Nick Dimiduk commented on HBASE-28643:
--------------------------------------
I agree that the failure message is probably fine to truncate. Stepping back,
this BackupInfo context object looks like it contains a lot of data (file
paths and such). I'm concerned that it isn't reasonable to store all of this
in a single cell. We should look into splitting it out into multiple cells
and serializing to/from a row instead.
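For illustration, here is a minimal sketch of what row-based serialization
could look like with the plain client API. The table name, column family,
qualifiers, and values are hypothetical placeholders, not the actual backup
system table schema:
{noformat}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BackupInfoRowSketch {
  // Hypothetical family name; not the real backup system table schema.
  private static final byte[] FAMILY = Bytes.toBytes("meta");

  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
        Table table = conn.getTable(TableName.valueOf("backup_sketch"))) {
      // One row per backup. Each field lands in its own cell, so no single
      // cell has to carry the whole serialized BackupInfo (file lists,
      // failure message, etc.).
      Put put = new Put(Bytes.toBytes("backup_1718300000000")); // row key = backup id
      put.addColumn(FAMILY, Bytes.toBytes("state"), Bytes.toBytes("FAILED"));
      put.addColumn(FAMILY, Bytes.toBytes("start_ts"), Bytes.toBytes(1718300000000L));
      put.addColumn(FAMILY, Bytes.toBytes("failed_msg"), Bytes.toBytes("DistCp job failed: ..."));
      table.put(put);
    }
  }
}
{noformat}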
> An unbounded backup failure message can cause an irrecoverable state for the
> given backup
> -----------------------------------------------------------------------------------------
>
> Key: HBASE-28643
> URL: https://issues.apache.org/jira/browse/HBASE-28643
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Ray Mattingly
> Assignee: Ray Mattingly
> Priority: Major
>
> The BackupInfo class has a failedMsg field, which is a string of unbounded
> length. When a DistCp job fails, its failure message contains all of its
> source paths, and that message is propagated to the failedMsg field on the
> given BackupInfo.
> If a DistCp job has enough source paths, this results in backup status
> updates being rejected:
> {noformat}
> java.lang.IllegalArgumentException: KeyValue size too large
>   at org.apache.hadoop.hbase.client.ConnectionUtils.validatePut(ConnectionUtils.java:513)
>   at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:1095)
>   at org.apache.hadoop.hbase.client.HTable.lambda$put$3(HTable.java:564)
>   at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187)
>   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:563)
>   at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.updateBackupInfo(BackupSystemTable.java:292)
>   at org.apache.hadoop.hbase.backup.impl.BackupManager.updateBackupInfo(BackupManager.java:376)
>   at org.apache.hadoop.hbase.backup.impl.TableBackupClient.failBackup(TableBackupClient.java:243)
>   at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:317)
>   at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:603)
>   at com.hubspot.hbase.recovery.core.backup.BackupManager.lambda$runBackups$2(BackupManager.java:145)
> {noformat}
> Without the ability to update the backup's state, it will never be returned
> as a failed backup by the client. This means that any mechanisms designed for
> repairing or cleaning failed backups won't work properly.
> I think that a simple fix here would be fine: we should truncate the
> failedMsg field to a reasonable maximum size.
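> A minimal sketch of that truncation, assuming a hypothetical cap constant
> kept well below the client cell-size limit (hbase.client.keyvalue.maxsize,
> which commonly defaults to 10 MB):
> {noformat}
> // Sketch only: this class and constant are hypothetical, not existing HBase code.
> public final class FailedMsgTruncation {
>   // Cap in characters; chosen to stay far below the cell-size limit, since
>   // other BackupInfo fields share the same cell.
>   private static final int MAX_FAILED_MSG_LENGTH = 64 * 1024;
>
>   static String truncate(String failedMsg) {
>     if (failedMsg == null || failedMsg.length() <= MAX_FAILED_MSG_LENGTH) {
>       return failedMsg;
>     }
>     // Keep the head of the message (it usually names the failing job) and
>     // drop the long tail of DistCp source paths.
>     return failedMsg.substring(0, MAX_FAILED_MSG_LENGTH) + "... (truncated)";
>   }
> }
> {noformat}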
--
This message was sent by Atlassian Jira
(v8.20.10#820010)