[
https://issues.apache.org/jira/browse/HBASE-28696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862489#comment-17862489
]
Dieter De Paepe commented on HBASE-28696:
-----------------------------------------
See also: HBASE-28706 (different problem, but concerns this code as well)
> BackupSystemTable can create huge delete batches that should be partitioned
> instead
> -----------------------------------------------------------------------------------
>
> Key: HBASE-28696
> URL: https://issues.apache.org/jira/browse/HBASE-28696
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Ray Mattingly
> Assignee: Ray Mattingly
> Priority: Major
>
> When an incremental backup completes successfully, one of the final steps is
> to delete the bulk load metadata from the system table for every bulk load
> captured in that backup. In effect, this truncates the entire bulk loads
> system table in a single batch of deletes. The logic lives in
> {{BackupSystemTable#deleteBulkLoadedRows}}:
> {code:java}
> /**
>  * Removes rows recording bulk loaded hfiles from backup table
>  * @param rows the rows to be deleted
>  */
> public void deleteBulkLoadedRows(List<byte[]> rows) throws IOException {
>   try (Table table = connection.getTable(bulkLoadTableName)) {
>     List<Delete> lstDels = new ArrayList<>();
>     for (byte[] row : rows) {
>       Delete del = new Delete(row);
>       lstDels.add(del);
>       LOG.debug("orig deleting the row: " + Bytes.toString(row));
>     }
>     table.delete(lstDels);
>     LOG.debug("deleted " + rows.size() + " original bulkload rows");
>   }
> }
> {code}
> Depending on usage, a large number of bulk loads may run between backups, so
> this design is needlessly fragile. We should partition these deletes into
> bounded batches so that an oversized batch can never erroneously fail a
> backup.
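> A minimal sketch of the partitioned approach, assuming a hypothetical
> DELETE_BATCH_SIZE constant; this illustrates the idea, not the committed fix:
> {code:java}
> // Hypothetical upper bound on deletes per batch; the real fix would
> // likely make this configurable.
> private static final int DELETE_BATCH_SIZE = 1000;
>
> public void deleteBulkLoadedRows(List<byte[]> rows) throws IOException {
>   try (Table table = connection.getTable(bulkLoadTableName)) {
>     // Issue deletes in bounded chunks instead of one huge batch, so a
>     // single oversized multi-request cannot fail the whole backup.
>     for (int start = 0; start < rows.size(); start += DELETE_BATCH_SIZE) {
>       int end = Math.min(start + DELETE_BATCH_SIZE, rows.size());
>       List<Delete> batch = new ArrayList<>(end - start);
>       for (byte[] row : rows.subList(start, end)) {
>         batch.add(new Delete(row));
>       }
>       table.delete(batch);
>       LOG.debug("Deleted batch of {} bulk load rows", batch.size());
>     }
>   }
> }
> {code}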