[jira] [Commented] (HBASE-29519) Copy Bulkloaded Files in Continuous Backup

Vinayak Hegde (Jira) Fri, 15 Aug 2025 00:28:45 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-29519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014056#comment-18014056
 ]


Vinayak Hegde commented on HBASE-29519:
---------------------------------------

 
{code:java}
if hbase.replication.bulkload.enabled is disabled, can one still use PITR / 
restore? {code}
Initially, I thought we could allow it, but now I believe we {*}shouldn’t{*}. 
Without this setting enabled, bulk-loaded files can’t be restored, and the user 
would lose that data.

I think we should enforce this in the continuous backup setup. If 
{{hbase.replication.bulkload.enabled}} is not enabled, the setup should fail 
with a clear message. This will prevent users from unknowingly proceeding and 
later being unable to restore bulk-loaded files.

To make this more user-friendly, we could introduce another configuration, for 
example:
{{hbase.backup.continuous.bulkloadfiles.backup.enabled}}

Possible values:
 * *true* – Copy bulk-loaded files.

 ** If {{hbase.replication.bulkload.enabled}} is true, proceed.

 ** Else, fail with an error stating that 
{{hbase.replication.bulkload.enabled}} must be enabled to back up bulk-loaded 
files.

 * *false* – Skip backing up bulk-loaded files.

 * *not specified* – Fail the operation, requiring the user to make an explicit 
choice.
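The decision table above could be enforced with a small check during continuous backup setup. This is a minimal sketch only. The class and method names are hypothetical. Configuration is modeled as a plain {{Map<String, String>}} rather than Hadoop's {{Configuration}} so the example stays self-contained. {{hbase.backup.continuous.bulkloadfiles.backup.enabled}} is the key proposed in this comment, not an existing HBase setting. An absent {{hbase.replication.bulkload.enabled}} is treated as false, which is assumed to match the HBase default.

{code:java}
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed setup-time validation for continuous backup.
 * Configuration is modeled as a Map<String, String> (assumption), not Hadoop's Configuration.
 */
public class BulkloadBackupConfigCheck {

  /** Existing HBase key; it is global, so it affects every replication peer. */
  static final String REPLICATION_BULKLOAD_ENABLED = "hbase.replication.bulkload.enabled";

  /** Key proposed in this comment; not an existing HBase setting. */
  static final String BACKUP_BULKLOAD_FILES_ENABLED =
      "hbase.backup.continuous.bulkloadfiles.backup.enabled";

  /**
   * Returns true if bulk-loaded files should be copied, false if the user explicitly opted out.
   * Throws IllegalStateException when the continuous backup setup must fail.
   */
  static boolean shouldBackupBulkloadedFiles(Map<String, String> conf) {
    String choice = conf.get(BACKUP_BULKLOAD_FILES_ENABLED);
    if (choice == null) {
      // Not specified: refuse to proceed until the user makes an explicit choice.
      throw new IllegalStateException(
          BACKUP_BULKLOAD_FILES_ENABLED + " must be set explicitly to true or false");
    }
    String normalized = choice.trim().toLowerCase(Locale.ROOT);
    if (normalized.equals("false")) {
      // Explicit opt-out: skip backing up bulk-loaded files.
      return false;
    }
    if (!normalized.equals("true")) {
      throw new IllegalStateException(
          BACKUP_BULKLOAD_FILES_ENABLED + " must be true or false, got: " + choice);
    }
    // true: only valid when bulkload replication is on.
    // An absent value is treated as false (assumed to match the HBase default).
    String replication = conf.getOrDefault(REPLICATION_BULKLOAD_ENABLED, "false");
    if (!replication.trim().equalsIgnoreCase("true")) {
      throw new IllegalStateException(
          REPLICATION_BULKLOAD_ENABLED + " must be enabled to back up bulk-loaded files");
    }
    return true;
  }
}
{code}

Failing on an unspecified value, rather than silently defaulting to either choice, is what forces the explicit decision described above.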

 

A few important things to consider:
 * {{hbase.replication.bulkload.enabled}} is {*}global{*}, so enabling it will 
replicate bulk-loaded files for all tables and affect all replication peers 
(e.g., peer1 and peer2 will also receive them).

 * If bulk-loaded file backup was not chosen initially, files bulk-loaded in the 
meantime are never copied and won’t be available later. Enabling the 
configuration later won’t recover that past data.

 * If the user realizes the mistake, the only option is to take a full backup 
after enabling the setting, so future bulk-loaded files are backed up.
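The last two points can be modeled as a simple check on restore points. This is a simplified, hypothetical sketch, not the backup code itself. It assumes a PITR restore starts from the latest full backup taken at or before the target time and replays from there, and it takes as input the (hypothetical) time at which bulk-loaded file backup was enabled. Under those assumptions, a restore is only guaranteed to include bulk-loaded data if its base full backup was taken at or after that enable time; an older base may be followed by bulk loads that were never copied.

{code:java}
import java.util.List;
import java.util.Optional;

/** Hypothetical model of which restore points are guaranteed to include bulk-loaded data. */
public class BulkloadRestoreSafety {

  /**
   * Assumes the restore uses the latest full backup taken at or before targetTs as its base.
   *
   * @param fullBackupTimes timestamps (ms) of completed full backups
   * @param bulkloadBackupEnabledAt timestamp (ms) when bulk-loaded file backup was turned on
   * @param targetTs requested restore point (ms)
   */
  static boolean restoreGuaranteedToIncludeBulkloads(
      List<Long> fullBackupTimes, long bulkloadBackupEnabledAt, long targetTs) {
    Optional<Long> base = fullBackupTimes.stream()
        .filter(t -> t <= targetTs)
        .max(Long::compare);
    // No full backup at or before the target: no base to restore from at all.
    if (base.isEmpty()) {
      return false;
    }
    // A base taken before bulk-loaded file backup was enabled may be followed by
    // bulk loads that were never copied, so completeness cannot be guaranteed.
    return base.get() >= bulkloadBackupEnabledAt;
  }
}
{code}

For example, with full backups at 100 and 500 and bulk-loaded file backup enabled at 300, a restore to 400 (base 100) is not guaranteed to be complete, while a restore to 600 (base 500) is. This is why a new full backup is needed after enabling the setting.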

Please share your thoughts on this, as it’s a critical decision. [~andor] 
[~wchevreuil] [~sergey.soldatov] [~ankit] [~taklwu] .

 

 
{code:java}
also, this may be a different task, but I recall that 
https://issues.apache.org/jira/browse/HBASE-29310 / 
https://github.com/apache/hbase/pull/7150 introduced logic requiring a full 
or incremental backup after a bulkload; would it still be required? {code}
No. That was based on the old design and will be overridden by the new 
approach, so it can be ignored.

 

> Copy Bulkloaded Files in Continuous Backup
> ------------------------------------------
>
>                 Key: HBASE-29519
>                 URL: https://issues.apache.org/jira/browse/HBASE-29519
>             Project: HBase
>          Issue Type: Sub-task
>          Components: backup&restore
>            Reporter: Vinayak Hegde
>            Assignee: Vinayak Hegde
>            Priority: Major
>              Labels: pull-request-available
>
> Enhance the continuous backup replication endpoint to detect bulkload 
> operations and copy their HFiles to the backup location (e.g., S3). 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
