[jira] [Commented] (HBASE-29519) Copy Bulkloaded Files in Continuous Backup

Vinayak Hegde (Jira) Wed, 13 Aug 2025 07:36:53 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-29519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18013698#comment-18013698
 ]


Vinayak Hegde commented on HBASE-29519:
---------------------------------------

I have a few discussion points.

To replicate bulkloaded files, the 
{{hbase.replication.bulkload.enabled}} configuration must be enabled. If the 
user has not enabled it, I’m currently logging a warning stating that it’s 
required for bulkload file replication. Is that sufficient, or should we handle 
it differently?
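A minimal sketch of the warning-only check follows. It assumes a plain `Map<String, String>` in place of Hadoop's `Configuration` so it runs without HBase on the classpath. The key name is the real one; the class and method names (`BulkloadReplicationCheck`, `checkBulkloadReplication`) are hypothetical, and treating a missing key as disabled mirrors HBase's default of false for this setting.

```java
import java.util.Map;
import java.util.logging.Logger;

/**
 * Hypothetical helper: checks whether bulkload replication is enabled and
 * warns if it is not. A plain Map stands in for Hadoop's Configuration.
 */
public class BulkloadReplicationCheck {
  private static final Logger LOG =
      Logger.getLogger(BulkloadReplicationCheck.class.getName());

  // Real HBase configuration key; the surrounding code is illustrative only.
  static final String BULKLOAD_ENABLED_KEY = "hbase.replication.bulkload.enabled";

  /** Returns true if bulkload replication is on; logs a warning otherwise. */
  static boolean checkBulkloadReplication(Map<String, String> conf) {
    // Boolean.parseBoolean returns false for null (missing key) and for any
    // value other than "true" (case-insensitive), i.e. a default of false.
    boolean enabled = Boolean.parseBoolean(conf.get(BULKLOAD_ENABLED_KEY));
    if (!enabled) {
      LOG.warning(BULKLOAD_ENABLED_KEY + " is not enabled; bulkloaded HFiles"
          + " will not be copied to the backup location.");
    }
    return enabled;
  }

  public static void main(String[] args) {
    System.out.println(checkBulkloadReplication(Map.of(BULKLOAD_ENABLED_KEY, "true")));
    System.out.println(checkBulkloadReplication(Map.of()));
  }
}
```

A stricter variant would throw (for example an `IllegalStateException`) instead of returning false, failing the backup setup rather than letting bulkloaded data silently go missing from the backup.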

 
----
 

Also, at present, we process WAL entries as they arrive, write them to the WAL 
file, and immediately upload the bulkloaded files. Later, when the WAL file is 
full, we close it, which ensures it’s fully persisted. However, if we upload 
the bulkloaded files but then fail to close the WAL file, those entries may be 
reprocessed (the offset doesn’t advance until the WAL file is closed) and their 
bulkloaded files uploaded again. The second upload would overwrite the files 
already copied, which might not be a major issue.
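The reprocessing scenario can be modelled with an in-memory map standing in for S3. This is a hypothetical sketch (`EagerBulkloadUpload`, `processBatch` and `destinationKey` are illustrative names), and it assumes the destination key is derived deterministically from the source HFile path, which is what turns the second upload into an overwrite rather than a duplicate object.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the current flow: bulkloaded files are uploaded as
 * soon as their WAL entry is processed, before the WAL file is closed.
 */
public class EagerBulkloadUpload {
  // Stands in for the backup location (e.g. S3): destination key -> source path.
  final Map<String, String> backupStore = new LinkedHashMap<>();
  int uploadCount = 0;

  // Assumption: the destination key depends only on the source path, so a
  // re-upload of the same file replaces the existing object.
  static String destinationKey(String sourcePath) {
    return "backup/bulkload/" + sourcePath;
  }

  void processBatch(List<String> bulkloadedFiles) {
    for (String file : bulkloadedFiles) {
      // The WAL entry would be appended to the open WAL file here.
      backupStore.put(destinationKey(file), file); // immediate upload
      uploadCount++;
    }
  }

  public static void main(String[] args) {
    EagerBulkloadUpload endpoint = new EagerBulkloadUpload();
    List<String> batch = List.of("ns/table/cf/hfile1", "ns/table/cf/hfile2");
    endpoint.processBatch(batch);
    // Closing the WAL file fails, the offset stays put, and the batch is replayed.
    endpoint.processBatch(batch);
    System.out.println(endpoint.uploadCount + " uploads, "
        + endpoint.backupStore.size() + " objects");
  }
}
```

Replaying the batch doubles the number of uploads but leaves the number of stored objects unchanged, so under this assumption the cost of a failed close is redundant transfer rather than inconsistent backup contents.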

An alternative would be to collect the bulkloaded files in memory and upload 
them when closing the corresponding WAL file in S3, but that would require 
maintaining an in-memory structure and could add complexity. What do you think?
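For comparison, here is a hypothetical sketch of the in-memory alternative: HFile paths are buffered for the open WAL file and uploaded only after that WAL file closes successfully, and the offset advances only after both steps. Names such as `DeferredBulkloadUpload`, `onWalEntry` and `committedOffset` are illustrative, not existing HBase APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: bulkloaded HFile paths are buffered in memory and only
 * uploaded once the WAL file they belong to has been closed.
 */
public class DeferredBulkloadUpload {
  // Stands in for the backup location (e.g. S3): destination key -> source path.
  final Map<String, String> backupStore = new LinkedHashMap<>();
  // HFile paths referenced by entries in the currently open WAL file.
  private final List<String> pendingFiles = new ArrayList<>();
  // Hypothetical replication offset, counted in WAL entries.
  long committedOffset = 0;
  private long pendingEntries = 0;

  void onWalEntry(List<String> bulkloadedFiles) {
    // The entry itself would be appended to the open WAL file here;
    // its bulkloaded files are only remembered, not uploaded.
    pendingFiles.addAll(bulkloadedFiles);
    pendingEntries++;
  }

  /** Called when the WAL file is full; returns whether the close succeeded. */
  boolean closeWalFile(boolean closeSucceeded) {
    if (!closeSucceeded) {
      // Nothing was uploaded and the offset stays put, so the entries are
      // replayed later and the buffer is rebuilt from scratch.
      pendingFiles.clear();
      pendingEntries = 0;
      return false;
    }
    for (String file : pendingFiles) {
      backupStore.put("backup/bulkload/" + file, file); // upload after WAL is persisted
    }
    committedOffset += pendingEntries; // advance only once WAL and HFiles are stored
    pendingFiles.clear();
    pendingEntries = 0;
    return true;
  }

  public static void main(String[] args) {
    DeferredBulkloadUpload endpoint = new DeferredBulkloadUpload();
    endpoint.onWalEntry(List.of("ns/table/cf/hfile1"));
    endpoint.onWalEntry(List.of("ns/table/cf/hfile2"));
    System.out.println(endpoint.backupStore.size()); // 0: nothing uploaded before the close
    endpoint.closeWalFile(true);
    System.out.println(endpoint.backupStore.size() + " objects, offset "
        + endpoint.committedOffset);
  }
}
```

The buffer holds only file paths, not file contents, so its size grows with the number of bulkloaded files per WAL file. A failed close discards the buffer and the replayed entries rebuild it, so each file is uploaded once per successfully closed WAL file.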

> Copy Bulkloaded Files in Continuous Backup
> ------------------------------------------
>
>                 Key: HBASE-29519
>                 URL: https://issues.apache.org/jira/browse/HBASE-29519
>             Project: HBase
>          Issue Type: Sub-task
>          Components: backup&restore
>            Reporter: Vinayak Hegde
>            Assignee: Vinayak Hegde
>            Priority: Major
>              Labels: pull-request-available
>
> Enhance the continuous backup replication endpoint to detect bulkload 
> operations and copy their HFiles to the backup location (e.g., S3). 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
