[
https://issues.apache.org/jira/browse/HBASE-29519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18013698#comment-18013698
]
Vinayak Hegde commented on HBASE-29519:
---------------------------------------
I have a few discussion points.
To replicate bulkloaded files, we need the
{{hbase.replication.bulkload.enabled}} configuration enabled. If the user has
not enabled it, I currently log a warning stating that it is required for
bulkload file replication. Is that sufficient, or should we handle it
differently?
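
To make the two options concrete, here is a minimal sketch of "warn and continue" versus "fail fast". It is not the code in the PR: the class, the {{Policy}} enum, and the use of a plain {{Map}} in place of the HBase configuration object are all hypothetical, and it assumes a missing property means disabled.

```java
import java.util.Map;

// Hypothetical helper, not part of HBase: shows the two ways the endpoint
// could react when bulkload replication is not enabled.
final class BulkloadConfigCheck {
    // Property name taken from the comment above.
    static final String BULKLOAD_ENABLED_KEY = "hbase.replication.bulkload.enabled";

    enum Policy { WARN, FAIL_FAST }

    /**
     * Returns true if the property is set to "true". Otherwise either logs a
     * warning and returns false (WARN) or throws (FAIL_FAST).
     * Assumption: an absent property is treated as disabled.
     */
    static boolean checkBulkloadEnabled(Map<String, String> conf, Policy policy) {
        boolean enabled =
            Boolean.parseBoolean(conf.getOrDefault(BULKLOAD_ENABLED_KEY, "false"));
        if (enabled) {
            return true;
        }
        String msg = BULKLOAD_ENABLED_KEY
            + " is not enabled; bulkloaded files will not be copied to the backup";
        if (policy == Policy.FAIL_FAST) {
            throw new IllegalStateException(msg);
        }
        System.err.println("WARN: " + msg);
        return false;
    }
}
```

With {{WARN}} the backup keeps running but silently misses bulkloaded data unless someone reads the log; with {{FAIL_FAST}} the misconfiguration surfaces immediately, at the cost of blocking the backup from starting.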
----
Also, at present, we process WAL entries as they arrive, write them to the WAL
file, and immediately upload the bulkloaded files. Later, when the WAL file is
full, we close it, which ensures it is fully persisted. However, if we upload
the bulkloaded files but then fail to close the WAL file, we may reprocess those
entries (since the offset won't move until the WAL is closed) and upload the
same files again. The second upload would overwrite the existing copies, which
might not be a major issue.
An alternative would be to collect the bulkloaded files in memory and upload
them when closing the corresponding WAL file in S3, but that would require
maintaining an in-memory structure and could add complexity. What do you think?
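
For reference, the alternative could be sketched roughly as below. This is only an illustration under assumptions, not the current implementation: the class and method names are hypothetical, the {{Consumer}} stands in for whatever actually copies an HFile to the backup location, and it assumes the replication offset is advanced only after {{onWalClosed}} returns.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch: buffer bulkloaded file paths in memory and upload
// them only once the corresponding WAL file has been closed.
final class DeferredBulkloadUploader {
    // LinkedHashSet keeps arrival order and drops duplicate paths.
    private final Set<String> pending = new LinkedHashSet<>();
    // Stand-in for the real copy-to-backup step (an assumption).
    private final Consumer<String> uploader;

    DeferredBulkloadUploader(Consumer<String> uploader) {
        this.uploader = uploader;
    }

    /** Called while processing a WAL entry that references a bulkloaded file. */
    void onBulkload(String hfilePath) {
        pending.add(hfilePath);
    }

    /**
     * Called after the WAL file was closed successfully. Uploads every
     * buffered file and returns how many were uploaded. If the uploader
     * throws, the buffer is left intact so the caller can retry.
     */
    int onWalClosed() {
        int uploaded = 0;
        for (String path : pending) {
            uploader.accept(path);
            uploaded++;
        }
        pending.clear();
        return uploaded;
    }

    /** Called when closing the WAL failed; the entries will be replayed. */
    void onWalCloseFailed() {
        pending.clear();
    }
}
```

The extra state is one set per open WAL file, but the failure handling moves rather than disappears: a failed upload after a successful close still has to be retried before the offset advances.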
> Copy Bulkloaded Files in Continuous Backup
> ------------------------------------------
>
> Key: HBASE-29519
> URL: https://issues.apache.org/jira/browse/HBASE-29519
> Project: HBase
> Issue Type: Sub-task
> Components: backup&restore
> Reporter: Vinayak Hegde
> Assignee: Vinayak Hegde
> Priority: Major
> Labels: pull-request-available
>
> Enhance the continuous backup replication endpoint to detect bulkload
> operations and copy their HFiles to the backup location (e.g., S3).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)