[ https://issues.apache.org/jira/browse/HBASE-29519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014056#comment-18014056 ]
Vinayak Hegde commented on HBASE-29519:
---------------------------------------
{code:java}
if hbase.replication.bulkload.enabled is disabled, can one still use PITR /
restore? {code}
Initially, I thought we could allow it, but now I believe we {*}shouldn’t{*}.
Without this setting enabled, bulk-loaded files never reach the continuous
backup, so they can’t be restored and the user would lose that data.
I think we should enforce this in the continuous backup setup. If
{{hbase.replication.bulkload.enabled}} is not enabled, the setup should fail
with a clear message. This will prevent users from unknowingly proceeding and
later being unable to restore bulk-loaded files.
To make this more user-friendly, we could introduce another configuration, for
example:
{{hbase.backup.continuous.bulkloadfiles.backup.enabled}}
Possible values:
* *true* – Copy bulk-loaded files.
** If {{hbase.replication.bulkload.enabled}} is true, proceed.
** Else, fail with an error stating that
{{hbase.replication.bulkload.enabled}} must be enabled to back up bulk-loaded
files.
* *false* – Skip bulk-loaded files backup.
* *not specified* – Fail the operation, requiring the user to make an explicit
choice.
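The proposed tri-state check could look roughly like the following. This is only a minimal sketch: {{hbase.backup.continuous.bulkloadfiles.backup.enabled}} is the key proposed above, not an existing HBase setting; a plain {{Map<String, String>}} stands in for Hadoop's {{Configuration}}; and the names {{BulkloadBackupCheck}} and {{validate}} are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed continuous-backup setup validation.
// A Map<String, String> stands in for org.apache.hadoop.conf.Configuration.
public class BulkloadBackupCheck {

  // Existing HBase key that makes bulk-load events visible to replication.
  static final String REPLICATION_BULKLOAD_ENABLED = "hbase.replication.bulkload.enabled";
  // Proposed key from this discussion (does not exist yet).
  static final String BACKUP_BULKLOAD_FILES_ENABLED =
      "hbase.backup.continuous.bulkloadfiles.backup.enabled";

  /**
   * Returns true if bulk-loaded files should be copied, false if they should be skipped.
   * Throws IllegalStateException when the continuous backup setup should fail.
   */
  static boolean validate(Map<String, String> conf) {
    String choice = conf.get(BACKUP_BULKLOAD_FILES_ENABLED);
    if (choice == null) {
      // Not specified: force the user to make an explicit decision.
      throw new IllegalStateException(
          BACKUP_BULKLOAD_FILES_ENABLED + " must be set explicitly to true or false");
    }
    if (!choice.equalsIgnoreCase("true") && !choice.equalsIgnoreCase("false")) {
      throw new IllegalStateException(
          "Invalid value for " + BACKUP_BULKLOAD_FILES_ENABLED + ": " + choice);
    }
    if (choice.equalsIgnoreCase("false")) {
      // Explicit opt-out: bulk-loaded files will not be backed up.
      return false;
    }
    // Assumption: the replication key is treated as false when unset.
    boolean replicationBulkload =
        Boolean.parseBoolean(conf.getOrDefault(REPLICATION_BULKLOAD_ENABLED, "false"));
    if (!replicationBulkload) {
      throw new IllegalStateException(
          REPLICATION_BULKLOAD_ENABLED + " must be enabled to back up bulk-loaded files");
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(BACKUP_BULKLOAD_FILES_ENABLED, "true");
    conf.put(REPLICATION_BULKLOAD_ENABLED, "true");
    System.out.println(validate(conf)); // prints "true"
  }
}
```

Failing on an unset value (rather than defaulting to either choice) is what forces the explicit decision described in the last bullet.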
A few important things to consider:
* {{hbase.replication.bulkload.enabled}} is {*}global{*}, so enabling it will
replicate bulk-loaded files for all tables and affect all replication peers
(e.g., peer1 and peer2 would also receive them).
* If bulk-loaded file backup was not initially chosen, those files won’t be
available later. Enabling the configuration later won’t recover past data.
* If the user realizes the mistake, the only option is to take a full backup
after enabling the setting, so future bulk-loaded files are backed up.
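The consequence of the last two points can be made concrete with a small model. This is a hypothetical illustration, not HBase code: timestamps are plain {{long}} values, and {{BulkloadRestoreWindow}} / {{earliestSafeRestorePoint}} are made-up names. It assumes a point-in-time restore starts from the latest full backup before the target, so only targets at or after the first full backup taken once bulk-load backup is enabled are guaranteed to include bulk-loaded data.

```java
import java.util.List;

// Hypothetical model: bulk-load backup is enabled at time enabledAt. Restores that
// start from a full backup taken before enabledAt may miss bulk-loaded files loaded
// between that full backup and enabledAt.
public class BulkloadRestoreWindow {

  /**
   * Returns the earliest restore target guaranteed to include bulk-loaded data,
   * i.e. the first full backup taken at or after enabledAt, or -1 if none exists yet.
   */
  static long earliestSafeRestorePoint(long enabledAt, List<Long> fullBackupTimes) {
    long earliest = -1;
    for (long t : fullBackupTimes) {
      if (t >= enabledAt && (earliest == -1 || t < earliest)) {
        earliest = t;
      }
    }
    return earliest;
  }

  public static void main(String[] args) {
    // Full backups at t=100 and t=500; bulk-load backup enabled at t=300.
    long safe = earliestSafeRestorePoint(300L, List.of(100L, 500L));
    System.out.println(safe); // prints "500"
  }
}
```

A result of -1 corresponds to the situation described above: until a new full backup is taken, no restore point is guaranteed to contain the bulk-loaded data.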
Please share your thoughts on this, as it’s a critical decision. [~andor]
[~wchevreuil] [~sergey.soldatov] [~ankit] [~taklwu].
{code:java}
also, this may be a different task, but I recall
https://issues.apache.org/jira/browse/HBASE-29310 /
https://github.com/apache/hbase/pull/7150 introduced logic requiring a full
or incremental backup after a bulkload. Would it still be required? {code}
No. That was based on the old design and will be overridden by the new
approach, so it can be ignored.
> Copy Bulkloaded Files in Continuous Backup
> ------------------------------------------
>
> Key: HBASE-29519
> URL: https://issues.apache.org/jira/browse/HBASE-29519
> Project: HBase
> Issue Type: Sub-task
> Components: backup&restore
> Reporter: Vinayak Hegde
> Assignee: Vinayak Hegde
> Priority: Major
> Labels: pull-request-available
>
> Enhance the continuous backup replication endpoint to detect bulkload
> operations and copy their HFiles to the backup location (e.g., S3).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)