[jira] [Commented] (HBASE-28987) Developing a Custom ReplicationEndpoint to Support External Storage Integration

Duo Zhang (Jira) Thu, 06 Mar 2025 05:47:11 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-28987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17933000#comment-17933000
 ]


Duo Zhang commented on HBASE-28987:
-----------------------------------

Please see my comments above...

{quote}
Another way is to close the S3 file on demand, maybe every 5 minutes? Then we 
need to implement compaction. Of course this could be implemented outside the 
regionserver process, but either way it will increase the complexity.
{quote}

I mentioned this solution two months ago...
But the problem here is that you need to implement compaction too, otherwise 
there will be a bunch of small files on S3.
How do you guys plan to deal with these small files?

And even with this solution, I do not think we need the dual offset management 
here.
Basically there are two possible ways:
1. Add a callback that runs when we want to persist the offset, to close the 
current file on S3. The remaining problem is how to tune the persist interval 
and size, which I think is easy in the current code base.
2. Only read the open WAL files every 5 minutes, and once we reach EOF, we 
close the file on S3 and store the offset.
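
A minimal sketch of option 1 in plain Java, assuming a hypothetical 
ExternalWriter and OffsetTracker (neither type exists in HBase, and the 
threshold values are made up). It only illustrates the ordering: the file on 
S3 is closed before the single offset is advanced, so no second offset is 
needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: none of these types exist in HBase. */
final class OffsetCallbackSketch {

  /** Stand-in for an S3 writer, where data only becomes durable once the file is closed. */
  static final class ExternalWriter {
    private final List<String> current = new ArrayList<>();
    final List<List<String>> closedFiles = new ArrayList<>();

    void append(String entry) {
      current.add(entry);
    }

    /** Closes the current file; does nothing if no entries were written to it. */
    void closeCurrentFile() {
      if (!current.isEmpty()) {
        closedFiles.add(new ArrayList<>(current));
        current.clear();
      }
    }
  }

  /** Tracks one offset and runs a callback right before persisting it. */
  static final class OffsetTracker {
    private final Runnable beforePersist;
    private final long maxEntries;    // hypothetical "size" threshold
    private final long maxIntervalMs; // hypothetical "interval" threshold
    private long shippedOffset;
    private long persistedOffset;
    private long lastPersistMs;

    OffsetTracker(Runnable beforePersist, long maxEntries, long maxIntervalMs, long nowMs) {
      this.beforePersist = beforePersist;
      this.maxEntries = maxEntries;
      this.maxIntervalMs = maxIntervalMs;
      this.lastPersistMs = nowMs;
    }

    /** Records one shipped entry; persists the offset when either threshold is reached. */
    void entryShipped(long nowMs) {
      shippedOffset++;
      boolean sizeReached = shippedOffset - persistedOffset >= maxEntries;
      boolean intervalReached = nowMs - lastPersistMs >= maxIntervalMs;
      if (sizeReached || intervalReached) {
        beforePersist.run();             // close the external file first
        persistedOffset = shippedOffset; // only then advance the stored offset
        lastPersistMs = nowMs;
      }
    }

    long persistedOffset() {
      return persistedOffset;
    }
  }

  public static void main(String[] args) {
    ExternalWriter writer = new ExternalWriter();
    // Persist every 3 entries or every 5 minutes, whichever comes first.
    OffsetTracker tracker = new OffsetTracker(writer::closeCurrentFile, 3, 300_000L, 0L);
    for (int i = 0; i < 7; i++) {
      writer.append("edit-" + i);
      tracker.entryShipped(i * 1_000L);
    }
    System.out.println(writer.closedFiles.size() + " files closed, persisted offset "
        + tracker.persistedOffset());
  }
}
```

Option 2 would use the same close-then-store ordering, triggered on reaching 
EOF instead of on a threshold.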

What do you guys think?

Thanks.

> Developing a Custom ReplicationEndpoint to Support External Storage 
> Integration
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-28987
>                 URL: https://issues.apache.org/jira/browse/HBASE-28987
>             Project: HBase
>          Issue Type: Task
>          Components: backup&restore
>    Affects Versions: 2.6.0, 3.0.0-alpha-4
>            Reporter: Vinayak Hegde
>            Assignee: Vinayak Hegde
>            Priority: Major
>
> *Develop a Custom Replication Endpoint*
> Implement a custom replication endpoint to support the backup of WALs to 
> external storage systems, such as HDFS-compliant storages (including HDFS, 
> S3, ADLS, and GCS via respective Hadoop connectors).
> *Support for Bulk-loaded Files*
> Add functionality to back up bulk-loaded files in addition to regular WALs.
> *Ensure Process Durability*
> Ensure the backup process is durable, with no WALs being missed, even in the 
> event of issues in the cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
