[ 
https://issues.apache.org/jira/browse/HBASE-18448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102052#comment-16102052
 ] 

Ajay Jadhav commented on HBASE-18448:
-------------------------------------

[~ram_krish]: Exposing the refresh HFiles API is useful in the following 
scenario.
Assume we have two HBase clusters pointing to the same rootDir (an S3 bucket): 
one is in read-only mode (the replica) and the other accepts writes (the 
primary).

1. We issue a "put" on the primary cluster and flush immediately.
2. This creates an HFile on storage (S3).
3. The replica is not aware of the newly created HFile, because the write did 
not go through it.
4. The only way for the replica to become consistent with the primary is to 
issue a refresh HFiles on the replica, which updates the replica's in-memory 
file handle list.

This is why we need the refresh HFiles API to keep all the clusters consistent 
with writes on the primary cluster.
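
The four steps above can be illustrated with a small toy model. This is only a 
sketch of the scenario, not HBase code: the SharedStorage and Cluster classes 
and the refresh_hfiles method are hypothetical names made up for illustration, 
and a plain dict stands in for the S3 bucket.

```python
class SharedStorage:
    """Stands in for the common rootDir (e.g. an S3 bucket). Hypothetical."""

    def __init__(self):
        self.hfiles = {}  # file name -> {row: value}

    def write_hfile(self, name, cells):
        self.hfiles[name] = dict(cells)

    def list_hfiles(self):
        return sorted(self.hfiles)


class Cluster:
    """Toy cluster that caches the list of HFiles it knows about."""

    def __init__(self, storage, read_only=False):
        self.storage = storage
        self.read_only = read_only
        self.memstore = {}
        # In-memory file handle list, populated once at startup.
        self.known_hfiles = storage.list_hfiles()

    def put(self, row, value):
        if self.read_only:
            raise PermissionError("replica is read-only")
        self.memstore[row] = value

    def flush(self):
        # Persist the memstore as a new file on shared storage.
        if not self.memstore:
            return
        name = f"hfile-{len(self.storage.hfiles):04d}"
        self.storage.write_hfile(name, self.memstore)
        self.known_hfiles.append(name)
        self.memstore = {}

    def refresh_hfiles(self):
        # Re-list shared storage to pick up files written by other clusters.
        self.known_hfiles = self.storage.list_hfiles()

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row]
        # Newest file first, and only files this cluster knows about.
        for name in reversed(self.known_hfiles):
            cells = self.storage.hfiles[name]
            if row in cells:
                return cells[row]
        return None


storage = SharedStorage()
primary = Cluster(storage)
replica = Cluster(storage, read_only=True)

primary.put("row1", "v1")          # step 1: put on the primary ...
primary.flush()                    # ... and flush; step 2: file lands on storage
before = replica.get("row1")       # step 3: replica does not see the new file
replica.refresh_hfiles()           # step 4: refresh the replica's file list
after = replica.get("row1")
print(before, after)
```

In this model the replica's read returns nothing until refresh_hfiles is 
called, even though the file is already present on the shared storage.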

More information about this feature is available here: 
https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/

> Added support for refreshing HFiles through API and shell
> ---------------------------------------------------------
>
>                 Key: HBASE-18448
>                 URL: https://issues.apache.org/jira/browse/HBASE-18448
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.0.0, 1.3.1
>            Reporter: Ajay Jadhav
>            Assignee: Ajay Jadhav
>            Priority: Minor
>             Fix For: 1.4.0
>
>         Attachments: HBASE-18448.branch-1.001.patch, 
> HBASE-18448.branch-1.002.patch
>
>
> In the case where multiple HBase clusters share a common rootDir, flushing 
> data from one cluster does not mean that the other clusters (replicas) will 
> automatically pick up the new HFile. This patch exposes the refresh HFiles 
> API which, when issued from a replica, updates the in-memory file handle 
> list with the newly added file.
> This allows replicas to be consistent with the data written through the 
> primary cluster. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
