Ajay Jadhav commented on HBASE-18448:

[~ram_krish] and [~anoop.hbase]: Sorry I got sidetracked last week. I spent 
some time today looking through
the coprocessor EP and doing some background reading.

The advantage I see in using a CP EP is that it can process the request across 
regions in parallel, which looks promising.
Currently, the refreshHFiles API is granular at the table level. It figures out 
all the regions for the table and issues an RPC call to the RS.
This updates the regions sequentially.
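
To make the contrast concrete, here is a minimal sketch in plain Java of the two 
fan-out styles. RegionRefresher is a hypothetical stand-in for one region's 
refresh and is not an HBase type; the real endpoint would be invoked through the 
coprocessor framework rather than CompletableFuture.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class RefreshFanOut {
    /** Hypothetical stand-in for one region's store-file refresh; not an HBase type. */
    interface RegionRefresher {
        void refresh();
    }

    /** Table-level loop: each region is refreshed only after the previous one returns. */
    static void refreshSequentially(List<RegionRefresher> regions) {
        for (RegionRefresher region : regions) {
            region.refresh();
        }
    }

    /** Endpoint-style fan-out: all refreshes are submitted, then awaited together. */
    static void refreshInParallel(List<RegionRefresher> regions) {
        CompletableFuture<?>[] pending = regions.stream()
            .map(region -> CompletableFuture.runAsync(region::refresh))
            .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(pending).join();
    }

    public static void main(String[] args) {
        AtomicInteger refreshed = new AtomicInteger();
        // Each fake region just counts that its refresh was called.
        List<RegionRefresher> regions = List.of(
            refreshed::incrementAndGet,
            refreshed::incrementAndGet,
            refreshed::incrementAndGet);
        refreshSequentially(regions);
        refreshInParallel(regions);
        System.out.println("refresh calls: " + refreshed.get());
    }
}
```

With a slow filesystem listing per region, the sequential loop pays the sum of 
the per-region latencies, while the parallel version is bounded roughly by the 
slowest region (subject to the thread pool size).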

Now with a CP EP, there are 2 ways to go about implementing this:
1. Refresh the in-memory HFile handle list for all regions, irrespective of the 
table. This way users will be able to keep the replica consistent with all 
updates in the primary without needing to issue a refresh for each table 
individually, but this adds a huge performance overhead on slower filesystems 
like S3.
2. Limit it to a table: the EP accepts the table as input and, if the current 
region belongs to the input table, does the refreshHFiles; otherwise it skips 
the region.
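
The second option can be sketched as below. This is an illustration only, under 
stated assumptions: Region, handleRefresh and refreshTable are hypothetical 
names in plain Java, not HBase classes or the patch's actual code, and the 
"refresh" is a placeholder flag instead of a real store-file reload.

```java
import java.util.ArrayList;
import java.util.List;

public class TableScopedRefresh {
    /** Hypothetical stand-in for a region hosted on a region server; not an HBase class. */
    static final class Region {
        final String table;
        final String name;
        boolean refreshed = false;

        Region(String table, String name) {
            this.table = table;
            this.name = name;
        }
    }

    /**
     * Approach 2: refresh only when the region belongs to the requested table.
     * Returns true when a refresh ran, false when the region was skipped.
     */
    static boolean handleRefresh(Region region, String requestedTable) {
        if (!region.table.equals(requestedTable)) {
            return false; // skip: no filesystem listing for unrelated tables
        }
        region.refreshed = true; // placeholder for reloading the HFile handle list
        return true;
    }

    /** Applies the per-region check to every hosted region and reports which ones refreshed. */
    static List<String> refreshTable(List<Region> hosted, String requestedTable) {
        List<String> done = new ArrayList<>();
        for (Region region : hosted) {
            if (handleRefresh(region, requestedTable)) {
                done.add(region.name);
            }
        }
        return done;
    }

    public static void main(String[] args) {
        List<Region> hosted = List.of(
            new Region("t1", "t1-r1"),
            new Region("t2", "t2-r1"),
            new Region("t1", "t1-r2"));
        System.out.println(refreshTable(hosted, "t1"));
    }
}
```

The table check is what keeps regions of unrelated tables from paying for a 
listing on a slow filesystem such as S3.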

I'm more inclined towards the second approach. Let me know your opinion.

> Added support for refreshing HFiles through API and shell
> ---------------------------------------------------------
>                 Key: HBASE-18448
>                 URL: https://issues.apache.org/jira/browse/HBASE-18448
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0, 1.3.1
>            Reporter: Ajay Jadhav
>            Assignee: Ajay Jadhav
>            Priority: Minor
>             Fix For: 1.4.0
>         Attachments: HBASE-18448.branch-1.001.patch, 
> HBASE-18448.branch-1.002.patch
> When multiple HBase clusters share a common rootDir, flushing data from one 
> cluster does not mean that the other clusters (replicas) will automatically 
> pick up the new HFile. This patch exposes the refresh HFiles API, which, when 
> issued from a replica, updates the in-memory file handle list with the newly 
> added file.
> This allows replicas to be consistent with the data written through the 
> primary cluster. 

This message was sent by Atlassian JIRA
