[ 
https://issues.apache.org/jira/browse/HBASE-18748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151792#comment-16151792
 ] 

Anastasia Braginsky commented on HBASE-18748:
---------------------------------------------

As explained in the description, we would like to add a feature to HBase 
replication. A failover from the primary cluster to the secondary should have 
no effect on read latency. Currently, read latency spikes upon failover because 
the cache on the secondary is cold. Simply redirecting reads to the secondary 
prior to failover (that is, having the user application duplicate them) 
resolves this issue. However, making the secondary process all the reads is a 
waste of resources. Therefore, the suggestion is to redirect only the 
"relevant" reads. In other words, the suggested solution is to selectively 
replay read requests at the backup, namely those reads that caused cache-ins 
at the primary. 

We intend to use WAL replication as the transport protocol (hopefully as a 
black box) and, of course, to add custom replay callbacks. That means adding a 
new "read" type of WAL entry, which is going to be rare because it is written 
only upon a cache-in. These read WAL entries are going to be replicated to the 
secondary cluster. Of course, the cached blocks on the primary and the 
secondary may diverge, but this is a good heuristic.
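
To make the idea concrete, here is a minimal sketch written as a toy Python 
model. It is not HBase code and does not use any HBase API: the names 
Cluster, LogEntry, and the "read" entry kind are hypothetical, the list 
"log" merely stands in for the WAL, and a small LRU dictionary stands in for 
the block cache. It only illustrates that a cache hit produces no log entry, 
while a cache-in produces one marker that the secondary replays to warm its 
own cache.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    kind: str                    # "write", or the hypothetical new "read" marker
    key: str
    value: Optional[str] = None  # read markers carry no data, only the key


class Cluster:
    """Toy cluster: a dict as the store plus a small LRU cache (assumption)."""

    def __init__(self, cache_capacity: int) -> None:
        self.store = {}
        self.cache = OrderedDict()
        self.cache_capacity = cache_capacity
        self.log = []            # stands in for the WAL shipped to the peer

    def _cache_in(self, key: str) -> None:
        # Load the value into the cache and evict the least recently used key.
        self.cache[key] = self.store[key]
        self.cache.move_to_end(key)
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)

    def put(self, key: str, value: str) -> None:
        self.store[key] = value
        if key in self.cache:
            self.cache[key] = value
        self.log.append(LogEntry("write", key, value))

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:    # cache hit: nothing is logged
            self.cache.move_to_end(key)
            return self.cache[key]
        if key not in self.store:
            return None
        self._cache_in(key)      # cache-in: log one rare read marker
        self.log.append(LogEntry("read", key))
        return self.store[key]

    def replay(self, entries) -> None:
        # Custom replay callback on the secondary: writes update the store,
        # read markers only warm the cache.
        for entry in entries:
            if entry.kind == "write":
                self.store[entry.key] = entry.value
                if entry.key in self.cache:
                    self.cache[entry.key] = entry.value
            elif entry.kind == "read" and entry.key in self.store:
                self._cache_in(entry.key)


primary = Cluster(cache_capacity=2)
secondary = Cluster(cache_capacity=2)

for k in ("a", "b", "c"):
    primary.put(k, k.upper())

reads = ("a", "b", "a", "a", "b")
for k in reads:
    primary.get(k)

secondary.replay(primary.log)
read_markers = [e for e in primary.log if e.kind == "read"]
```

In this run only the reads that missed the primary's cache generate markers, 
so the secondary receives far fewer entries than it would if the application 
duplicated every read. As noted above, the two caches can still diverge in a 
real system (different eviction timing, different block layout), so this is a 
heuristic and not a guarantee.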

What do you think about this suggestion? [~stack] and everybody, we would like 
to hear from you! Maybe this is already implemented and we are not aware of 
it?

> Cache pre-warming upon replication
> ----------------------------------
>
>                 Key: HBASE-18748
>                 URL: https://issues.apache.org/jira/browse/HBASE-18748
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Anastasia Braginsky
>
> HBase's cluster replication is a very important and widely used feature. 
> Let's assume a primary cluster is replicated to a secondary (backup) cluster 
> using the WAL of the primary cluster to propagate the changes. Let's also 
> assume the secondary cluster is the target for failover and should become 
> primary when needed.
> We suggest improving the way HBase cluster failover works today. Namely, 
> upon failover, the backup RS's cache is cold. Warming it up to the right 
> working set takes many minutes. The suggested solution is to selectively 
> replay read requests at the backup, namely those reads that caused 
> cache-ins at the primary. We intend to use WAL replication as the transport 
> protocol (hopefully as a black box) and, of course, to add custom replay 
> callbacks. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
