[
https://issues.apache.org/jira/browse/HBASE-18748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151792#comment-16151792
]
Anastasia Braginsky commented on HBASE-18748:
---------------------------------------------
As explained in the description, we would like to add a feature to the HBase
replication methodology. A failover from the primary cluster to the secondary
should have zero effect on read latency. Currently there is a spike in read
latency upon failover because the cache on the secondary is cold. Simply
redirecting reads to the secondary prior to failover (duplicating them in the
user application) resolves this issue. However, making the secondary process
all the reads is a waste of resources. Therefore, the suggestion is to redirect
only "relevant" reads. In other words, the suggested solution is to selectively
replay read requests at the backup - namely, those reads that caused cache-ins
at the primary.
We intend to use WAL replication as the transport protocol (hopefully, as a
black box), and of course add custom replay callbacks. That is, we would add a
new "read type" of WAL entry, which would be rare because it is written only
upon cache-in. Those read WAL entries would be replicated to the secondary
cluster. Of course, the cached blocks on the primary and secondary may diverge,
but this is a good heuristic.
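To make the idea concrete, here is a minimal toy sketch in Java. It does not
use any HBase API: the CacheWarmSketch, LruCache and Cluster types are
hypothetical, and a plain in-memory queue stands in for the replicated WAL. It
only illustrates the rule "log a read marker on cache-in, replay the markers on
the secondary"; how this would hook into the real WAL and replay callbacks is
still open.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// Toy model only: none of these types exist in HBase.
final class CacheWarmSketch {

    /** Tiny LRU cache keyed by row (stand-in for a real block cache). */
    static final class LruCache {
        private final LinkedHashMap<String, String> blocks;

        LruCache(final int capacity) {
            // accessOrder=true makes the map evict the least recently used entry.
            this.blocks = new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > capacity;
                }
            };
        }

        boolean contains(String row) { return blocks.containsKey(row); }
        String get(String row) { return blocks.get(row); }
        void put(String row, String value) { blocks.put(row, value); }
    }

    /** One cluster: a backing store plus a cache in front of it. */
    static final class Cluster {
        final Map<String, String> store = new HashMap<>();
        final LruCache cache;

        Cluster(int cacheCapacity) { this.cache = new LruCache(cacheCapacity); }

        /** Reads a row; returns true if the read caused a cache-in (a miss). */
        boolean read(String row) {
            if (cache.contains(row)) {
                cache.get(row); // touch the entry so LRU order is updated
                return false;
            }
            cache.put(row, store.get(row));
            return true;
        }
    }

    /** Primary side: append a read marker to the log only on a cache-in. */
    static void readOnPrimary(Cluster primary, Queue<String> log, String row) {
        if (primary.read(row)) {
            log.add(row);
        }
    }

    /** Secondary side: replay each marker as a read to warm the cache. */
    static int replayOnSecondary(Cluster secondary, Queue<String> log) {
        int replayed = 0;
        String row;
        while ((row = log.poll()) != null) {
            secondary.read(row);
            replayed++;
        }
        return replayed;
    }

    public static void main(String[] args) {
        Cluster primary = new Cluster(2);
        Cluster secondary = new Cluster(2);
        for (String r : new String[] {"a", "b", "c"}) {
            primary.store.put(r, "v-" + r);
            secondary.store.put(r, "v-" + r);
        }
        Queue<String> log = new ArrayDeque<>(); // stands in for the replicated WAL
        // "a" is read three times, but only its first read is a cache-in.
        for (String r : new String[] {"a", "a", "b", "a"}) {
            readOnPrimary(primary, log, r);
        }
        System.out.println("markers shipped: " + log.size());
        replayOnSecondary(secondary, log);
        System.out.println("secondary warm for a: " + secondary.cache.contains("a"));
    }
}
```

In this toy run, repeated reads of a hot row produce a single marker, which is
why the extra log traffic is expected to be small compared with duplicating
every read.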
What do you think about this suggestion? [~stack] and everybody, we would like
to hear from you! Maybe this is already implemented and we are not aware of
it?
> Cache pre-warming upon replication
> ----------------------------------
>
> Key: HBASE-18748
> URL: https://issues.apache.org/jira/browse/HBASE-18748
> Project: HBase
> Issue Type: New Feature
> Reporter: Anastasia Braginsky
>
> HBase's cluster replication is a very important and widely used feature.
> Let's assume a primary cluster is replicated to a secondary (backup) cluster
> using the WAL of the primary cluster to propagate the changes. Let's also
> assume the secondary cluster is the target for failover and should become
> primary when needed.
> We suggest improving the way HBase cluster failover works today. Namely,
> upon failover, the backup RS's cache is cold. Warming it up to the right
> working set takes many minutes. The suggested solution is to selectively
> replay read requests at the backup - namely, those reads that caused
> cache-ins at the primary. We intend to use WAL replication as the transport
> protocol (hopefully, as a black box), and of course add custom replay
> callbacks.