[ https://issues.apache.org/jira/browse/PHOENIX-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ritesh updated PHOENIX-7555:
----------------------------
    Description: 
Follow up from PHOENIX-7493
Add metrics for HAGroupStoreManager and HAGroupStoreClient

  was:
Phoenix HA (PHOENIX-6491) suggests a best-effort failover process for the failover HA policy. The first step is to make both clusters’ roles Standby, and then wait for replication to finish (best-effort). The final step is to make the other cluster’s role Active.

When the cluster role is set to Standby, the dual-cluster Phoenix client does not allow read/write operations on a standby cluster. This helps drain replication data from the previously Active cluster to the previously Standby cluster. However, in practice a cluster may receive changes without going through the Phoenix dual client. For example, data can be inserted through MapReduce jobs that do not use the Phoenix JDBC client. As another example, the previously Active cluster could be receiving replication data from a third cluster. This means pausing writes at the Phoenix client is not sufficient for a graceful failover operation. Here, graceful means a consistent failover between two healthy clusters.

A consistent failover can be achieved only when all replication data has been sent to the soon-to-be Active cluster. To ensure that all incoming data is paused before the failover event, we need to stop writes to the cluster on the server side. To achieve this, a Phoenix coprocessor can also maintain and watch cluster role changes and stop writes when an Active cluster becomes Standby, as the dual Phoenix client does. To eliminate the ambiguity about which cluster was previously Active, a new HA role called ActiveToStandby is introduced. Neither the Phoenix client nor the server allows write operations on an ActiveToStandby cluster.
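The role semantics above, together with the three-step graceful failover sequence the description lists, can be sketched as follows. This is a minimal illustrative model, not Phoenix code: `HARole`, `allows_writes`, `Cluster`, `graceful_failover`, and the `wait_for_drain` hook are all hypothetical names, and the server-side check only stands in for the role-watching coprocessor the description proposes.

```python
from enum import Enum
from typing import Callable


class HARole(Enum):
    """Hypothetical model of the HA roles named in the description."""
    ACTIVE = "ACTIVE"
    STANDBY = "STANDBY"
    ACTIVE_TO_STANDBY = "ACTIVE_TO_STANDBY"


def allows_writes(role: HARole) -> bool:
    # Only an Active cluster accepts writes; Standby and ActiveToStandby
    # both reject them (on the client and, per the proposal, the server).
    return role is HARole.ACTIVE


class Cluster:
    """Hypothetical stand-in for one cluster's HA role state."""

    def __init__(self, name: str, role: HARole) -> None:
        self.name = name
        self.role = role

    def write(self, row: str) -> None:
        # Server-side gate, standing in for the proposed coprocessor that
        # watches role changes: it rejects writes from any path, not only
        # from the Phoenix dual client.
        if not allows_writes(self.role):
            raise PermissionError(f"{self.name} is {self.role.value}; write rejected")


def graceful_failover(active: Cluster, standby: Cluster,
                      wait_for_drain: Callable[[], None]) -> None:
    # Step 1: stop accepting writes on the current Active cluster.
    active.role = HARole.ACTIVE_TO_STANDBY
    # Step 2: block until replication to the standby has drained
    # (wait_for_drain is a hypothetical caller-supplied hook).
    wait_for_drain()
    # Step 3: promote the standby and demote the old Active cluster.
    standby.role = HARole.ACTIVE
    active.role = HARole.STANDBY
```

Because the old Active cluster sits in ActiveToStandby while draining, it is unambiguous which cluster to demote in step 3, whereas with two Standby clusters that information would be lost.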
With the above changes, graceful failover is achieved by the following steps:
# Change the Active cluster’s role to ActiveToStandby.
# Wait for the replication data to be drained.
# Change the Standby cluster’s role to Active, and the ActiveToStandby cluster’s role to Standby.


> Graceful Failover with Phoenix HA - Metrics
> -------------------------------------------
>
>                 Key: PHOENIX-7555
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7555
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Ritesh
>            Assignee: Ritesh
>            Priority: Major
>
> Follow up from PHOENIX-7493
> Add metrics for HAGroupStoreManager and HAGroupStoreClient

--
This message was sent by Atlassian Jira
(v8.20.10#820010)