[ 
https://issues.apache.org/jira/browse/IMPALA-12156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779454#comment-17779454
 ] 

ASF subversion and git services commented on IMPALA-12156:
----------------------------------------------------------

Commit c9c5fb89b5679dcb8e41f61529a65b7648500741 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c9c5fb89b ]

IMPALA-12156: Support High Availability for Statestore

To support statestore HA, we allow two statestored instances in an
Active-Passive HA pair to be added to an Impala cluster. We add
preemptive behavior for statestored: when HA is enabled, the statestored
with the higher priority becomes active and its paired statestored
becomes standby. The active statestored acts as the owner of the Impala
cluster and provides the statestore service to the cluster members.
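
The preemptive rule above can be sketched as follows. This is only an
illustration, not code from the patch: the Statestored class, the integer
priority field, and the handling of equal priorities are all hypothetical,
since the commit message does not say how priority is derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statestored:
    """Hypothetical stand-in for one statestored instance in the HA pair."""
    name: str
    priority: int  # assumption: a larger value means higher priority


def negotiate_roles(a: Statestored, b: Statestored):
    """Return (active, standby): the higher-priority instance becomes active."""
    if a.priority == b.priority:
        # Assumption: the sketch rejects ties instead of guessing a tie-breaker.
        raise ValueError("the two statestoreds must have different priorities")
    return (a, b) if a.priority > b.priority else (b, a)
```

For example, negotiate_roles(Statestored("ss-1", 1), Statestored("ss-2", 2))
returns the "ss-2" instance as active and "ss-1" as standby.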

To enable statestore HA for a cluster, both statestoreds in the HA pair
and all subscribers must be started with the startup flag
"enable_statestored_ha" set to true.

This patch makes the following changes:
- Define a new service for Statestore HA.
- Statestored negotiates the role for HA with its peer statestore
  instance on startup.
- Create an HA monitor thread:
  The active statestored sends heartbeats to the standby statestored.
  The standby statestored monitors the state of its peer's connections
  to the subscribers.
- The standby statestored sends heartbeats to subscribers, requesting
  the connection state between the active statestored and each
  subscriber. It saves the connection states for use as a failure
  detector.
- When the standby statestored loses its connection to the active
  statestored, it checks the saved connection states for the active
  statestored and takes over the active role if a majority of
  subscribers have lost their connections to the active statestored.
- The new active statestored sends an RPC notification to all
  subscribers announcing the new active statestored and the active
  catalogd elected by the new active statestored.
- The new active statestored starts sending heartbeats to its peer when
  it receives a handshake from its peer.
- The active statestored enters recovery mode if it loses its
  connections to its peer statestored and all subscribers. It keeps
  sending HA handshakes to its peer until it receives a response.
- All subscribers (impalad/catalogd/admissiond) register with both
  statestoreds.
- Subscribers report their connection state in response to requests
  from the standby statestored.
- Subscribers switch to the new active statestored when they receive
  an RPC notification from it.
- Only the active statestored sends topic update messages to
  subscribers.
- Add a new option "enable_statestored_ha" to the script
  bin/start-impala-cluster.py for starting an Impala mini-cluster with
  statestored HA enabled.
- Add a new Thrift API to the statestore service that disables the
  network for statestored. It is used only by unit tests to simulate
  network failure. For safety, it works only when the debug action is
  set in the startup flags.
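
The takeover rule for the standby statestored described in the list above
can be sketched as below. This is a minimal illustration under stated
assumptions, not the patch's implementation: the function name, the shape
of the connection-state map, the strict "more than half" reading of
"majority", and the choice not to take over when no subscriber state has
been collected are all assumptions.

```python
from typing import Dict


def should_take_over(peer_heartbeat_ok: bool,
                     subscriber_sees_active: Dict[str, bool]) -> bool:
    """Decide whether the standby statestored should become active.

    subscriber_sees_active maps a (hypothetical) subscriber id to True if
    that subscriber last reported a working connection to the active
    statestored, and False otherwise.
    """
    if peer_heartbeat_ok:
        # The standby still hears from the active statestored: no takeover.
        return False
    if not subscriber_sees_active:
        # Assumption: with no reports there is no evidence of a majority.
        return False
    lost = sum(1 for ok in subscriber_sees_active.values() if not ok)
    # Assumption: "majority" means strictly more than half of subscribers.
    return lost > len(subscriber_sees_active) / 2
```

With three subscribers, the sketch takes over only when the heartbeat from
the active statestored is lost and at least two subscribers report that
they have also lost their connection to it.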

Testing:
 - Added end-to-end unit tests for statestored HA.
 - Passed core tests

Change-Id: Ibd2c814bbad5c04c1d50c2edaa5b910c82a6fd87
Reviewed-on: http://gerrit.cloudera.org:8080/20372
Reviewed-by: Michael Smith <michael.sm...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Support Impala Statestore HA
> ----------------------------
>
>                 Key: IMPALA-12156
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12156
>             Project: IMPALA
>          Issue Type: New Feature
>    Affects Versions: Impala 4.1.2
>            Reporter: Zhi Tang
>            Assignee: Wenzhe Zhou
>            Priority: Major
>
> The Impala component known as the StateStore checks on the health of all 
> Impala daemons in a cluster, and continuously relays its findings to each of 
> those daemons.  If the Statestore is not running or becomes unreachable, the 
> cluster becomes less robust and metadata becomes less consistent as it 
> changes. Therefore, implementing high availability for the Statestore is 
> essential for the stability of the cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
