This is an automated email from the ASF dual-hosted git repository. wzhou pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git
commit db6ead81363551e23d593ac435ec4bde81033ee8 Author: m-sanjana19 <[email protected]> AuthorDate: Thu Jul 25 15:43:18 2024 +0530 IMPALA-13142: [DOCS] Documentation for Impala StateStore & Catalogd HA Change-Id: I8927c9cd61f0274ad91111d6ac4a079f7a563197 Reviewed-on: http://gerrit.cloudera.org:8080/21615 Tested-by: Impala Public Jenkins <[email protected]> Reviewed-by: Yida Wu <[email protected]> Reviewed-by: Wenzhe Zhou <[email protected]> --- docs/impala.ditamap | 7 ++- docs/topics/impala_ha.xml | 50 ++++++++++++++++++ docs/topics/impala_ha_catalog.xml | 100 +++++++++++++++++++++++++++++++++++ docs/topics/impala_ha_statestore.xml | 97 +++++++++++++++++++++++++++++++++ 4 files changed, 252 insertions(+), 2 deletions(-) diff --git a/docs/impala.ditamap b/docs/impala.ditamap index 103c95e53..e896229f3 100644 --- a/docs/impala.ditamap +++ b/docs/impala.ditamap @@ -64,6 +64,10 @@ under the License. <topicref href="topics/impala_tutorial.xml"/> <topicref href="topics/impala_admin.xml"> <topicref href="topics/impala_timeouts.xml"/> + <topicref href="topics/impala_ha.xml"> + <topicref href="topics/impala_ha_statestore.xml"/> + <topicref href="topics/impala_ha_catalog.xml"/> + </topicref> <topicref href="topics/impala_proxy.xml"/> <topicref href="topics/impala_disk_space.xml"/> <topicref audience="integrated" href="topics/impala_auditing.xml"/> @@ -361,8 +365,7 @@ under the License. <!-- End of former contents of Installing-and-Using-Impala_xi42979.ditamap. 
--> <topicref audience="standalone" href="topics/impala_faq.xml"/> <topicref audience="standalone" href="topics/impala_release_notes.xml"> - <mapref href="impala_release_notes.ditamap" format="ditamap" - audience="standalone"/> + <mapref href="impala_release_notes.ditamap" format="ditamap" audience="standalone"/> </topicref> <!-- Substitution variables and link destinations abstracted diff --git a/docs/topics/impala_ha.xml b/docs/topics/impala_ha.xml new file mode 100644 index 000000000..6a737f887 --- /dev/null +++ b/docs/topics/impala_ha.xml @@ -0,0 +1,50 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="impala_ha"> + <title>Configuring Impala for High Availability</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="High Availability"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + <conbody> + <p>The Impala StateStore checks on the health of all Impala daemons in a cluster, and + continuously relays its findings to each of the daemons. 
The Catalog stores metadata of + databases, tables, partitions, resource usage information, configuration settings, and other + objects managed by Impala. If StateStore and Catalog daemons are single instances in an Impala + cluster, it will create a single point of failure. Although Impala coordinators/executors + continue to execute queries if the StateStore node is down, coordinators/executors will not + get state updates. This causes degradation of admission control & cluster membership + updates. To mitigate this, a pair of StateStore and Catalog instances can be deployed in an + Impala cluster so that Impala cluster could survive failures of StateStore or Catalog.</p> + <p><b>Prerequisite:</b></p> + <p>To enable High Availability for Impala CatalogD and StateStore, you must configure at least + two Impala CatalogD, two StateStore instances on two different nodes.<note + id="note_dg1_qhh_fcc" type="note">CatalogD HA and Statestore HA are independent features. + Users can enable CatalogD HA, Statestore HA, or both CatalogD HA and Statestore + HA.</note></p> + <p outputclass="toc"/> + </conbody> +</concept> diff --git a/docs/topics/impala_ha_catalog.xml b/docs/topics/impala_ha_catalog.xml new file mode 100644 index 000000000..72b3c765f --- /dev/null +++ b/docs/topics/impala_ha_catalog.xml @@ -0,0 +1,100 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. 
You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="impala_ha_catalog"> + <title>Configuring Catalog for High Availability</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="High Availability"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Catalog"/> + </metadata> + </prolog> + <conbody> + <p>For each new query, the Impala coordinator sends metadata requests to the Catalog + service, and sends metadata updates to the Catalog service, which in turn propagates them to + the Hive Metastore. With a pair of primary/standby Catalog instances, the standby Catalog + instance is promoted to the primary instance to continue the Catalog service for the Impala + cluster when the primary instance goes down. The active CatalogD acts as the source of + metadata and provides the Catalog service for the Impala cluster. This high availability mode + of the Catalog service reduces the outage duration of the Impala cluster when the primary + Catalog instance fails. 
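</p> + <p>As an illustrative sketch only (host placement and the exact command-line form depend + on your deployment; only the <codeph>enable_catalogd_ha</codeph> flag itself is taken from + this document), the flag is passed to both CatalogD daemons and to the StateStore at + startup:</p> + <codeblock># On each of the two CatalogD nodes: +catalogd --enable_catalogd_ha=true +# On the StateStore node: +statestored --enable_catalogd_ha=true</codeblock> + <p>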
To support Catalog HA, you can add two CatalogD instances as an active-passive high + availability pair to an Impala cluster.</p> + </conbody> + <concept id="enabling_catalog_ha"> + <title>Enabling Catalog High Availability</title> + <conbody> + <p>To enable Catalog high availability in an Impala cluster, follow these steps:<ul + id="ul_pj4_ycn_1cc"> + <li>Set the startup flag <codeph>enable_catalogd_ha</codeph> to <codeph>true</codeph> for + both CatalogD instances and the StateStore.</li> + </ul></p> + <p>The active StateStore will assign roles to the CatalogD instances, designating one as the + active CatalogD and the other as the standby CatalogD. The active CatalogD acts as the + source of metadata and provides Catalog services for the Impala cluster.</p> + </conbody> + </concept> + <concept id="disabling_catalog_ha"> + <title>Disabling Catalog High Availability</title> + <conbody> + <p>To disable Catalog high availability in an Impala cluster, follow these steps:<ol + id="ol_udj_b1n_1cc"> + <li>Remove one CatalogD instance from the Impala cluster.</li> + <li>Restart the remaining CatalogD instance without the startup flag + <codeph>enable_catalogd_ha</codeph>.</li> + <li>Restart the StateStore without the startup flag + <codeph>enable_catalogd_ha</codeph>.</li> + </ol></p> + </conbody> + </concept> + <concept id="monitoring_status_ha"> + <title>Monitoring Catalog HA Status in the StateStore Web Page</title> + <conbody> + <p>A new web page <codeph>/catalog_ha_info</codeph> has been added to the StateStore debug web
This page displays the Catalog HA status, including:</p> + <ul id="ha_status_web"> + <li>Active Catalog Node</li> + <li>Standby Catalog Node</li> + <li>Notified Subscribers Table</li> + </ul> + <p>To access this information, navigate to the <codeph>/catalog_ha_info</codeph> page on the + StateStore debug web server.</p> + </conbody> + </concept> + <concept id="catalog_failure_detection"> + <title>Catalog Failure Detection</title> + <conbody> + <p>The StateStore instance continuously sends heartbeat to its registered clients, including + the primary and standby Catalog instances, to track Impala daemons in the cluster to + determine if the daemon is healthy. If the StateStore finds the primary Catalog instance is + not healthy but the standby Catalog instance is healthy, StateStore promotes the standby + Catalog instance as primary instance and notify all coordinators for this change. + Coordinators will switch over to the new primary Catalog instance.</p> + <p>When the system detects that the active CatalogD is unhealthy, it initiates a failover to + the standby CatalogD. During this brief transition, some nodes might not immediately + recognize the new active CatalogD, causing currently running queries to fail due to lack of + access to metadata. These failed queries need to be rerun after the failover is complete and + the new active CatalogD is operational.</p> + </conbody> + </concept> +</concept> diff --git a/docs/topics/impala_ha_statestore.xml b/docs/topics/impala_ha_statestore.xml new file mode 100644 index 000000000..b0e95ec19 --- /dev/null +++ b/docs/topics/impala_ha_statestore.xml @@ -0,0 +1,97 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. 
The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="impala_ha_statestore"> + <title>Configuring StateStore for High Availability</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="High Availability"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="StateStore"/> + </metadata> + </prolog> + <conbody> + <p>With a pair of StateStore instances in primary/standby mode, the primary StateStore instance + sends the cluster's state updates and propagates metadata updates. It periodically sends + heartbeats to the standby StateStore instance, the CatalogD, coordinators, and executors. The standby + StateStore instance also sends heartbeats to the CatalogD, coordinators, and executors. RPC + connections between daemons and StateStore instances are kept alive so that broken connections + usually won't result in false failure reports between nodes. 
The standby StateStore instance + takes over the primary role when the primary instance goes down, so that the StateStore + service continues to operate.</p> + </conbody> + <concept id="enabling_statestore_ha"> + <title>Enabling StateStore High Availability</title> + <conbody> + <p>To enable StateStore High Availability (HA) in an Impala cluster, follow these steps:<ol + id="ol_k2p_zxm_1cc"> + <li>Restart two StateStore instances with the following additional + flags:<codeblock id="codeblock_tmp_bym_1cc">enable_statestored_ha: true +state_store_ha_port: RPC port for StateStore HA service (default: 24020) +state_store_peer_host: Hostname of the peer StateStore instance +state_store_peer_ha_port: RPC port of high availability service on the peer StateStore instance (default: 24020) +</codeblock></li> + <li>Restart all subscribers (including CatalogD, coordinators, and executors) with the + following additional + flags:<codeblock id="codeblock_zyl_lmh_fcc">state_store_host: Hostname of the first StateStore instance +state_store_port: RPC port for StateStore registration on the first StateStore instance (default: 24000) +enable_statestored_ha: true +state_store_2_host: Hostname of the second StateStore instance +state_store_2_port: RPC port for StateStore registration on the second StateStore instance (default: 24000)</codeblock></li> + </ol></p> + <p>By setting these flags, the Impala cluster is configured to use two StateStore instances, + ensuring high availability and fault tolerance.</p> + </conbody> + </concept> + <concept id="disabling_statestore_ha"> + <title>Disabling StateStore High Availability</title> + <conbody> + <p>To disable StateStore high availability in an Impala cluster, follow these steps:<ol + id="ol_udj_b1n_1cc"> + <li>Remove one StateStore instance from the Impala cluster.</li> + <li>Restart the remaining StateStore instance along with the coordinator, executor, and + CatalogD nodes, ensuring they are restarted 
without the + <codeph>enable_statestored_ha</codeph> flag.</li> + </ol></p> + </conbody> + </concept> + <concept id="statestore_failure_detection"> + <title>StateStore Failure Detection</title> + <conbody> + <p>The primary StateStore instance continuously sends heartbeats to its registered clients and + to the standby StateStore instance. Each StateStore client registers with both the active and + standby StateStore instances, and maintains the following information about each StateStore + server: the server IP and port, the service role (primary or standby), the last time a + heartbeat request was received, and the number of missed heartbeats. A missing heartbeat + response from a client indicates an unhealthy daemon. The flag + <codeph>MAX_MISSED_HEARTBEAT_REQUEST_NUM</codeph> defines the number of consecutive missed + heartbeat requests after which a client considers communication with a StateStore server lost + and marks that server as down. The standby StateStore instance collects the connection states + between the clients (CatalogD, coordinators, and executors) and the primary StateStore + instance through its heartbeat messages to the clients. If the standby StateStore instance + misses <codeph>MAX_MISSED_HEARTBEAT_REQUEST_NUM</codeph> heartbeat requests from the primary + StateStore instance and the majority of clients lose their connections with the primary + StateStore, it takes over the primary role.</p> + </conbody> + </concept> +</concept>
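The StateStore failure-detection rule documented above (a server is marked down after a number of consecutive missed heartbeats, and the standby promotes itself only with a client majority) can be sketched as follows. This is an illustrative model, not Impala's implementation; the threshold value and function names are hypothetical.

```python
# Illustrative model of the standby StateStore takeover rule.
# The threshold value here is hypothetical; Impala controls it via a flag.
MAX_MISSED_HEARTBEAT_REQUEST_NUM = 3

def marks_server_down(consecutive_missed: int) -> bool:
    """A client marks a StateStore server down after N consecutive missed heartbeats."""
    return consecutive_missed >= MAX_MISSED_HEARTBEAT_REQUEST_NUM

def standby_takes_over(missed_from_primary: int,
                       clients_lost_primary: int,
                       total_clients: int) -> bool:
    """The standby promotes itself only when it has itself lost the primary
    AND a majority of clients report losing their connection to the primary."""
    return (marks_server_down(missed_from_primary)
            and clients_lost_primary > total_clients / 2)

print(standby_takes_over(3, 5, 8))  # standby and a client majority lost the primary -> True
print(standby_takes_over(3, 4, 8))  # no client majority -> False
print(standby_takes_over(2, 5, 8))  # standby has not yet lost the primary -> False
```

Requiring both conditions prevents a takeover when only the standby's own link to the primary is broken while the rest of the cluster still sees a healthy primary.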
