This is an automated email from the ASF dual-hosted git repository.

wzhou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit db6ead81363551e23d593ac435ec4bde81033ee8
Author: m-sanjana19 <[email protected]>
AuthorDate: Thu Jul 25 15:43:18 2024 +0530

    IMPALA-13142: [DOCS] Documentation for Impala StateStore & Catalogd HA
    
    Change-Id: I8927c9cd61f0274ad91111d6ac4a079f7a563197
    Reviewed-on: http://gerrit.cloudera.org:8080/21615
    Tested-by: Impala Public Jenkins <[email protected]>
    Reviewed-by: Yida Wu <[email protected]>
    Reviewed-by: Wenzhe Zhou <[email protected]>
---
 docs/impala.ditamap                  |   7 ++-
 docs/topics/impala_ha.xml            |  50 ++++++++++++++++++
 docs/topics/impala_ha_catalog.xml    | 100 +++++++++++++++++++++++++++++++++++
 docs/topics/impala_ha_statestore.xml |  97 +++++++++++++++++++++++++++++++++
 4 files changed, 252 insertions(+), 2 deletions(-)

diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index 103c95e53..e896229f3 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -64,6 +64,10 @@ under the License.
   <topicref href="topics/impala_tutorial.xml"/>
   <topicref href="topics/impala_admin.xml">
     <topicref href="topics/impala_timeouts.xml"/>
+    <topicref href="topics/impala_ha.xml">
+      <topicref href="topics/impala_ha_statestore.xml"/>
+      <topicref href="topics/impala_ha_catalog.xml"/>
+    </topicref>
     <topicref href="topics/impala_proxy.xml"/>
     <topicref href="topics/impala_disk_space.xml"/>
     <topicref audience="integrated" href="topics/impala_auditing.xml"/>
@@ -361,8 +365,7 @@ under the License.
 <!-- End of former contents of Installing-and-Using-Impala_xi42979.ditamap. -->
   <topicref audience="standalone" href="topics/impala_faq.xml"/>
   <topicref audience="standalone" href="topics/impala_release_notes.xml">
-    <mapref href="impala_release_notes.ditamap" format="ditamap"
-      audience="standalone"/>
+    <mapref href="impala_release_notes.ditamap" format="ditamap" audience="standalone"/>
   </topicref>
 
 <!-- Substitution variables and link destinations abstracted
diff --git a/docs/topics/impala_ha.xml b/docs/topics/impala_ha.xml
new file mode 100644
index 000000000..6a737f887
--- /dev/null
+++ b/docs/topics/impala_ha.xml
@@ -0,0 +1,50 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="impala_ha">
+  <title>Configuring Impala for High Availability</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Administrators"/>
+      <data name="Category" value="High Availability"/>
+      <data name="Category" value="Configuring"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+  <conbody>
+    <p>The Impala StateStore checks on the health of all Impala daemons in a cluster and
+      continuously relays its findings to each of the daemons. The Catalog stores metadata of
+      databases, tables, partitions, resource usage information, configuration settings, and other
+      objects managed by Impala. If the StateStore and Catalog daemons run as single instances,
+      they create single points of failure. Although Impala coordinators/executors continue to
+      execute queries while the StateStore node is down, they do not receive state updates, which
+      degrades admission control and cluster membership updates. To mitigate this, a pair of
+      StateStore instances and a pair of Catalog instances can be deployed so that the Impala
+      cluster can survive the failure of a StateStore or Catalog instance.</p>
+    <p><b>Prerequisite:</b></p>
+    <p>To enable High Availability for the Impala CatalogD and StateStore, you must configure at
+      least two Impala CatalogD instances and two StateStore instances, each pair on two different
+      nodes.<note id="note_dg1_qhh_fcc" type="note">CatalogD HA and StateStore HA are independent
+        features. You can enable CatalogD HA, StateStore HA, or
+      both.</note></p>
+    <p outputclass="toc"/>
+  </conbody>
+</concept>
diff --git a/docs/topics/impala_ha_catalog.xml b/docs/topics/impala_ha_catalog.xml
new file mode 100644
index 000000000..72b3c765f
--- /dev/null
+++ b/docs/topics/impala_ha_catalog.xml
@@ -0,0 +1,100 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="impala_ha_catalog">
+  <title>Configuring Catalog for High Availability</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Administrators"/>
+      <data name="Category" value="High Availability"/>
+      <data name="Category" value="Configuring"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Catalog"/>
+    </metadata>
+  </prolog>
+  <conbody>
+    <p>For each new query, the Impala coordinator sends metadata requests to the Catalog service
+      and sends metadata updates to the Catalog, which in turn propagates those updates to the
+      Hive Metastore. With a pair of primary/standby Catalog instances, the standby Catalog
+      instance is promoted to primary to continue providing Catalog service for the Impala
+      cluster when the primary instance goes down. The active CatalogD acts as the source of
+      metadata and provides Catalog service for the Impala cluster. This high availability mode
+      reduces the outage duration of the Impala cluster when the primary Catalog instance fails.
+      To support Catalog HA, you can add two CatalogD instances to an Impala cluster as an
+      Active-Passive high availability pair.</p>
+  </conbody>
+  <concept id="enabling_catalog_ha">
+    <title>Enabling Catalog High Availability</title>
+    <conbody>
+      <p>To enable Catalog high availability in an Impala cluster, follow these steps:<ul
+          id="ul_pj4_ycn_1cc">
+          <li>Set the startup flag <codeph>enable_catalogd_ha</codeph> to <codeph>true</codeph>
+            for both CatalogD instances and the StateStore.</li>
+        </ul></p>
+      <p>The active StateStore assigns roles to the CatalogD instances, designating one as the
+        active CatalogD and the other as the standby CatalogD. The active CatalogD acts as the
+        source of metadata and provides Catalog services for the Impala cluster.</p>
+    </conbody>
+  </concept>
+  <concept id="disabling_catalog_ha">
+    <title>Disabling Catalog High Availability</title>
+    <conbody>
+      <p>To disable Catalog high availability in an Impala cluster, follow these steps:<ol
+          id="ol_udj_b1n_1cc">
+          <li>Remove one CatalogD instance from the Impala cluster.</li>
+          <li>Restart the remaining CatalogD instance without the startup flag
+              <codeph>enable_catalogd_ha</codeph>.</li>
+          <li>Restart the StateStore without the startup flag
+            <codeph>enable_catalogd_ha</codeph>.</li>
+        </ol></p>
+    </conbody>
+  </concept>
+  <concept id="monitoring_status_ha">
+    <title>Monitoring Catalog HA Status in the StateStore Web Page</title>
+    <conbody>
+      <p>A new web page, <codeph>/catalog_ha_info</codeph>, has been added to the StateStore
+        debug web server. This page displays the Catalog HA status, including:</p>
+      <ul id="ha_status_web">
+        <li>Active Catalog Node</li>
+        <li>Standby Catalog Node</li>
+        <li>Notified Subscribers Table</li>
+      </ul>
+      <p>To access this information, navigate to the <codeph>/catalog_ha_info</codeph> page on
+        the StateStore debug web server.</p>
+    </conbody>
+  </concept>
+  <concept id="catalog_failure_detection">
+    <title>Catalog Failure Detection</title>
+    <conbody>
+      <p>The StateStore instance continuously sends heartbeats to its registered clients,
+        including the primary and standby Catalog instances, to track the health of the Impala
+        daemons in the cluster. If the StateStore finds that the primary Catalog instance is
+        unhealthy while the standby Catalog instance is healthy, the StateStore promotes the
+        standby Catalog instance to primary and notifies all coordinators of the change.
+        Coordinators then switch over to the new primary Catalog instance.</p>
+      <p>When the system detects that the active CatalogD is unhealthy, it initiates a failover
+        to the standby CatalogD. During this brief transition, some nodes might not immediately
+        recognize the new active CatalogD, causing currently running queries to fail because
+        they cannot access metadata. These failed queries must be rerun after the failover
+        completes and the new active CatalogD is operational.</p>
+    </conbody>
+  </concept>
+</concept>
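The flag-based setup described in the topic above can be sketched as shell commands. This is a minimal illustration, not part of the commit: the hostnames `catalog-1`, `catalog-2`, and `statestore-host` are invented, and 25010 is the usual statestored debug web server port, which may differ in your deployment.

```shell
# Hypothetical sketch: enabling CatalogD HA on a two-node setup.
# Hostnames below are examples, not values from this commit.

# Start both CatalogD instances with the HA flag.
ssh catalog-1 'catalogd --enable_catalogd_ha=true &'
ssh catalog-2 'catalogd --enable_catalogd_ha=true &'

# Restart the StateStore with the same flag so it can assign
# active/standby roles to the two CatalogD instances.
statestored --enable_catalogd_ha=true &

# Inspect which CatalogD is active on the StateStore debug web server
# (25010 is the common statestored webserver port; adjust as needed).
curl http://statestore-host:25010/catalog_ha_info
```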
diff --git a/docs/topics/impala_ha_statestore.xml b/docs/topics/impala_ha_statestore.xml
new file mode 100644
index 000000000..b0e95ec19
--- /dev/null
+++ b/docs/topics/impala_ha_statestore.xml
@@ -0,0 +1,97 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="impala_ha_statestore">
+  <title>Configuring StateStore for High Availability</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Administrators"/>
+      <data name="Category" value="High Availability"/>
+      <data name="Category" value="Configuring"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="StateStore"/>
+    </metadata>
+  </prolog>
+  <conbody>
+    <p>With a pair of StateStore instances in primary/standby mode, the primary StateStore
+      instance sends the cluster's state updates and propagates metadata updates. It periodically
+      sends heartbeats to the standby StateStore instance, the CatalogD, and the coordinators and
+      executors. The standby StateStore instance also sends heartbeats to the CatalogD,
+      coordinators, and executors. RPC connections between daemons and StateStore instances are
+      kept alive so that broken connections usually do not result in false failure reports
+      between nodes. The standby StateStore instance takes over the primary role when the primary
+      instance goes down so that the service continues to operate.</p>
+  </conbody>
+  <concept id="enabling_statestore_ha">
+    <title>Enabling StateStore High Availability</title>
+    <conbody>
+      <p>To enable StateStore High Availability (HA) in an Impala cluster, follow these steps:<ol
+        id="ol_k2p_zxm_1cc">
+        <li>Restart two StateStore instances with the following additional
+          flags:<codeblock id="codeblock_tmp_bym_1cc">enable_statestored_ha: true
+state_store_ha_port: RPC port for StateStore HA service (default: 24020)
+state_store_peer_host: Hostname of the peer StateStore instance
+state_store_peer_ha_port: RPC port of high availability service on the peer StateStore instance (default: 24020)
+</codeblock></li>
+        <li>Restart all subscribers (including CatalogD, coordinators, and executors) with the
+            following additional
+            flags:<codeblock id="codeblock_zyl_lmh_fcc">state_store_host: Hostname of the first StateStore instance
+state_store_port: RPC port for StateStore registration on the first StateStore instance (default: 24000)
+enable_statestored_ha: true
+state_store_2_host: Hostname of the second StateStore instance
+state_store_2_port: RPC port for StateStore registration on the second StateStore instance (default: 24000)</codeblock></li>
+      </ol></p>
+    <p>By setting these flags, the Impala cluster is configured to use two StateStore instances,
+      providing high availability and fault tolerance.</p>
+    </conbody>
+  </concept>
+  <concept id="disabling_statestore_ha">
+    <title>Disabling StateStore High Availability</title>
+    <conbody>
+      <p>To disable StateStore high availability in an Impala cluster, follow these steps:<ol
+          id="ol_udj_b1n_1cc">
+          <li>Remove one StateStore instance from the Impala cluster.</li>
+          <li>Restart the remaining StateStore instance along with the coordinator, executor, and
+            CatalogD nodes, ensuring they are restarted without the
+              <codeph>enable_statestored_ha</codeph> flag.</li>
+        </ol></p>
+    </conbody>
+  </concept>
+  <concept id="statestore_failure_detection">
+    <title>StateStore Failure Detection</title>
+    <conbody>
+      <p>The primary StateStore instance continuously sends heartbeats to its registered clients
+        and to the standby StateStore instance. Each StateStore client registers with both the
+        active and standby StateStore instances and maintains the following information about the
+        StateStore servers: the server IP and port, the service role (primary or standby), the
+        last time a heartbeat request was received, and the number of missed heartbeats. A
+        missing heartbeat response from a StateStore client indicates an unhealthy daemon. The
+        flag <codeph>MAX_MISSED_HEARTBEAT_REQUEST_NUM</codeph> defines the number of consecutive
+        missed heartbeat requests after which a client considers communication with a StateStore
+        server lost and marks that server as down. The standby StateStore instance collects the
+        connection states between the clients (CatalogD, coordinators, and executors) and the
+        primary StateStore instance in its heartbeat messages to the clients. If the standby
+        StateStore instance misses <codeph>MAX_MISSED_HEARTBEAT_REQUEST_NUM</codeph> heartbeat
+        requests from the primary StateStore instance and the majority of clients lose their
+        connections to the primary StateStore, it takes over the primary role.</p>
+    </conbody>
+  </concept>
+</concept>
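The two-step StateStore HA setup above can be sketched as shell commands. This is a minimal illustration under assumptions, not part of the commit: the hostnames `ss-1` and `ss-2` are invented, and the ports are the defaults listed in the codeblocks above.

```shell
# Hypothetical sketch: enabling StateStore HA with two instances,
# ss-1 and ss-2. Hostnames are examples, not values from this commit.

# On ss-1: the peer is ss-2.
statestored --enable_statestored_ha=true \
            --state_store_ha_port=24020 \
            --state_store_peer_host=ss-2 \
            --state_store_peer_ha_port=24020 &

# On ss-2: the peer is ss-1.
statestored --enable_statestored_ha=true \
            --state_store_ha_port=24020 \
            --state_store_peer_host=ss-1 \
            --state_store_peer_ha_port=24020 &

# Every subscriber (catalogd, impalad coordinators and executors)
# is pointed at both StateStore instances.
impalad --enable_statestored_ha=true \
        --state_store_host=ss-1 --state_store_port=24000 \
        --state_store_2_host=ss-2 --state_store_2_port=24000 &
```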
