Repository: impala Updated Branches: refs/heads/master 34fd9732e -> 312a8d60d
[DOCS] Dedicated coordinators and executors explained Change-Id: Ic0038eb40951ddf619ea6470a7363122635391d8 Reviewed-on: http://gerrit.cloudera.org:8080/11080 Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com> Project: http://git-wip-us.apache.org/repos/asf/impala/repo Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/312a8d60 Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/312a8d60 Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/312a8d60 Branch: refs/heads/master Commit: 312a8d60dd0f8f9efa4072a3f12579640857abc5 Parents: 34fd973 Author: Alex Rodoni <arod...@cloudera.com> Authored: Mon Jul 30 11:49:33 2018 -0700 Committer: Alex Rodoni <arod...@cloudera.com> Committed: Mon Aug 6 15:59:48 2018 +0000 ---------------------------------------------------------------------- docs/impala.ditamap | 1 + docs/impala_keydefs.ditamap | 2 +- docs/topics/impala_dedicated_coordinator.xml | 629 ++++++++++++++++++++++ docs/topics/impala_scalability.xml | 114 ---- 4 files changed, 631 insertions(+), 115 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/impala/blob/312a8d60/docs/impala.ditamap ---------------------------------------------------------------------- diff --git a/docs/impala.ditamap b/docs/impala.ditamap index 19c72a6..68558fd 100644 --- a/docs/impala.ditamap +++ b/docs/impala.ditamap @@ -283,6 +283,7 @@ under the License. 
<topicref audience="hidden" href="topics/impala_perf_ddl.xml"/> </topicref> <topicref href="topics/impala_scalability.xml"/> + <topicref href="topics/impala_dedicated_coordinator.xml"/> <topicref href="topics/impala_partitioning.xml"/> <topicref href="topics/impala_file_formats.xml"> <topicref href="topics/impala_txtfile.xml"/> http://git-wip-us.apache.org/repos/asf/impala/blob/312a8d60/docs/impala_keydefs.ditamap ---------------------------------------------------------------------- diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap index ac97531..ff83cd2 100644 --- a/docs/impala_keydefs.ditamap +++ b/docs/impala_keydefs.ditamap @@ -10957,7 +10957,7 @@ under the License. <keydef href="topics/impala_scalability.xml" keys="scalability"/> <keydef audience="hidden" href="topics/impala_scalability.xml#scalability_memory" keys="scalability_memory"/> <keydef href="topics/impala_scalability.xml#scalability_catalog" keys="scalability_catalog"/> - <keydef href="topics/impala_scalability.xml#scalability_coordinator" keys="scalability_coordinator"/> + <keydef href="topics/impala_dedicated_coordinator.xml" keys="scalability_coordinator"/> <keydef href="topics/impala_scalability.xml#statestore_scalability" keys="statestore_scalability"/> <keydef href="topics/impala_scalability.xml#scalability_buffer_pool" keys="scalability_buffer_pool"/> <keydef audience="hidden" href="topics/impala_scalability.xml#scalability_cluster_size" keys="scalability_cluster_size"/> http://git-wip-us.apache.org/repos/asf/impala/blob/312a8d60/docs/topics/impala_dedicated_coordinator.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_dedicated_coordinator.xml b/docs/topics/impala_dedicated_coordinator.xml new file mode 100644 index 0000000..1b43772 --- /dev/null +++ b/docs/topics/impala_dedicated_coordinator.xml @@ -0,0 +1,629 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" 
"concept.dtd"> +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> +<concept id="scalability"> + + <title>How to Configure Impala with Dedicated Coordinators</title> + + <titlealts audience="PDF"> + + <navtitle>Dedicated Coordinators Optimization</navtitle> + + </titlealts> + + <prolog> + <metadata> + <data name="Category" value="Performance"/> + <data name="Category" value="Impala"/> + <data name="Category" value="Planning"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Memory"/> + <data name="Category" value="Scalability"/> + </metadata> + </prolog> + + <conbody> + + <p> + Each host that runs the Impala Daemon acts as both a coordinator and as an executor, by + default, managing metadata caching, query compilation, and query execution. In this + configuration, Impala clients can connect to any Impala daemon and send query requests. + </p> + + <p> + During highly concurrent workloads for large-scale queries, the dual roles can cause + scalability issues because: + </p> + + <ul> + <li> + <p> + The extra work required for a host to act as the coordinator could interfere with its + capacity to perform other work for the later phases of the query. 
For example, + coordinators can experience significant network and CPU overhead with queries + containing a large number of query fragments. Each coordinator caches metadata for all + table partitions and data files, which requires coordinators to be configured with a + large JVM heap. Executor-only Impala daemons should be configured with the default JVM + heaps, which leaves more memory available to process joins, aggregations, and other + operations performed by query executors. + </p> + </li> + + <li> + <p> + Having a large number of hosts act as coordinators can cause unnecessary network + overhead, or even timeout errors, as each of those hosts communicates with the + Statestored daemon for metadata updates. + </p> + </li> + + <li> + <p > + The "soft limits" imposed by the admission control feature are more likely to be + exceeded when there are a large number of heavily loaded hosts acting as coordinators. + Check + <xref + href="https://issues.apache.org/jira/browse/IMPALA-3649" + format="html" scope="external"> + <u>IMPALA-3649</u> + </xref> and + <xref + href="https://issues.apache.org/jira/browse/IMPALA-6437" + format="html" scope="external"> + <u>IMPALA-6437</u> + </xref> to see the status of the enhancements to mitigate this issue. + </p> + </li> + </ul> + + <p > + The following factors can further exacerbate the above issues: + </p> + + <ul> + <li > + <p > + High number of concurrent query fragments due to query concurrency and/or query + complexity + </p> + </li> + + <li > + <p > + Large metadata topic size related to the number of partitions/files/blocks + </p> + </li> + + <li > + <p > + High number of coordinator nodes + </p> + </li> + + <li > + <p > + High number of coordinators used in the same resource pool + </p> + </li> + </ul> + + <p> + If such scalability bottlenecks occur, in CDH 5.12 / Impala 2.9 and higher, you can assign + one dedicated role to each Impala daemon host, either as a coordinator or as an executor, + to address the issues. 
+ </p>
+
+ <ul>
+ <li>
+ <p>
+ All explicit or load-balanced client connections must go to the coordinator hosts.
+ These hosts perform the network communication to keep metadata up-to-date and route
+ query results to the appropriate clients. The dedicated coordinator hosts do not
+ participate in I/O-intensive operations such as scans, and CPU-intensive operations
+ such as aggregations.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ The executor hosts perform the intensive I/O, CPU, and memory operations that make up
+ the bulk of the work for each query. The executors do communicate with the Statestored
+ daemon for membership status, but the dedicated executor hosts do not process the
+ final result sets for queries.
+ </p>
+ </li>
+ </ul>
+
+ <p >
+ Using dedicated coordinators offers the following benefits:
+ </p>
+
+ <ul>
+ <li >
+ <p>
+ Reduces memory usage by limiting the number of Impala nodes that need to cache
+ metadata.
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Provides better concurrency by avoiding coordinator bottlenecks.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ Eliminates query over-admission by using one dedicated coordinator.
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Reduces resource, especially network, utilization on the Statestored daemon by
+ limiting metadata broadcast to a subset of nodes.
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Improves reliability and performance for highly concurrent workloads by reducing
+ workload stress on coordinators. Dedicated coordinators require 50% or fewer
+ connections and threads.
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Reduces the number of explicit metadata refreshes required.
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Improves diagnosability: if a bottleneck or other performance issue arises on a
+ specific host, you can narrow down the cause more easily because each host is
+ dedicated to specific operations within the overall Impala workload.
+ </p>
+ </li>
+ </ul>
+
+ <p>
+ In this configuration with dedicated coordinators / executors, you cannot connect to the
+ dedicated executor hosts through clients such as impala-shell or business intelligence
+ tools, as only the coordinator nodes support client connections.
+ </p>
+
+ </conbody>
+
+ <concept id="concept_vhv_4b1_n2b">
+
+ <title>Determining the Optimal Number of Dedicated Coordinators</title>
+
+ <conbody>
+
+ <p >
+ You should have the smallest number of coordinators that will still satisfy your
+ workload requirements in a cluster. A rough estimation is 1 coordinator for every 50
+ executors.
+ </p>
+
+ <p>
+ To maintain a healthy state and optimal performance, it is recommended that you keep the
+ peak utilization of all resources used by Impala, including CPU, the number of threads,
+ the number of connections, and RPCs, under 80%.
+ </p>
+
+ <p >
+ Consider the following factors to determine the right number of coordinators in your
+ cluster:
+ </p>
+
+ <ul>
+ <li >
+ <p >
+ What is the number of concurrent queries?
+ </p>
+ </li>
+
+ <li >
+ <p >
+ What percentage of the workload is DDL?
+ </p>
+ </li>
+
+ <li >
+ <p>
+ What is the average query resource usage at the various stages (merge, runtime
+ filter, result set size, etc.)?
+ </p>
+ </li>
+
+ <li >
+ <p>
+ How many Impala Daemons (impalad) are in the cluster?
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Is there a high availability requirement?
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Compute/storage capacity reduction factor
+ </p>
+ </li>
+ </ul>
+
+ <p>
+ Start with the following steps to determine the initial number of coordinators:
+ </p>
+
+ <ol>
+ <li>
+ If your cluster has fewer than 10 nodes, we recommend that you configure one dedicated
+ coordinator. Deploy the dedicated coordinator on a DataNode to avoid losing storage
+ capacity. In most cases, one dedicated coordinator is enough to support all
+ workloads on a cluster.
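The rule of thumb above (roughly 1 coordinator per 50 executors, at least one per resource pool, doubled when high availability is required as described in the remaining steps) can be sketched as a quick sizing calculation. This is a hypothetical helper illustrating the guidance in this section; the function name and parameters are not part of Impala:

```python
import math

def estimate_coordinators(num_executors, num_resource_pools=1, ha=False):
    """Back-of-the-envelope sizing per this section's guidance:
    start from ~1 coordinator per 50 executors, use at least one
    coordinator per dynamic resource pool, and double the total
    when high availability is required (active set + backup set)."""
    n = max(1, math.ceil(num_executors / 50))
    n = max(n, num_resource_pools)  # one coordinator per pool avoids over-admission
    return 2 * n if ha else n
```

For example, a 90-executor cluster with a single pool works out to 2 coordinators, or 4 with an active/backup pair.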
+ </li>
+
+ <li>
+ Add more coordinators if the dedicated coordinator CPU or network peak utilization is
+ 80% or higher. You might need 1 coordinator for every 50 executors.
+ </li>
+
+ <li>
+ If the Impala service is shared by multiple workgroups with a dynamic resource pool
+ assigned, use one coordinator per pool to avoid query over-admission.
+ </li>
+
+ <li>
+ If high availability is required, double the number of coordinators. Use one set as
+ the active set and the other as a backup set.
+ </li>
+ </ol>
+
+ </conbody>
+
+ <concept id="concept_y4k_rc1_n2b">
+
+ <title>Advanced Tuning</title>
+
+ <conbody>
+
+ <p >
+ Use the following guidelines to further tune throughput and stability.
+ <ol>
+ <li >
+ The concurrency of DML statements does not typically depend on the number of
+ coordinators or the size of the cluster. Queries that return large result sets
+ (10,000+ rows) consume more CPU and memory resources on the coordinator. Add one
+ or two coordinators if the workload has many such queries.
+ </li>
+
+ <li>
+ DDL queries, excluding <codeph>COMPUTE STATS</codeph> and <codeph>CREATE TABLE AS
+ SELECT</codeph>, are executed only on coordinators. If your workload contains many
+ DDL queries running concurrently, you could add one coordinator.
+ </li>
+
+ <li>
+ CPU contention on coordinators can slow down query execution when concurrency
+ is high, especially for very short queries (<10s). Add more coordinators to
+ avoid CPU contention.
+ </li>
+
+ <li>
+ On a large cluster with 50+ nodes, the number of network connections from a
+ coordinator to executors can grow quickly as query complexity increases. The
+ growth is much greater on coordinators than on executors. Add a few more
+ coordinators to share the load if workloads are complex, that is, (average number
+ of fragments * number of impalads) > 500, but memory and CPU usage are low. Watch
+ IMPALA-4603 and IMPALA-7213 to track the progress on fixing this issue.
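The connection-growth heuristic in the last item, (average number of fragments per query) * (number of impalads) > 500, can be written out directly. This is a hypothetical sketch of that check only; the function name and threshold parameter are illustrative, not Impala APIs:

```python
def complex_enough_to_add_coordinators(avg_fragments_per_query, num_impalads,
                                       threshold=500):
    """Heuristic from this section: on 50+ node clusters, consider adding
    coordinators once average fragments per query times the number of
    impalad instances exceeds ~500 (while memory/CPU usage is still low)."""
    return avg_fragments_per_query * num_impalads > threshold
```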
+ </li>
+
+ <li >
+ When using multiple coordinators for DML statements, divide queries into different
+ groups (number of groups = number of coordinators). Configure a separate dynamic
+ resource pool for each group and direct each group of query requests to a specific
+ coordinator. This is to avoid query over-admission.
+ </li>
+
+ <li>
+ The front-end connection requirement is not a factor in determining the number of
+ dedicated coordinators. Consider setting up a connection pool at the client side
+ instead of adding coordinators. As a short-term solution, you could increase the
+ value of <codeph>fe_service_threads</codeph> on coordinators to allow more client
+ connections.
+ </li>
+
+ <li>
+ In general, you should have a very small number of coordinators, so storage
+ capacity reduction is not a concern. On a very small cluster (fewer than 10 nodes),
+ deploy a dedicated coordinator on a DataNode to avoid storage capacity reduction.
+ </li>
+ </ol>
+ </p>
+
+ </conbody>
+
+ </concept>
+
+ <concept id="concept_w51_cd1_n2b">
+
+ <title>Estimating Coordinator Resource Usage</title>
+
+ <conbody>
+
+ <p>
+ <table id="table_kh3_d31_n2b" colsep="1" rowsep="1" frame="all">
+ <tgroup cols="3">
+ <colspec colnum="1" colname="col1" colsep="1" rowsep="1"/>
+ <colspec colnum="2" colname="col2" colsep="1" rowsep="1"/>
+ <colspec colnum="3" colname="col3" colsep="1" rowsep="1"/>
+ <tbody>
+ <row>
+ <entry >
+ <b>Resource</b>
+ </entry>
+ <entry >
+ <b>Safe range</b>
+ </entry>
+ <entry >
+ <b>Notes / CM tsquery to monitor</b>
+ </entry>
+ </row>
+ <row>
+ <entry >
+ Memory
+ </entry>
+ <entry>
+ <p >
+ (Max JVM heap setting +
+ </p>
+ <p >
+ query concurrency *
+ </p>
+ <p >
+ query mem_limit)
+ </p>
+ <p >
+ <=
+ </p>
+ <p >
+ 80% of Impala process memory allocation
+ </p>
+ </entry>
+ <entry>
+ <p >
+ <i>Memory usage</i>:
+ </p>
+ <p >
+ SELECT mem_rss WHERE entityName = "Coordinator Instance ID" AND category =
+ ROLE
+ </p>
+
+ <p >
<i>JVM heap usage (metadata cache)</i>: + </p> + + + + <p > + SELECT impala_jvm_heap_current_usage_bytes WHERE entityName = "Coordinator + Instance ID" AND category = ROLE (only in release 5.15 and above) + </p> + </entry> + </row> + <row> + <entry > + TCP Connection + </entry> + <entry > + Incoming + outgoing < 16K + </entry> + <entry> + <p > + <i>Incoming connection usage</i>: + </p> + + + + <p > + SELECT thrift_server_backend_connections_in_use WHERE entityName = + "Coordinator Instance ID" AND category = ROLE + </p> + + + + <p > + <i>Outgoing connection usage</i>: + </p> + + + + <p > + SELECT backends_client_cache_clients_in_use WHERE entityName = + "Coordinator Instance ID" AND category = ROLE + </p> + </entry> + </row> + <row> + <entry > + Threads + </entry> + <entry > + < 32K + </entry> + <entry > + SELECT thread_manager_running_threads WHERE entityName = "Coordinator + Instance ID" AND category = ROLE + </entry> + </row> + <row> + <entry > + CPU + </entry> + <entry> + <p > + Concurrency = + </p> + + + + <p > + non-DDL query concurrency <= + </p> + + + + <p> + number of virtual cores allocated to Impala per node + </p> + </entry> + <entry> + <p > + CPU usage estimation should be based on how many cores are allocated to + Impala per node, not a sum of all cores of the cluster. + </p> + + + + <p> + It is recommended that concurrency should not be more than the number of + virtual cores allocated to Impala per node. + </p> + + + + <p ></p> + + + + <p > + <i>Query concurrency:</i> + </p> + + + + <p> + SELECT total_impala_num_queries_registered_across_impalads WHERE + entityName = "IMPALA-1" AND category = SERVICE + </p> + + + + <p ></p> + </entry> + </row> + </tbody> + </tgroup> + </table> + </p> + + <p> + If usage of any of the above resources exceeds the safe range, add one more + coordinator. 
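The safe ranges in the table can be checked mechanically against metrics collected with the tsqueries shown. The following is a hypothetical sketch of those checks; the function and argument names are illustrative, and "16K"/"32K" are taken as 16384/32768:

```python
def coordinator_within_safe_ranges(jvm_heap_bytes, query_concurrency,
                                   query_mem_limit_bytes, process_mem_bytes,
                                   tcp_connections, running_threads):
    """Safe-range rules from the table above:
    memory:  max JVM heap + concurrency * mem_limit <= 80% of process memory
    TCP:     incoming + outgoing connections < 16K
    threads: running threads < 32K"""
    mem_ok = (jvm_heap_bytes + query_concurrency * query_mem_limit_bytes
              <= 0.8 * process_mem_bytes)
    conn_ok = tcp_connections < 16 * 1024
    thread_ok = running_threads < 32 * 1024
    return mem_ok and conn_ok and thread_ok
```

If any check fails, the guidance here is to add one more coordinator.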
+ </p>
+
+ </conbody>
+
+ </concept>
+
+ </concept>
+
+ <concept id="concept_omm_gf1_n2b">
+
+ <title>Deploying Dedicated Coordinators and Executors</title>
+
+ <conbody>
+
+ <p >
+ This section describes the process to configure dedicated coordinator and
+ executor roles for Impala.
+ </p>
+
+ <ul>
+ <li >
+ <p>
+ <b>Dedicated coordinators</b>: Deploy them on edge nodes with no other services
+ running on them. They do not need large local disks, but they still need some
+ disk space that can be used for spilling. They require the same or an even
+ larger memory allocation.
+ </p>
+ </li>
+
+ <li >
+ <p>
+ <b>(Dedicated) Executors: </b>They should be collocated with DataNodes as usual.
+ The number of hosts with this setting typically increases as the cluster grows
+ larger and handles more table partitions, data files, and concurrent queries.
+ </p>
+ </li>
+ </ul>
+
+ <p> To configure dedicated coordinators/executors, you specify one of
+ the following startup flags for the <cmdname>impalad</cmdname> daemon on
+ each host: <ul>
+ <li>
+ <p>
+ <codeph>is_executor=false</codeph> for each host that does not act
+ as an executor for Impala queries. These hosts act exclusively as
+ query coordinators. This setting typically applies to a relatively
+ small number of hosts, because the most common topology is to have
+ nearly all DataNodes doing work for query execution. </p>
+ </li>
+ <li>
+ <p>
+ <codeph>is_coordinator=false</codeph> for each host that does not
+ act as a coordinator for Impala queries. These hosts act
+ exclusively as executors. The number of hosts with this setting
+ typically increases as the cluster grows larger and handles more
+ table partitions, data files, and concurrent queries. As the
+ overhead for query coordination increases, it becomes more
+ important to centralize that work on dedicated hosts.
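Concretely, following the 100-node example from the earlier version of this topic (10 coordinators, 90 executors), the flags might be applied as follows. This is a hypothetical sketch of the startup configuration; in a managed cluster these flags are normally set through the cluster manager's impalad command-line options rather than by invoking the daemon by hand:

```shell
# On each of the 10 dedicated coordinator hosts (all other flags unchanged):
impalad --is_executor=false

# On each of the 90 executor-only hosts:
impalad --is_coordinator=false
```

By default both flags are true, so any host without these overrides continues to act in both roles.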
</p> + </li> + </ul> + </p> + + </conbody> + + </concept> + +</concept> http://git-wip-us.apache.org/repos/asf/impala/blob/312a8d60/docs/topics/impala_scalability.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_scalability.xml b/docs/topics/impala_scalability.xml index 340d131..94c2fdb 100644 --- a/docs/topics/impala_scalability.xml +++ b/docs/topics/impala_scalability.xml @@ -305,120 +305,6 @@ Memory Usage: Additional Notes </conbody> </concept> - <concept id="scalability_coordinator" rev="2.9.0 IMPALA-3807 IMPALA-5147 IMPALA-5503"> - - <title>Controlling which Hosts are Coordinators and Executors</title> - - <conbody> - - <p> - By default, each host in the cluster that runs the <cmdname>impalad</cmdname> - daemon can act as the coordinator for an Impala query, execute the fragments - of the execution plan for the query, or both. During highly concurrent - workloads for large-scale queries, especially on large clusters, the dual - roles can cause scalability issues: - </p> - - <ul> - <li> - <p> - The extra work required for a host to act as the coordinator could interfere - with its capacity to perform other work for the earlier phases of the query. - For example, the coordinator can experience significant network and CPU overhead - during queries containing a large number of query fragments. Each coordinator - caches metadata for all table partitions and data files, which can be substantial - and contend with memory needed to process joins, aggregations, and other operations - performed by query executors. - </p> - </li> - <li> - <p> - Having a large number of hosts act as coordinators can cause unnecessary network - overhead, or even timeout errors, as each of those hosts communicates with the - <cmdname>statestored</cmdname> daemon for metadata updates. 
- </p> - </li> - <li> - <p> - The <q>soft limits</q> imposed by the admission control feature are more likely - to be exceeded when there are a large number of heavily loaded hosts acting as - coordinators. - </p> - </li> - </ul> - - <p> - If such scalability bottlenecks occur, you can explicitly specify that certain - hosts act as query coordinators, but not executors for query fragments. - These hosts do not participate in I/O-intensive operations such as scans, - and CPU-intensive operations such as aggregations. - </p> - - <p> - Then, you specify that the other hosts act as executors but not - coordinators. These hosts do communicate with the - <cmdname>statestored</cmdname> daemon or process the final result sets - from queries, but the executor hosts do not get the metadata topic from - <cmdname>statestored</cmdname>. You cannot connect to these hosts - through clients such as <cmdname>impala-shell</cmdname> or business - intelligence tools. - </p> - - <p> - This feature is available in <keyword keyref="impala29_full"/> and higher. - </p> - - <p> - To use this feature, you specify one of the following startup flags for the - <cmdname>impalad</cmdname> daemon on each host: - </p> - - <ul> - <li> - <p> - <codeph>is_executor=false</codeph> for each host that - does not act as an executor for Impala queries. - These hosts act exclusively as query coordinators. - This setting typically applies to a relatively small number of - hosts, because the most common topology is to have nearly all - DataNodes doing work for query execution. - </p> - </li> - <li> - <p> - <codeph>is_coordinator=false</codeph> for each host that - does not act as a coordinator for Impala queries. - These hosts act exclusively as executors. - The number of hosts with this setting typically increases - as the cluster grows larger and handles more table partitions, - data files, and concurrent queries. 
As the overhead for query - coordination increases, it becomes more important to centralize - that work on dedicated hosts. - </p> - </li> - </ul> - - <p> - By default, both of these settings are enabled for each <codeph>impalad</codeph> - instance, allowing all such hosts to act as both executors and coordinators. - </p> - - <p> - For example, on a 100-node cluster, you might specify <codeph>is_executor=false</codeph> - for 10 hosts, to dedicate those hosts as query coordinators. Then specify - <codeph>is_coordinator=false</codeph> for the remaining 90 hosts. All explicit or - load-balanced connections must go to the 10 hosts acting as coordinators. These hosts - perform the network communication to keep metadata up-to-date and route query results - to the appropriate clients. The remaining 90 hosts perform the intensive I/O, CPU, and - memory operations that make up the bulk of the work for each query. If a bottleneck or - other performance issue arises on a specific host, you can narrow down the cause more - easily because each host is dedicated to specific operations within the overall - Impala workload. - </p> - - </conbody> - </concept> - <concept id="scalability_buffer_pool" rev="2.10.0 IMPALA-3200"> <title>Effect of Buffer Pool on Memory Usage (<keyword keyref="impala210"/> and higher)</title> <conbody>