Added detailed usage notes for REPLICA_PREFERENCE.

Change-Id: If38f9c881f553568c2516ecc23ec501f23ee1f28
Reviewed-by: John Russell <>
Tested-by: Impala Public Jenkins


Branch: refs/heads/2.x
Commit: b96cbfd09a76ad1e14b970e1e450ac3935042db2
Parents: a0450d2
Author: Alex Rodoni <>
Authored: Fri Mar 30 14:54:39 2018 -0700
Committer: Impala Public Jenkins <>
Committed: Wed Apr 11 22:55:59 2018 +0000

 docs/topics/impala_replica_preference.xml | 49 ++++++++++++++++++++------
 1 file changed, 38 insertions(+), 11 deletions(-)
diff --git a/docs/topics/impala_replica_preference.xml 
index 45a5dbd..6c0d3ab 100644
--- a/docs/topics/impala_replica_preference.xml
+++ b/docs/topics/impala_replica_preference.xml
@@ -21,7 +21,13 @@ under the License.
 <concept id="replica_preference" rev="2.7.0">
   <title>REPLICA_PREFERENCE Query Option (<keyword keyref="impala27"/> or higher only)</title>
-  <titlealts audience="PDF"><navtitle>REPLICA_PREFERENCE</navtitle></titlealts>
+  <titlealts audience="PDF">
+    <navtitle>REPLICA_PREFERENCE</navtitle>
+  </titlealts>
       <data name="Category" value="Impala"/>
@@ -38,29 +44,50 @@ under the License.
-      The <codeph>REPLICA_PREFERENCE</codeph> query option
-      lets you spread the load more evenly if hotspots and bottlenecks persist, by allowing hosts to do local reads,
-      or even remote reads, to retrieve the data for cached blocks if Impala can determine that it would be
-      too expensive to do all such processing on a particular host.
+      The <codeph>REPLICA_PREFERENCE</codeph> query option lets you distribute the work more
+      evenly if hotspots and bottlenecks persist. It causes the access cost of all replicas of a
+      data block to be considered equal to or worse than the configured value. This allows
+      Impala to schedule reads to suboptimal replicas (e.g. local in the presence of cached
+      ones) in order to distribute the work across more executor nodes.
-      <b>Type:</b> numeric (0, 2, 4)
-      or corresponding mnemonic strings (<codeph>CACHE_LOCAL</codeph>, <codeph>DISK_LOCAL</codeph>, <codeph>REMOTE</codeph>).
-      The gaps in the numeric sequence are to accomodate other intermediate
-      values that might be added in the future.
+      Allowed values are: <codeph>CACHE_LOCAL</codeph> (<codeph>0</codeph>),
+      <codeph>DISK_LOCAL</codeph> (<codeph>2</codeph>), <codeph>REMOTE</codeph>
+      (<codeph>4</codeph>)
-      <b>Default:</b> 0 (equivalent to <codeph>CACHE_LOCAL</codeph>)
+      <b>Type:</b> Enum
+    </p>
+    <p>
+      <b>Default:</b> <codeph>CACHE_LOCAL (0)</codeph>
     <p conref="../shared/impala_common.xml#common/added_in_270"/>
+    <p>
+      <b>Usage Notes:</b>
+    </p>
+    <p>
+      By default, Impala selects the best replica it can find in terms of access cost. The
+      preferred order is cached, local, and remote. With <codeph>REPLICA_PREFERENCE</codeph>,
+      the preference of all replicas is capped at the selected value. For example, when
+      <codeph>REPLICA_PREFERENCE</codeph> is set to <codeph>DISK_LOCAL</codeph>, cached and
+      local replicas are treated with equal preference. When set to
+      <codeph>REMOTE</codeph>, all three types of replicas (cached, local, and remote) are treated
+      with equal preference.
+    </p>
     <p conref="../shared/impala_common.xml#common/related_info"/>
-      <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/>, <xref 
+      <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/>,
+      <xref href="impala_schedule_random_replica.xml#schedule_random_replica"/>
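
As a reading aid for the usage notes in this change, the capping rule can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not Impala's scheduler code: the numeric values 0, 2, and 4 come from the documented enum, while the function names, the host names, and the "lowest effective cost wins" selection are assumptions made for the example.

```python
# Numeric values as documented for the REPLICA_PREFERENCE enum.
CACHE_LOCAL, DISK_LOCAL, REMOTE = 0, 2, 4


def effective_cost(replica_cost, replica_preference=CACHE_LOCAL):
    """Treat a replica's access cost as equal to or worse than the preference.

    Hypothetical helper: a cached replica (0) under DISK_LOCAL (2) is
    considered to cost 2, the same as a local-disk replica.
    """
    return max(replica_cost, replica_preference)


def candidate_replicas(replica_costs, replica_preference=CACHE_LOCAL):
    """Return the hosts that tie for the lowest effective cost.

    replica_costs maps a (hypothetical) host name to its raw access cost.
    A larger result means the read can be scheduled on more hosts.
    """
    effective = {
        host: effective_cost(cost, replica_preference)
        for host, cost in replica_costs.items()
    }
    best = min(effective.values())
    return sorted(host for host, cost in effective.items() if cost == best)


# One data block with a cached, a local-disk, and a remote replica.
replicas = {"host_a": CACHE_LOCAL, "host_b": DISK_LOCAL, "host_c": REMOTE}

print(candidate_replicas(replicas))              # default CACHE_LOCAL
print(candidate_replicas(replicas, DISK_LOCAL))  # cached and local tie
print(candidate_replicas(replicas, REMOTE))      # all three tie
```

With the default, only the cached replica qualifies; raising the preference widens the set of equally preferred replicas, which is how the option spreads reads across more executor nodes.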
