IMPALA-2924: [DOCS] Add docs for HDFS cache-related hints

The JIRA discusses a RANDOM_REPLICA query option but Impala only has a
SCHEDULE_RANDOM_REPLICA option. So I stated that the RANDOM_REPLICA hint
is the same as specifying SCHEDULE_RANDOM_REPLICA=true. Please confirm.

Change-Id: I7284dd45c8173eef104ebd32789429e8c16c7bf2
Reviewed-on: http://gerrit.cloudera.org:8080/6631
Reviewed-by: Lars Volker <[email protected]>
Reviewed-by: John Russell <[email protected]>
Tested-by: Impala Public Jenkins

Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/8bdfe032
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/8bdfe032
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/8bdfe032

Branch: refs/heads/master
Commit: 8bdfe032012e0b52550bc6784dc972b9dcfb5f7b
Parents: cb1e4f6
Author: John Russell <[email protected]>
Authored: Thu Apr 13 14:10:07 2017 -0700
Committer: Impala Public Jenkins <[email protected]>
Committed: Fri Apr 14 22:37:34 2017 +0000

----------------------------------------------------------------------
 docs/topics/impala_hints.xml | 42 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/8bdfe032/docs/topics/impala_hints.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_hints.xml b/docs/topics/impala_hints.xml
index 7d833f6..4524c14 100644
--- a/docs/topics/impala_hints.xml
+++ b/docs/topics/impala_hints.xml
@@ -80,7 +80,8 @@ INSERT <varname>insert_clauses</varname>
     <p rev="2.0.0">
       In <keyword keyref="impala20_full"/> and higher, you can also specify the hints inside comments that use
       either the <codeph>/* */</codeph> or <codeph>--</codeph> notation. Specify a <codeph>+</codeph> symbol
-      immediately before the hint name.
+      immediately before the hint name. Recently added hints are only available using the <codeph>/* */</codeph>
+      and <codeph>--</codeph> notation.
     </p>
 
 <codeblock rev="2.0.0">SELECT STRAIGHT_JOIN <varname>select_list</varname> FROM
@@ -102,6 +103,12 @@ INSERT <varname>insert_clauses</varname>
 INSERT <varname>insert_clauses</varname>
   -- +SHUFFLE|NOSHUFFLE
   SELECT <varname>remainder_of_query</varname>;
+
+<ph rev="IMPALA-2924">SELECT <varname>select_list</varname> FROM
+<varname>table_ref</varname>
+  /* +{SCHEDULE_CACHE_LOCAL | SCHEDULE_DISK_LOCAL | SCHEDULE_REMOTE}
+    [,RANDOM_REPLICA] */
+<varname>remainder_of_query</varname>;</ph>
 </codeblock>
 
     <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
@@ -109,7 +116,7 @@ INSERT <varname>insert_clauses</varname>
     <p>
       With both forms of hint syntax, include the <codeph>STRAIGHT_JOIN</codeph>
       keyword immediately after the <codeph>SELECT</codeph> keyword to prevent Impala from
-      reordering the tables in a way that makes the hint ineffective.
+      reordering the tables in a way that makes the join-related hints ineffective.
     </p>
 
     <p>
@@ -163,6 +170,37 @@ INSERT <varname>insert_clauses</varname>
 
     <p conref="../shared/impala_common.xml#common/insert_hints"/>
 
+    <p rev="IMPALA-2924">
+      <b>Hints for scheduling of HDFS blocks:</b>
+    </p>
+
+    <p rev="IMPALA-2924">
+      The hints <codeph>/* +SCHEDULE_CACHE_LOCAL */</codeph>,
+      <codeph>/* +SCHEDULE_DISK_LOCAL */</codeph>, and
+      <codeph>/* +SCHEDULE_REMOTE */</codeph> have the same effect
+      as specifying the <codeph>REPLICA_PREFERENCE</codeph> query
+      option with the respective option settings of <codeph>CACHE_LOCAL</codeph>,
+      <codeph>DISK_LOCAL</codeph>, or <codeph>REMOTE</codeph>.
+      The hint <codeph>/* +RANDOM_REPLICA */</codeph> is the same as
+      enabling the <codeph>SCHEDULE_RANDOM_REPLICA</codeph> query option.
+    </p>
+
+    <p rev="IMPALA-2924">
+      You can use these hints in combination by separating them with commas,
+      for example, <codeph>/* +SCHEDULE_CACHE_LOCAL,RANDOM_REPLICA */</codeph>.
+      See <xref keyref="replica_preference"/> and
+      <xref keyref="schedule_random_replica"/> for information about how
+      these settings influence the way Impala processes HDFS data blocks.
+    </p>
+
+    <p rev="IMPALA-2924">
+      Specifying the replica preference as a query hint always overrides the
+      query option setting. Specifying either the <codeph>SCHEDULE_RANDOM_REPLICA</codeph>
+      query option or the corresponding <codeph>RANDOM_REPLICA</codeph> query hint
+      enables the random tie-breaking behavior when processing data blocks
+      during the query.
+    </p>
+
     <p>
       <b>Suggestions versus directives:</b>
     </p>
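
For readers of this change, the equivalence described in the new paragraphs can be sketched roughly as follows. This is only an illustration, not part of the commit: the table name sales_cached and its columns are hypothetical, and the hint placement follows the syntax added to the codeblock in this diff (after the table reference).

```sql
-- Per-query form: the scheduling hints follow the table reference.
-- sales_cached, region, and amount are hypothetical names.
SELECT region, SUM(amount)
FROM sales_cached /* +SCHEDULE_CACHE_LOCAL,RANDOM_REPLICA */
GROUP BY region;

-- Session-level form that the docs describe as having the same effect,
-- using the existing query options.
SET REPLICA_PREFERENCE=CACHE_LOCAL;
SET SCHEDULE_RANDOM_REPLICA=true;
SELECT region, SUM(amount)
FROM sales_cached
GROUP BY region;
```

According to the added text, when both forms are present the replica preference given in the hint takes precedence over the query option setting.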
