http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/shared/impala_common.xml ---------------------------------------------------------------------- diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml new file mode 100644 index 0000000..b0c9ee5 --- /dev/null +++ b/docs/shared/impala_common.xml @@ -0,0 +1,3690 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="common"> + + <title>Reusable Text, Paragraphs, List Items, and Other Elements for Impala</title> + + <conbody> + + <p> + All the elements in this file with IDs are intended to be conref'ed elsewhere. Practically all of the + conref'ed elements for the Impala docs are in this file, to avoid questions of when it's safe to remove or + move something in any of the 'main' files, and avoid having to change and conref references as a result. + </p> + + <p> + This file defines some dummy subheadings as section elements, just for self-documentation. Using sections + instead of nested concepts lets all the conref links point to a very simple name pattern, + '#common/id_within_the_file', rather than a 3-part reference with an intervening, variable concept ID. + </p> + + <section id="concepts"> + + <title>Conceptual Content</title> + + <p> + Overview and conceptual information for Impala as a whole. + </p> + + <!-- Reconcile the 'advantages' and 'benefits' elements; be mindful of where each is used. --> + + <p id="impala_advantages"> + The following are some of the key advantages of Impala: + + <ul> + <li> + Impala integrates with the existing <keyword keyref="hadoop_distro"/> ecosystem, meaning data can be stored, shared, and accessed using + the various solutions included with <keyword keyref="hadoop_distro"/>. This also avoids data silos and minimizes expensive data movement. + </li> + + <li> + Impala provides access to data stored in <keyword keyref="hadoop_distro"/> without requiring the Java skills required for MapReduce jobs. + Impala can access data directly from the HDFS file system. Impala also provides a SQL front-end to access + data in the HBase database system, <ph rev="2.2.0">or in the Amazon Simple Storage System (S3)</ph>. + </li> + + <li> + Impala returns results typically within seconds or a few minutes, rather than the many minutes or hours + that are often required for Hive queries to complete. + </li> + + <li> + Impala is pioneering the use of the Parquet file format, a columnar storage layout that is optimized for + large-scale queries typical in data warehouse scenarios. + </li> + </ul> + </p> + + <p id="impala_benefits"> + Impala provides: + + <ul> + <li> + Familiar SQL interface that data scientists and analysts already know. + </li> + + <li> + Ability to query high volumes of data (<q>big data</q>) in Apache Hadoop. + </li> + + <li> + Distributed queries in a cluster environment, for convenient scaling and to make use of cost-effective + commodity hardware. + </li> + + <li> + Ability to share data files between different components with no copy or export/import step; for example, + to write with Pig, transform with Hive and query with Impala. Impala can read from and write to Hive + tables, enabling simple data interchange using Impala for analytics on Hive-produced data. + </li> + + <li> + Single system for big data processing and analytics, so customers can avoid costly modeling and ETL just + for analytics. 
+ </li> + </ul> + </p> + + </section> + + <section id="sentry"> + + <title>Sentry-Related Content</title> + + <p> + Material related to Sentry security, intended to be reused between Hive and Impala. Complicated by the fact + that most of it will probably be multi-paragraph or involve subheads, might need to be represented as + nested topics at the end of this file. + </p> + + <table id="sentry_privileges_objects"> + <title>Valid privilege types and objects they apply to</title> + <tgroup cols="2"> + <colspec colnum="1" colname="col1" colwidth="1*"/> + <colspec colnum="2" colname="col2" colwidth="2*"/> + <thead> + <row> + <entry><b>Privilege</b></entry> + <entry><b>Object</b></entry> + </row> + </thead> + <tbody> + <row> + <entry>INSERT</entry> + <entry>DB, TABLE</entry> + </row> + <row> + <entry>SELECT</entry> + <entry>DB, TABLE, COLUMN</entry> + </row> + <row> + <entry>ALL</entry> + <entry>SERVER, TABLE, DB, URI</entry> + </row> + </tbody> + </tgroup> + </table> + + <table id="privileges_sql"> + <title>Privilege table for Hive & Impala operations</title> + <tgroup cols="4"> + <colspec colnum="1" colname="col1" colwidth="1.31*"/> + <colspec colnum="2" colname="col2" colwidth="1.17*"/> + <colspec colnum="3" colname="col3" colwidth="1*"/> + <colspec colname="newCol4" colnum="4" colwidth="1*"/> + <thead> + <row> + <entry>Operation</entry> + <entry>Scope</entry> + <entry>Privileges Required</entry> + <entry>URI</entry> + </row> + </thead> + <tbody> + <row id="create_database_privs"> + <entry>CREATE DATABASE</entry> + <entry>SERVER</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="drop_database_privs"> + <entry>DROP DATABASE</entry> + <entry>DATABASE</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="create_table_privs"> + <entry>CREATE TABLE</entry> + <entry>DATABASE</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="drop_table_privs"> + <entry>DROP TABLE</entry> + <entry>TABLE</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="create_view_privs"> + <entry>CREATE VIEW<p>-This operation is allowed if you have + column-level <codeph>SELECT</codeph> access to the columns + being used.</p></entry> + <entry>DATABASE; SELECT on TABLE; </entry> + <entry>ALL</entry> + <entry/> + </row> + <row> + <entry>ALTER VIEW<p>-This operation is allowed if you have + column-level <codeph>SELECT</codeph> access to the columns + being used.</p></entry> + <entry>VIEW/TABLE</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="drop_view_privs"> + <entry>DROP VIEW</entry> + <entry>VIEW/TABLE</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="alter_table_add_columns_privs"> + <entry>ALTER TABLE .. ADD COLUMNS</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="alter_table_replace_columns_privs"> + <entry>ALTER TABLE .. REPLACE COLUMNS</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="alter_table_change_column_privs"> + <entry>ALTER TABLE .. CHANGE column</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="alter_table_rename_privs"> + <entry>ALTER TABLE .. RENAME</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="alter_table_set_tblproperties_privs"> + <entry>ALTER TABLE .. SET TBLPROPERTIES</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="alter_table_set_fileformat_privs"> + <entry>ALTER TABLE .. 
SET FILEFORMAT</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="alter_table_set_location_privs"> + <entry>ALTER TABLE .. SET LOCATION</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry>URI</entry> + </row> + <row id="alter_table_add_partition_privs"> + <entry>ALTER TABLE .. ADD PARTITION</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="alter_table_add_partition_location_privs"> + <entry>ALTER TABLE .. ADD PARTITION location</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry>URI</entry> + </row> + <row id="alter_table_drop_partition_privs"> + <entry>ALTER TABLE .. DROP PARTITION</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="alter_table_partition_set_fileformat_privs"> + <entry>ALTER TABLE .. PARTITION SET FILEFORMAT</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="show_create_table_privs"> + <entry>SHOW CREATE TABLE</entry> + <entry>TABLE</entry> + <entry>SELECT/INSERT</entry> + <entry/> + </row> + <row id="show_partitions_privs"> + <entry>SHOW PARTITIONS</entry> + <entry>TABLE</entry> + <entry>SELECT/INSERT</entry> + <entry/> + </row> + <row> + <entry>SHOW TABLES<p>-Output includes all the tables for which + the user has table-level privileges and all the tables for + which the user has some column-level privileges.</p></entry> + <entry>TABLE</entry> + <entry>SELECT/INSERT</entry> + <entry/> + </row> + <row> + <entry>SHOW GRANT ROLE<p>-Output includes an additional field + for any column-level privileges.</p></entry> + <entry>TABLE</entry> + <entry>SELECT/INSERT</entry> + <entry/> + </row> + <row id="describe_table_privs"> + <entry>DESCRIBE TABLE<p>-Output shows <i>all</i> columns if the + user has table level-privileges or <codeph>SELECT</codeph> + privilege on at least one table column</p></entry> + <entry>TABLE</entry> + <entry>SELECT/INSERT</entry> + <entry/> + </row> + <row id="load_data_privs"> + <entry>LOAD DATA</entry> + <entry>TABLE</entry> + <entry>INSERT</entry> + <entry>URI</entry> + </row> + <row id="select_privs"> + <entry>SELECT<p>-You can grant the SELECT privilege on a view to + give users access to specific columns of a table they do not + otherwise have access to.</p><p>-See + <xref audience="integrated" href="../topics/sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/><xref audience="standalone" href="https://www.cloudera.com/documentation/enterprise/latest/topics/sg_hive_sql.html" format="html" scope="external"/> + for details on allowed column-level + operations.</p></entry> + <entry>VIEW/TABLE; COLUMN</entry> + <entry>SELECT</entry> + <entry/> + </row> + <row id="insert_overwrite_table_privs"> + <entry>INSERT OVERWRITE TABLE</entry> + <entry>TABLE</entry> + <entry>INSERT</entry> + <entry/> + </row> + <row id="create_table_as_select_privs"> + <entry>CREATE TABLE .. AS SELECT<p>-This operation is allowed if + you have column-level <codeph>SELECT</codeph> access to the + columns being used.</p></entry> + <entry>DATABASE; SELECT on TABLE</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="use_privs"> + <entry>USE <dbName></entry> + <entry>Any</entry> + <entry/> + <entry/> + </row> + <row id="create_function_privs"> + <entry>CREATE FUNCTION</entry> + <entry>SERVER</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="alter_table_set_serdeproperties_privs"> + <entry>ALTER TABLE .. 
SET SERDEPROPERTIES</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row> + <entry>ALTER TABLE .. PARTITION SET SERDEPROPERTIES</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="_privs"> + <entry namest="col1" nameend="newCol4"><b>Hive-Only + Operations</b></entry> + </row> + <row id="insert_overwrite_directory_privs"> + <entry>INSERT OVERWRITE DIRECTORY</entry> + <entry>TABLE</entry> + <entry>INSERT</entry> + <entry>URI</entry> + </row> + <row id="analyze_table_privs"> + <entry>Analyze TABLE</entry> + <entry>TABLE</entry> + <entry>SELECT + INSERT</entry> + <entry/> + </row> + <row id="import_table_privs"> + <entry>IMPORT TABLE</entry> + <entry>DATABASE</entry> + <entry>ALL</entry> + <entry>URI</entry> + </row> + <row id="export_table_privs"> + <entry>EXPORT TABLE</entry> + <entry>TABLE</entry> + <entry>SELECT</entry> + <entry>URI</entry> + </row> + <row id="alter_table_touch_privs"> + <entry>ALTER TABLE TOUCH</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="alter_table_touch_partition_privs"> + <entry>ALTER TABLE TOUCH PARTITION</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="alter_table_clustered_by_privs"> + <entry>ALTER TABLE .. CLUSTERED BY SORTED BY</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="alter_table_enable_privs"> + <entry>ALTER TABLE .. ENABLE/DISABLE</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="alter_table_partition_enable_privs"> + <entry>ALTER TABLE .. PARTITION ENABLE/DISABLE</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row id="alter_table_partition_rename_privs"> + <entry>ALTER TABLE .. PARTITION.. 
RENAME TO PARTITION</entry> + <entry>TABLE</entry> + <entry>ALL on DATABASE</entry> + <entry/> + </row> + <row> + <entry>MSCK REPAIR TABLE</entry> + <entry>TABLE</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="alter_database_privs"> + <entry>ALTER DATABASE</entry> + <entry>DATABASE</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="describe_database_privs"> + <entry>DESCRIBE DATABASE</entry> + <entry>DATABASE</entry> + <entry>SELECT/INSERT</entry> + <entry/> + </row> + <row id="show_columns_privs"> + <entry>SHOW COLUMNS<p>-Output for this operation filters columns + to which the user does not have explicit + <codeph>SELECT</codeph> access </p></entry> + <entry>TABLE</entry> + <entry>SELECT/INSERT</entry> + <entry/> + </row> + <row id="create_index_privs"> + <entry>CREATE INDEX</entry> + <entry>TABLE</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="drop_index_privs"> + <entry>DROP INDEX</entry> + <entry>TABLE</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="show_indexes_privs"> + <entry>SHOW INDEXES</entry> + <entry>TABLE</entry> + <entry>SELECT/INSERT</entry> + <entry/> + </row> + <row id="grant_privs"> + <entry>GRANT PRIVILEGE</entry> + <entry>Allowed only for Sentry admin users</entry> + <entry/> + <entry/> + </row> + <row id="revoke_privs"> + <entry>REVOKE PRIVILEGE</entry> + <entry>Allowed only for Sentry admin users</entry> + <entry/> + <entry/> + </row> + <row id="show_grants_privs"> + <entry>SHOW GRANTS</entry> + <entry>Allowed only for Sentry admin users</entry> + <entry/> + <entry/> + </row> + <row id="show_tblproperties_privs"> + <entry>SHOW TBLPROPERTIES</entry> + <entry>TABLE</entry> + <entry>SELECT/INSERT</entry> + <entry/> + </row> + <row id="describe_table_partition_privs"> + <entry>DESCRIBE TABLE .. 
PARTITION</entry> + <entry>TABLE</entry> + <entry>SELECT/INSERT</entry> + <entry/> + </row> + <row id="add_jar_privs"> + <entry>ADD JAR</entry> + <entry>Not Allowed</entry> + <entry/> + <entry/> + </row> + <row id="add_file_privs"> + <entry>ADD FILE</entry> + <entry>Not Allowed</entry> + <entry/> + <entry/> + </row> + <row id="dfs_privs"> + <entry>DFS</entry> + <entry>Not Allowed</entry> + <entry/> + <entry/> + </row> + <row> + <entry namest="col1" nameend="newCol4"><b>Impala-Only + Operations</b></entry> + </row> + <row id="explain_privs"> + <entry>EXPLAIN</entry> + <entry>TABLE; COLUMN</entry> + <entry>SELECT</entry> + <entry/> + </row> + <row id="invalidate_metadata_privs"> + <entry>INVALIDATE METADATA</entry> + <entry>SERVER</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="invalidate_metadata_table_privs"> + <entry>INVALIDATE METADATA <table name></entry> + <entry>TABLE</entry> + <entry>SELECT/INSERT</entry> + <entry/> + </row> + <row id="refresh_privs"> + <entry>REFRESH <table name> or REFRESH <table name> PARTITION (<partition_spec>)</entry> + <entry>TABLE</entry> + <entry>SELECT/INSERT</entry> + <entry/> + </row> + <row id="drop_function_privs"> + <entry>DROP FUNCTION</entry> + <entry>SERVER</entry> + <entry>ALL</entry> + <entry/> + </row> + <row id="compute_stats_privs"> + <entry>COMPUTE STATS</entry> + <entry>TABLE</entry> + <entry>ALL</entry> + <entry/> + </row> + </tbody> + </tgroup> + </table> + + <p rev="IMPALA-2660 CDH-40241" id="auth_to_local_instructions"> + In <keyword keyref="impala26_full"/> and higher, Impala recognizes the <codeph>auth_to_local</codeph> setting, + specified through the HDFS configuration setting + <codeph>hadoop.security.auth_to_local</codeph> + or the Cloudera Manager setting + <uicontrol>Additional Rules to Map Kerberos Principals to Short Names</uicontrol>. + This feature is disabled by default, to avoid an unexpected change in security-related behavior. + To enable it: + <ul> + <li> + <p> + For clusters not managed by Cloudera Manager, specify <codeph>--load_auth_to_local_rules=true</codeph> + in the <cmdname>impalad</cmdname> and <cmdname>catalogd</cmdname>configuration settings. + </p> + </li> + <li> + <p> + For clusters managed by Cloudera Manager, select the + <uicontrol>Use HDFS Rules to Map Kerberos Principals to Short Names</uicontrol> + checkbox to enable the service-wide <codeph>load_auth_to_local_rules</codeph> configuration setting. + Then restart the Impala service. + </p> + </li> + </ul> + See <xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/sg_auth_to_local_isolate.html" scope="external" format="html">Using Auth-to-Local Rules to Isolate Cluster Users</xref> for general information about this feature. + </p> + + <note id="authentication_vs_authorization"> + Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files + owned by the same user (typically <codeph>impala</codeph>). To implement user-level access to different + databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in + <xref href="../topics/impala_authorization.xml#authorization"/>. + </note> + +<!-- Contrived nesting needed to allow <ph> with ID to be reused inside the <title> of a conref. --> + + <p> + <b><ph id="title_sentry_debug">Debugging Failed Sentry Authorization Requests</ph></b> + </p> + + <p id="sentry_debug"> + Sentry logs all facts that lead up to authorization decisions at the debug level. 
If you do not understand + why Sentry is denying access, the best way to debug is to temporarily turn on debug logging: + <ul> + <li> + In Cloudera Manager, add <codeph>log4j.logger.org.apache.sentry=DEBUG</codeph> to the logging settings + for your service through the corresponding <uicontrol>Logging Safety Valve</uicontrol> field for the + Impala, Hive Server 2, or Solr Server services. + </li> + + <li> + On systems not managed by Cloudera Manager, add <codeph>log4j.logger.org.apache.sentry=DEBUG</codeph> + to the <filepath>log4j.properties</filepath> file on each host in the cluster, in the appropriate + configuration directory for each service. + </li> + </ul> + Specifically, look for exceptions and messages such as: +<codeblock xml:space="preserve">FilePermission server..., RequestPermission server...., result [true|false]</codeblock> + which indicate each evaluation Sentry makes. The <codeph>FilePermission</codeph> is from the policy file, + while <codeph>RequestPermission</codeph> is the privilege required for the query. A + <codeph>RequestPermission</codeph> will iterate over all appropriate <codeph>FilePermission</codeph> + settings until a match is found. If no matching privilege is found, Sentry returns <codeph>false</codeph> + indicating <q>Access Denied</q> . +<!-- +[1] +Impala: Impala Daemon -> Advanced -> Impala Daemon Logging Safety Valve +Hive: Hive Server 2 -> Advanced -> HiveServer2 Logging Safety Valve +Search: Solr Server -> Advanced -> HiveServer2 Logging Safety Valve +--> + </p> + + </section> + + <section id="cm"> + + <title>Cloudera Manager Terminology</title> + + <p> + Especially during the transition from CM 4 to CM 5, we'll use some stock phraseology to talk about fields + and such. Also there are some task steps etc. to conref under the Impala Service page that are easier + to keep track of here instead of in cm_common_elements.xml. (Although as part of Apache work, anything + CM might naturally move out of this file.) + </p> + + <p> + <ph id="safety_valve"> In Cloudera Manager 4, these fields are labelled <uicontrol>Safety + Valve</uicontrol>; in Cloudera Manager 5, they are called <uicontrol>Advanced Configuration + Snippet</uicontrol>. </ph> + </p> + + <ul> + <li id="go_impala_service">Go to the Impala service.</li> + <li id="restart_impala_service">Restart the Impala service.</li> + </ul> + + </section> + + <section id="citi"> + + <title>Items from the Citibank Escalation Spreadsheet</title> + + <p> + Paragraphs with IDs are intended to be reused both in the FAQ and the User's Guide. They refer to feature + requests or misunderstandings encountered by Citibank, captured in the escalation spreadsheet here: + <xref href="https://docs.google.com/a/cloudera.com/spreadsheet/ccc?key=0AplfwQJKyyTWdFdhY0E5WHVwNXZSTG9sMEZwQy1QZ1E&usp=drive_web#gid=0" scope="external" format="html"/>. + </p> + + <p id="string_concatenation"> + With Impala, you use the built-in <codeph>CONCAT()</codeph> function to concatenate two, three, or more + strings: +<codeblock xml:space="preserve">select concat('some prefix: ', col1) from t1; +select concat('abc','mno','xyz');</codeblock> + Impala does not currently support operators for string concatenation, such as <codeph>||</codeph> as seen + in some other database systems. + </p> + + <p id="column_aliases"> + You can specify column aliases with or without the <codeph>AS</codeph> keyword, and with no quotation + marks, single quotation marks, or double quotation marks. 
Some kind of quotation marks are required if the + column alias contains any spaces or other problematic characters. The alias text is displayed in the + <cmdname>impala-shell</cmdname> output as all-lowercase. For example: +<codeblock xml:space="preserve">[localhost:21000] > select c1 First_Column from t; +[localhost:21000] > select c1 as First_Column from t; ++--------------+ +| first_column | ++--------------+ +... + +[localhost:21000] > select c1 'First Column' from t; +[localhost:21000] > select c1 as 'First Column' from t; ++--------------+ +| first column | ++--------------+ +... + +[localhost:21000] > select c1 "First Column" from t; +[localhost:21000] > select c1 as "First Column" from t; ++--------------+ +| first column | ++--------------+ +...</codeblock> + </p> + + <p id="temp_tables"> + Currently, Impala does not support temporary tables. Some other database systems have a class of + <q>lightweight</q> tables that are held only in memory and/or that are only accessible by one connection + and disappear when the session ends. In Impala, creating new databases is a relatively lightweight + operation, so as an alternative, you could create a database with a unique name and use <codeph>CREATE + TABLE LIKE</codeph>, <codeph>CREATE TABLE AS SELECT</codeph>, and <codeph>INSERT</codeph> statements to + create a table in that database to hold the result set of a query, to use in subsequent queries. When + finished, issue a <codeph>DROP TABLE</codeph> statement followed by <codeph>DROP DATABASE</codeph>. + </p> + + </section> + + <section id="standards"> + + <title>Blurbs About Standards Compliance</title> + + <p> + The following blurbs simplify the process of flagging which SQL standard various features were first + introduced in. The wording and the tagging can be modified by editing one central instance of each blurb. + Not extensively used yet, just here and there in the SQL Language Reference section. + </p> + + <p id="sql1986"> +<!-- No Wikipedia page for SQL-1986, so no link. --> + <b>Standards compliance:</b> Introduced in SQL-1986. + </p> + + <p id="sql1989"> +<!-- No Wikipedia page for SQL-1989, so no link. --> + <b>Standards compliance:</b> Introduced in SQL-1989. + </p> + + <p id="sql1992"> + <b>Standards compliance:</b> Introduced in + <xref href="http://en.wikipedia.org/wiki/SQL-92" scope="external" format="html">SQL-1992</xref>. + </p> + + <p id="sql1999"> + <b>Standards compliance:</b> Introduced in + <xref href="http://en.wikipedia.org/wiki/SQL:1999" scope="external" format="html">SQL:1999</xref>. + </p> + + <p id="sql2003"> + <b>Standards compliance:</b> Introduced in + <xref href="http://en.wikipedia.org/wiki/SQL:2003" scope="external" format="html">SQL:2003</xref>. + </p> + + <p id="sql2008"> + <b>Standards compliance:</b> Introduced in + <xref href="http://en.wikipedia.org/wiki/SQL:2008" scope="external" format="html">SQL:2008</xref>. + </p> + + <p id="sql2011"> + <b>Standards compliance:</b> Introduced in + <xref href="http://en.wikipedia.org/wiki/SQL:2011" scope="external" format="html">SQL:2011</xref>. + </p> + + <p id="hiveql"> + <b>Standards compliance:</b> Extension first introduced in HiveQL. + </p> + + <p id="impalaql"> + <b>Standards compliance:</b> Extension first introduced in Impala. 
+ </p> + + </section> + + <section id="refresh_invalidate"> + + <title>Background Info for REFRESH, INVALIDATE METADATA, and General Metadata Discussion</title> + + <p id="invalidate_then_refresh" rev="DOCS-1013"> + Because <codeph>REFRESH <varname>table_name</varname></codeph> only works for tables that the current + Impala node is already aware of, when you create a new table in the Hive shell, enter + <codeph>INVALIDATE METADATA <varname>new_table</varname></codeph> before you can see the new table in + <cmdname>impala-shell</cmdname>. Once the table is known by Impala, you can issue <codeph>REFRESH + <varname>table_name</varname></codeph> after you add data files for that table. + </p> + + <p id="refresh_vs_invalidate"> + <codeph>INVALIDATE METADATA</codeph> and <codeph>REFRESH</codeph> are counterparts: <codeph>INVALIDATE + METADATA</codeph> waits to reload the metadata when needed for a subsequent query, but reloads all the + metadata for the table, which can be an expensive operation, especially for large tables with many + partitions. <codeph>REFRESH</codeph> reloads the metadata immediately, but only loads the block location + data for newly added data files, making it a less expensive operation overall. If data was altered in some + more extensive way, such as being reorganized by the HDFS balancer, use <codeph>INVALIDATE + METADATA</codeph> to avoid a performance penalty from reduced local reads. If you used Impala version 1.0, + the <codeph>INVALIDATE METADATA</codeph> statement works just like the Impala 1.0 <codeph>REFRESH</codeph> + statement did, while the Impala 1.1 <codeph>REFRESH</codeph> is optimized for the common use case of adding + new data files to an existing table, thus the table name argument is now required. + </p> + + </section> + + <section id="sql_ref"> + + <title>SQL Language Reference Snippets</title> + + <p> + These reusable chunks were taken from conrefs originally in <filepath>ciiu_langref_sql.xml</filepath>. Or + they are primarily used in new SQL syntax topics underneath that parent topic. + </p> + +<codeblock id="parquet_fallback_schema_resolution_example"><![CDATA[ +create database schema_evolution; +use schema_evolution; +create table t1 (c1 int, c2 boolean, c3 string, c4 timestamp) + stored as parquet; +insert into t1 values + (1, true, 'yes', now()), + (2, false, 'no', now() + interval 1 day); + +select * from t1; ++----+-------+-----+-------------------------------+ +| c1 | c2 | c3 | c4 | ++----+-------+-----+-------------------------------+ +| 1 | true | yes | 2016-06-28 14:53:26.554369000 | +| 2 | false | no | 2016-06-29 14:53:26.554369000 | ++----+-------+-----+-------------------------------+ + +desc formatted t1; +... +| Location: | /user/hive/warehouse/schema_evolution.db/t1 | +... + +-- Make T2 have the same data file as in T1, including 2 +-- unused columns and column order different than T2 expects. +load data inpath '/user/hive/warehouse/schema_evolution.db/t1' + into table t2; ++----------------------------------------------------------+ +| summary | ++----------------------------------------------------------+ +| Loaded 1 file(s). Total files in destination location: 1 | ++----------------------------------------------------------+ + +-- 'position' is the default setting. +-- Impala cannot read the Parquet file if the column order does not match. 
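+-- ('position' matches the file's columns to the table's columns by ordinal position;
+-- 'name' matches them by column name, regardless of their order in the file.)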
+set PARQUET_FALLBACK_SCHEMA_RESOLUTION=position; +PARQUET_FALLBACK_SCHEMA_RESOLUTION set to position + +select * from t2; +WARNINGS: +File 'schema_evolution.db/t2/45331705_data.0.parq' +has an incompatible Parquet schema for column 'schema_evolution.t2.c4'. +Column type: TIMESTAMP, Parquet schema: optional int32 c1 [i:0 d:1 r:0] + +File 'schema_evolution.db/t2/45331705_data.0.parq' +has an incompatible Parquet schema for column 'schema_evolution.t2.c4'. +Column type: TIMESTAMP, Parquet schema: optional int32 c1 [i:0 d:1 r:0] + +-- With the 'name' setting, Impala can read the Parquet data files +-- despite mismatching column order. +set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name; +PARQUET_FALLBACK_SCHEMA_RESOLUTION set to name + +select * from t2; ++-------------------------------+-------+ +| c4 | c2 | ++-------------------------------+-------+ +| 2016-06-28 14:53:26.554369000 | true | +| 2016-06-29 14:53:26.554369000 | false | ++-------------------------------+-------+ +]]> +</codeblock> + + <note rev="IMPALA-3334" id="one_but_not_true"> + In CDH 5.7.0 / Impala 2.5.0, only the value 1 enables the option, and the value + <codeph>true</codeph> is not recognized. This limitation is + tracked by the issue + <xref href="https://issues.cloudera.org/browse/IMPALA-3334" scope="external" format="html">IMPALA-3334</xref>, + which shows the releases where the problem is fixed. + </note> + + <p rev="IMPALA-3732" id="avro_2gb_strings"> + The Avro specification allows string values up to 2**64 bytes in length. + Impala queries for Avro tables use 32-bit integers to hold string lengths. + In <keyword keyref="impala25_full"/> and higher, Impala truncates <codeph>CHAR</codeph> + and <codeph>VARCHAR</codeph> values in Avro tables to (2**31)-1 bytes. + If a query encounters a <codeph>STRING</codeph> value longer than (2**31)-1 + bytes in an Avro table, the query fails. In earlier releases, + encountering such long values in an Avro table could cause a crash. + </p> + + <p rev="2.6.0 IMPALA-3369" id="set_column_stats_example"> + You specify a case-insensitive symbolic name for the kind of statistics: + <codeph>numDVs</codeph>, <codeph>numNulls</codeph>, <codeph>avgSize</codeph>, <codeph>maxSize</codeph>. + The key names and values are both quoted. This operation applies to an entire table, + not a specific partition. 
For example: +<codeblock> +create table t1 (x int, s string); +insert into t1 values (1, 'one'), (2, 'two'), (2, 'deux'); +show column stats t1; ++--------+--------+------------------+--------+----------+----------+ +| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | ++--------+--------+------------------+--------+----------+----------+ +| x | INT | -1 | -1 | 4 | 4 | +| s | STRING | -1 | -1 | -1 | -1 | ++--------+--------+------------------+--------+----------+----------+ +alter table t1 set column stats x ('numDVs'='2','numNulls'='0'); +alter table t1 set column stats s ('numdvs'='3','maxsize'='4'); +show column stats t1; ++--------+--------+------------------+--------+----------+----------+ +| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | ++--------+--------+------------------+--------+----------+----------+ +| x | INT | 2 | 0 | 4 | 4 | +| s | STRING | 3 | -1 | 4 | -1 | ++--------+--------+------------------+--------+----------+----------+ +</codeblock> + </p> + +<codeblock id="set_numrows_example">create table analysis_data stored as parquet as select * from raw_data; +Inserted 1000000000 rows in 181.98s +compute stats analysis_data; +insert into analysis_data select * from smaller_table_we_forgot_before; +Inserted 1000000 rows in 15.32s +-- Now there are 1001000000 rows. We can update this single data point in the stats. +alter table analysis_data set tblproperties('numRows'='1001000000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</codeblock> + +<codeblock id="set_numrows_partitioned_example">-- If the table originally contained 1 million rows, and we add another partition with 30 thousand rows, +-- change the numRows property for the partition and the overall table. +alter table partitioned_data partition(year=2009, month=4) set tblproperties ('numRows'='30000', 'STATS_GENERATED_VIA_STATS_TASK'='true'); +alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</codeblock> + + <p id="int_overflow_behavior"> + Impala does not return column overflows as <codeph>NULL</codeph>, so that customers can distinguish + between <codeph>NULL</codeph> data and overflow conditions similar to how they do so with traditional + database systems. Impala returns the largest or smallest value in the range for the type. For example, + valid values for a <codeph>tinyint</codeph> range from -128 to 127. In Impala, a <codeph>tinyint</codeph> + with a value of -200 returns -128 rather than <codeph>NULL</codeph>. A <codeph>tinyint</codeph> with a + value of 200 returns 127. + </p> + + <p rev="2.5.0" id="partition_key_optimization"> + If you frequently run aggregate functions such as <codeph>MIN()</codeph>, <codeph>MAX()</codeph>, and + <codeph>COUNT(DISTINCT)</codeph> on partition key columns, consider enabling the <codeph>OPTIMIZE_PARTITION_KEY_SCANS</codeph> + query option, which optimizes such queries. This feature is available in <keyword keyref="impala25_full"/> and higher. + See <xref href="../topics/impala_optimize_partition_key_scans.xml"/> + for the kinds of queries that this option applies to, and slight differences in how partitions are + evaluated when this query option is enabled. + </p> + + <p id="live_reporting_details"> + The output from this query option is printed to standard error. The output is only displayed in interactive mode, + that is, not when the <codeph>-q</codeph> or <codeph>-f</codeph> options are used. 
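+      For example, in an interactive <cmdname>impala-shell</cmdname> session you might enable the
+      option before running a long query (shown here with <codeph>LIVE_PROGRESS</codeph>; the table
+      name is illustrative):
+<codeblock xml:space="preserve">[localhost:21000] > set live_progress=true;
+[localhost:21000] > select count(*) from huge_table;</codeblock>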
+ </p> + + <p id="live_progress_live_summary_asciinema"> + To see how the <codeph>LIVE_PROGRESS</codeph> and <codeph>LIVE_SUMMARY</codeph> query options + work in real time, see <xref href="https://asciinema.org/a/1rv7qippo0fe7h5k1b6k4nexk" scope="external" format="html">this animated demo</xref>. + </p> + + <p rev="2.5.0" id="runtime_filter_mode_blurb"> + Because the runtime filtering feature is enabled by default only for local processing, + the other filtering-related query options have the greatest effect when used in + combination with the setting <codeph>RUNTIME_FILTER_MODE=GLOBAL</codeph>. + </p> + + <p rev="2.5.0" id="runtime_filtering_option_caveat"> + Because the runtime filtering feature applies mainly to resource-intensive + and long-running queries, only adjust this query option when tuning long-running queries + involving some combination of large partitioned tables and joins involving large tables. + </p> + + <p rev="2.3.0" id="impala_shell_progress_reports_compute_stats_caveat"> + The <codeph>LIVE_PROGRESS</codeph> and <codeph>LIVE_SUMMARY</codeph> query options + currently do not produce any output during <codeph>COMPUTE STATS</codeph> operations. + </p> + +<!-- This is a shorter version of the similar 'caveat' text. This shorter one can be reused more easily in various places. --> + <p rev="2.3.0" id="impala_shell_progress_reports_shell_only_blurb"> + The <codeph>LIVE_PROGRESS</codeph> and <codeph>LIVE_SUMMARY</codeph> query options only apply + inside the <cmdname>impala-shell</cmdname> interpreter. You cannot use them with the + <codeph>SET</codeph> statement from a JDBC or ODBC application. + </p> + + <p id="impala_shell_progress_reports_shell_only_caveat"> + Because the <codeph>LIVE_PROGRESS</codeph> and <codeph>LIVE_SUMMARY</codeph> query options + are available only within the <cmdname>impala-shell</cmdname> interpreter: + <ul> + <li> + <p> + You cannot change these query options through the SQL <codeph>SET</codeph> + statement using the JDBC or ODBC interfaces. The <codeph>SET</codeph> + command in <cmdname>impala-shell</cmdname> recognizes these names as + shell-only options. + </p> + </li> + <li> + <p> + Be careful when using <cmdname>impala-shell</cmdname> on a pre-CDH 5.5 + system to connect to Impala running on a CDH 5.5 or higher system. + The older <cmdname>impala-shell</cmdname> does not recognize these + query option names. Upgrade <cmdname>impala-shell</cmdname> on the + systems where you intend to use these query options. + </p> + </li> + <li> + <p> + Likewise, the <cmdname>impala-shell</cmdname> command relies on + some information only available in <keyword keyref="impala23_full"/> and higher + to prepare live progress reports and query summaries. The + <codeph>LIVE_PROGRESS</codeph> and <codeph>LIVE_SUMMARY</codeph> + query options have no effect when <cmdname>impala-shell</cmdname> connects + to a cluster running an older version of Impala. + </p> + </li> + </ul> + </p> + +<!-- Same example used in both CREATE DATABASE and DROP DATABASE. --> +<codeblock id="create_drop_db_example">create database first_db; +use first_db; +create table t1 (x int); + +create database second_db; +use second_db; +-- Each database has its own namespace for tables. +-- You can reuse the same table names in each database. +create table t1 (s string); + +create database temp; + +-- You can either USE a database after creating it, +-- or qualify all references to the table name with the name of the database. +-- Here, tables T2 and T3 are both created in the TEMP database. 
+ +create table temp.t2 (x int, y int); +use database temp; +create table t3 (s string); + +-- You cannot drop a database while it is selected by the USE statement. +drop database temp; +<i>ERROR: AnalysisException: Cannot drop current default database: temp</i> + +-- The always-available database 'default' is a convenient one to USE +-- before dropping a database you created. +use default; + +-- Before dropping a database, first drop all the tables inside it, +<ph rev="2.3.0">-- or in <keyword keyref="impala23_full"/> and higher use the CASCADE clause.</ph> +drop database temp; +ERROR: ImpalaRuntimeException: Error making 'dropDatabase' RPC to Hive Metastore: +CAUSED BY: InvalidOperationException: Database temp is not empty +show tables in temp; ++------+ +| name | ++------+ +| t3 | ++------+ + +<ph rev="2.3.0">-- <keyword keyref="impala23_full"/> and higher:</ph> +<ph rev="2.3.0">drop database temp cascade;</ph> + +-- CDH 5.4 and lower: +drop table temp.t3; +drop database temp; +</codeblock> + + <p id="cast_convenience_fn_example"> + This example shows how to use the <codeph>castto*()</codeph> functions as an equivalent + to <codeph>CAST(<varname>value</varname> AS <varname>type</varname>)</codeph> expressions. + </p> + + <p id="cast_convenience_fn_usage"><b>Usage notes:</b> + A convenience function to skip the SQL <codeph>CAST <varname>value</varname> AS <varname>type</varname></codeph> syntax, + for example when programmatically generating SQL statements where a regular function call might be easier to construct. + </p> + + <p rev="2.3.0" id="current_timezone_tip"> + To determine the time zone of the server you are connected to, in CDH 5.5 / Impala 2.3 and + higher you can call the <codeph>timeofday()</codeph> function, which includes the time zone + specifier in its return value. Remember that with cloud computing, the server you interact + with might be in a different time zone than you are, or different sessions might connect to + servers in different time zones, or a cluster might include servers in more than one time zone. + </p> + + <p rev="2.2.0" id="timezone_conversion_caveat"> + The way this function deals with time zones when converting to or from <codeph>TIMESTAMP</codeph> + values is affected by the <codeph>-use_local_tz_for_unix_timestamp_conversions</codeph> startup flag for the + <cmdname>impalad</cmdname> daemon. See <xref href="../topics/impala_timestamp.xml#timestamp"/> for details about + how Impala handles time zone considerations for the <codeph>TIMESTAMP</codeph> data type. + </p> + + <p rev="2.6.0 CDH-39913 IMPALA-3558" id="s3_drop_table_purge"> + For best compatibility with the S3 write support in CDH 5.8 / Impala 2.6 + and higher: + <ul> + <li>Use native Hadoop techniques to create data files in S3 for querying through Impala.</li> + <li>Use the <codeph>PURGE</codeph> clause of <codeph>DROP TABLE</codeph> when dropping internal (managed) tables.</li> + </ul> + By default, when you drop an internal (managed) table, the data files are + moved to the HDFS trashcan. This operation is expensive for tables that + reside on the Amazon S3 filesystem. Therefore, for S3 tables, prefer to use + <codeph>DROP TABLE <varname>table_name</varname> PURGE</codeph> rather than the default <codeph>DROP TABLE</codeph> statement. + The <codeph>PURGE</codeph> clause makes Impala delete the data files immediately, + skipping the HDFS trashcan. 
+ For the <codeph>PURGE</codeph> clause to work effectively, you must originally create the + data files on S3 using one of the tools from the Hadoop ecosystem, such as + <codeph>hadoop fs -cp</codeph>, or <codeph>INSERT</codeph> in Impala or Hive. + </p> + + <p rev="2.6.0 CDH-39913 IMPALA-1878" id="s3_dml_performance"> + Because of differences between S3 and traditional filesystems, DML operations + for S3 tables can take longer than for tables on HDFS. For example, both the + <codeph>LOAD DATA</codeph> statement and the final stage of the <codeph>INSERT</codeph> + and <codeph>CREATE TABLE AS SELECT</codeph> statements involve moving files from one directory + to another. (In the case of <codeph>INSERT</codeph> and <codeph>CREATE TABLE AS SELECT</codeph>, + the files are moved from a temporary staging directory to the final destination directory.) + Because S3 does not support a <q>rename</q> operation for existing objects, in these cases Impala + actually copies the data files from one location to another and then removes the original files. + In CDH 5.8 / Impala 2.6, the <codeph>S3_SKIP_INSERT_STAGING</codeph> query option provides a way + to speed up <codeph>INSERT</codeph> statements for S3 tables and partitions, with the tradeoff + that a problem during statement execution could leave data in an inconsistent state. + It does not apply to <codeph>INSERT OVERWRITE</codeph> or <codeph>LOAD DATA</codeph> statements. + See <xref href="../topics/impala_s3_skip_insert_staging.xml#s3_skip_insert_staging"/> for details. + </p> + + <p rev="2.6.0 CDH-40329 IMPALA-3453" id="s3_block_splitting"> + In <keyword keyref="impala26_full"/> and higher, Impala queries are optimized for files stored in Amazon S3. + For Impala tables that use the file formats Parquet, RCFile, SequenceFile, + Avro, and uncompressed text, the setting <codeph>fs.s3a.block.size</codeph> + in the <filepath>core-site.xml</filepath> configuration file determines + how Impala divides the I/O work of reading the data files. This configuration + setting is specified in bytes. By default, this + value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files + as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access + Parquet files written by MapReduce or Hive, increase <codeph>fs.s3a.block.size</codeph> + to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve + Parquet files written by Impala, increase <codeph>fs.s3a.block.size</codeph> + to 268435456 (256 MB) to match the row group size produced by Impala. + </p> + + <note rev="2.6.0 CDH-39913 IMPALA-1878" id="s3_production" type="important"> + <p> + In <keyword keyref="impala26_full"/> and higher, Impala supports both queries (<codeph>SELECT</codeph>) + and DML (<codeph>INSERT</codeph>, <codeph>LOAD DATA</codeph>, <codeph>CREATE TABLE AS SELECT</codeph>) + for data residing on Amazon S3. With the inclusion of write support, + <!-- and configuration settings for more secure S3 key management, --> + the Impala support for S3 is now considered ready for production use. + </p> + </note> + + <note rev="2.2.0" id="s3_caveat" type="important"> + <p> Impala query support for Amazon S3 is included in CDH 5.4.0, but is + not currently supported or recommended for production use. To try this + feature, use it in a test environment until Cloudera resolves + currently existing issues and limitations to make it ready for + production use. 
</p> + </note> + + <p rev="2.6.0 CDH-39913 IMPALA-1878" id="s3_ddl"> + In <keyword keyref="impala26_full"/> and higher, Impala DDL statements such as + <codeph>CREATE DATABASE</codeph>, <codeph>CREATE TABLE</codeph>, <codeph>DROP DATABASE CASCADE</codeph>, + <codeph>DROP TABLE</codeph>, and <codeph>ALTER TABLE [ADD|DROP] PARTITION</codeph> can create or remove folders + as needed in the Amazon S3 system. Prior to CDH 5.8 / Impala 2.6, you had to create folders yourself and point + Impala database, tables, or partitions at them, and manually remove folders when no longer needed. + See <xref href="../topics/impala_s3.xml#s3"/> for details about reading and writing S3 data with Impala. + </p> + + <p rev="2.6.0 CDH-39913 IMPALA-1878" id="s3_dml"> + In <keyword keyref="impala26_full"/> and higher, the Impala DML statements (<codeph>INSERT</codeph>, <codeph>LOAD DATA</codeph>, + and <codeph>CREATE TABLE AS SELECT</codeph>) can write data into a table or partition that resides in the + Amazon Simple Storage Service (S3). + The syntax of the DML statements is the same as for any other tables, because the S3 location for tables and + partitions is specified by an <codeph>s3a://</codeph> prefix in the + <codeph>LOCATION</codeph> attribute of + <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> statements. + If you bring data into S3 using the normal S3 transfer mechanisms instead of Impala DML statements, + issue a <codeph>REFRESH</codeph> statement for the table before using Impala to query the S3 data. + </p> + + <!-- Formerly part of s3_dml element. Moved out to avoid a circular link in the S3 topic itelf. --> + <!-- See <xref href="../topics/impala_s3.xml#s3"/> for details about reading and writing S3 data with Impala. --> + + <p rev="2.2.0" id="s3_metadata"> + The <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> statements also cache metadata + for tables where the data resides in the Amazon Simple Storage Service (S3). + In particular, issue a <codeph>REFRESH</codeph> for a table after adding or removing files + in the associated S3 data directory. + See <xref href="../topics/impala_s3.xml#s3"/> for details about working with S3 tables. + </p> + + <p id="y2k38" rev="2.2.0"> + In Impala 2.2.0 and higher, built-in functions that accept or return integers representing <codeph>TIMESTAMP</codeph> values + use the <codeph>BIGINT</codeph> type for parameters and return values, rather than <codeph>INT</codeph>. + This change lets the date and time functions avoid an overflow error that would otherwise occur + on January 19th, 2038 (known as the + <xref href="http://en.wikipedia.org/wiki/Year_2038_problem" scope="external" format="html"><q>Year 2038 problem</q> or <q>Y2K38 problem</q></xref>). + This change affects the <codeph>from_unixtime()</codeph> and <codeph>unix_timestamp()</codeph> functions. + You might need to change application code that interacts with these functions, change the types of + columns that store the return values, or add <codeph>CAST()</codeph> calls to SQL statements that + call these functions. + </p> + + <p id="timestamp_conversions"> + Impala automatically converts <codeph>STRING</codeph> literals of the correct format into + <codeph>TIMESTAMP</codeph> values. Timestamp values are accepted in the format + <codeph>"yyyy-MM-dd HH:mm:ss.SSSSSS"</codeph>, and can consist of just the date, or just the time, with or + without the fractional second portion. 
For example, you can specify <codeph>TIMESTAMP</codeph> values such as + <codeph>'1966-07-30'</codeph>, <codeph>'08:30:00'</codeph>, or <codeph>'1985-09-25 17:45:30.005'</codeph>. + <ph conref="../shared/impala_common.xml#common/cast_int_to_timestamp"/> + </p> + + + <p> + <ph id="cast_int_to_timestamp">Casting an integer or floating-point value <codeph>N</codeph> to + <codeph>TIMESTAMP</codeph> produces a value that is <codeph>N</codeph> seconds past the start of the epoch + date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone. + If the setting <codeph>-use_local_tz_for_unix_timestamp_conversions=true</codeph> is in effect, + the resulting <codeph>TIMESTAMP</codeph> represents a date and time in the local time zone.</ph> + </p> + + <p id="redaction_yes" rev="2.2.0"> + If these statements in your environment contain sensitive literal values such as credit card numbers or tax + identifiers, Impala can redact this sensitive information when displaying the statements in log files and + other administrative contexts. See + <xref audience="integrated" href="../topics/sg_redaction.xml#log_redact"/><xref audience="standalone" href="http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/sg_redaction.html" scope="external" format="html"/> + for details. + </p> + + <p id="incremental_partition_spec"> + The <codeph>PARTITION</codeph> clause is only allowed in combination with the <codeph>INCREMENTAL</codeph> + clause. It is optional for <codeph>COMPUTE INCREMENTAL STATS</codeph>, and required for <codeph>DROP + INCREMENTAL STATS</codeph>. Whenever you specify partitions through the <codeph>PARTITION + (<varname>partition_spec</varname>)</codeph> clause in a <codeph>COMPUTE INCREMENTAL STATS</codeph> or + <codeph>DROP INCREMENTAL STATS</codeph> statement, you must include all the partitioning columns in the + specification, and specify constant values for all the partition key columns. + </p> + + <p id="udf_persistence_restriction" rev="2.5.0 IMPALA-1748"> + In <keyword keyref="impala25_full"/> and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database. + Java UDFs are also persisted, if they were created with the new <codeph>CREATE FUNCTION</codeph> syntax for Java UDFs, + where the Java function argument and return types are omitted. + Java-based UDFs created with the old <codeph>CREATE FUNCTION</codeph> syntax do not persist across restarts + because they are held in the memory of the <cmdname>catalogd</cmdname> daemon. + Until you re-create such Java UDFs using the new <codeph>CREATE FUNCTION</codeph> syntax, + you must reload those Java-based UDFs by running the original <codeph>CREATE FUNCTION</codeph> statements again each time + you restart the <cmdname>catalogd</cmdname> daemon. + Prior to <keyword keyref="impala25_full"/> the requirement to reload functions after a restart applied to both C++ and Java functions. + </p> + + <p id="current_user_caveat" rev="CDH-36552"> + The Hive <codeph>current_user()</codeph> function cannot be + called from a Java UDF through Impala. + </p> + + <note id="add_partition_set_location"> + If you are creating a partition for the first time and specifying its location, for maximum efficiency, use + a single <codeph>ALTER TABLE</codeph> statement including both the <codeph>ADD PARTITION</codeph> and + <codeph>LOCATION</codeph> clauses, rather than separate statements with <codeph>ADD PARTITION</codeph> and + <codeph>SET LOCATION</codeph> clauses. 
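+      For example (the table, partition, and path names here are illustrative):
+<codeblock xml:space="preserve">-- Create the partition and point it at its data location in one statement.
+ALTER TABLE sales ADD PARTITION (year=2016, month=1)
+  LOCATION '/user/impala/data/sales/year=2016/month=1';
+
+-- Less efficient: separate ADD PARTITION and SET LOCATION statements.
+ALTER TABLE sales ADD PARTITION (year=2016, month=1);
+ALTER TABLE sales PARTITION (year=2016, month=1)
+  SET LOCATION '/user/impala/data/sales/year=2016/month=1';</codeblock>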
+ </note> + + <p id="insert_hidden_work_directory"> + The <codeph>INSERT</codeph> statement has always left behind a hidden work directory inside the data + directory of the table. Formerly, this hidden work directory was named + <filepath>.impala_insert_staging</filepath> . In Impala 2.0.1 and later, this directory name is changed to + <filepath>_impala_insert_staging</filepath> . (While HDFS tools are expected to treat names beginning + either with underscore and dot as hidden, in practice names beginning with an underscore are more widely + supported.) If you have any scripts, cleanup jobs, and so on that rely on the name of this work directory, + adjust them to use the new name. + </p> + + <p id="check_internal_external_table"> + To see whether a table is internal or external, and its associated HDFS location, issue the statement + <codeph>DESCRIBE FORMATTED <varname>table_name</varname></codeph>. The <codeph>Table Type</codeph> field + displays <codeph>MANAGED_TABLE</codeph> for internal tables and <codeph>EXTERNAL_TABLE</codeph> for + external tables. The <codeph>Location</codeph> field displays the path of the table directory as an HDFS + URI. + </p> + + <p id="switch_internal_external_table"> + You can switch a table from internal to external, or from external to internal, by using the <codeph>ALTER + TABLE</codeph> statement: +<codeblock xml:space="preserve"> +-- Switch a table from internal to external. +ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('EXTERNAL'='TRUE'); + +-- Switch a table from external to internal. +ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('EXTERNAL'='FALSE'); +</codeblock> + </p> + +<!-- The data to show sensible output from these queries is in the TPC-DS schema 'CUSTOMER' table. + If you want to show real output, add a LIMIT 5 or similar clause to each query to avoid + too-long output. --> + +<codeblock id="regexp_rlike_examples" xml:space="preserve">-- Find all customers whose first name starts with 'J', followed by 0 or more of any character. +select c_first_name, c_last_name from customer where c_first_name regexp '^J.*'; +select c_first_name, c_last_name from customer where c_first_name rlike '^J.*'; + +-- Find 'Macdonald', where the first 'a' is optional and the 'D' can be upper- or lowercase. +-- The ^...$ are required, to match the start and end of the value. +select c_first_name, c_last_name from customer where c_last_name regexp '^Ma?c[Dd]onald$'; +select c_first_name, c_last_name from customer where c_last_name rlike '^Ma?c[Dd]onald$'; + +-- Match multiple character sequences, either 'Mac' or 'Mc'. +select c_first_name, c_last_name from customer where c_last_name regexp '^(Mac|Mc)donald$'; +select c_first_name, c_last_name from customer where c_last_name rlike '^(Mac|Mc)donald$'; + +-- Find names starting with 'S', then one or more vowels, then 'r', then any other characters. +-- Matches 'Searcy', 'Sorenson', 'Sauer'. +select c_first_name, c_last_name from customer where c_last_name regexp '^S[aeiou]+r.*$'; +select c_first_name, c_last_name from customer where c_last_name rlike '^S[aeiou]+r.*$'; + +-- Find names that end with 2 or more vowels: letters from the set a,e,i,o,u. +select c_first_name, c_last_name from customer where c_last_name regexp '.*[aeiou]{2,}$'; +select c_first_name, c_last_name from customer where c_last_name rlike '.*[aeiou]{2,}$'; + +-- You can use letter ranges in the [] blocks, for example to find names starting with A, B, or C. 
+select c_first_name, c_last_name from customer where c_last_name regexp '^[A-C].*'; +select c_first_name, c_last_name from customer where c_last_name rlike '^[A-C].*'; + +-- If you are not sure about case, leading/trailing spaces, and so on, you can process the +-- column using string functions first. +select c_first_name, c_last_name from customer where lower(trim(c_last_name)) regexp '^de.*'; +select c_first_name, c_last_name from customer where lower(trim(c_last_name)) rlike '^de.*'; +</codeblock> + + <p id="case_insensitive_comparisons_tip" rev="2.5.0 IMPALA-1787"> + In <keyword keyref="impala25_full"/> and higher, you can simplify queries that + use many <codeph>UPPER()</codeph> and <codeph>LOWER()</codeph> calls + to do case-insensitive comparisons, by using the <codeph>ILIKE</codeph> + or <codeph>IREGEXP</codeph> operators instead. See + <xref href="../topics/impala_operators.xml#ilike"/> and + <xref href="../topics/impala_operators.xml#iregexp"/> for details. + </p> + + <p id="show_security"> + When authorization is enabled, the output of the <codeph>SHOW</codeph> statement is limited to those + objects for which you have some privilege. There might be other database, tables, and so on, but their + names are concealed. If you believe an object exists but you cannot see it in the <codeph>SHOW</codeph> + output, check with the system administrator if you need to be granted a new privilege for that object. See + <xref href="../topics/impala_authorization.xml#authorization"/> for how to set up authorization and add + privileges for specific kinds of objects. + </p> + + <p id="infinity_and_nan" rev="IMPALA-3267"> + Infinity and NaN can be specified in text data files as <codeph>inf</codeph> and <codeph>nan</codeph> + respectively, and Impala interprets them as these special values. They can also be produced by certain + arithmetic expressions; for example, <codeph>pow(-1, 0.5)</codeph> returns <codeph>Infinity</codeph> and + <codeph>1/0</codeph> returns <codeph>NaN</codeph>. Or you can cast the literal values, such as <codeph>CAST('nan' AS + DOUBLE)</codeph> or <codeph>CAST('inf' AS DOUBLE)</codeph>. + </p> + + <p rev="2.0.0" id="user_kerberized"> + In Impala 2.0 and later, <codeph>user()</codeph> returns the full Kerberos principal string, such as + <codeph>[email protected]</codeph>, in a Kerberized environment. + </p> + + <ul> + <li id="grant_revoke_single"> + Currently, each Impala <codeph>GRANT</codeph> or <codeph>REVOKE</codeph> statement can only grant or + revoke a single privilege to or from a single role. + </li> + </ul> + + <p id="blobs_are_strings"> + All data in <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> columns must be in a character encoding that + is compatible with UTF-8. If you have binary data from another database system (that is, a BLOB type), use + a <codeph>STRING</codeph> column to hold it. + </p> + +<!-- The codeblock is nested inside this paragraph, so the intro text + and the code get conref'ed as a unit. --> + + <p id="create_drop_view_examples"> + The following example creates a series of views and then drops them. These examples illustrate how views + are associated with a particular database, and both the view definitions and the view names for + <codeph>CREATE VIEW</codeph> and <codeph>DROP VIEW</codeph> can refer to a view in the current database or + a fully qualified view name. +<codeblock xml:space="preserve"> +-- Create and drop a view in the current database. 
+CREATE VIEW few_rows_from_t1 AS SELECT * FROM t1 LIMIT 10;
+DROP VIEW few_rows_from_t1;
+
+-- Create and drop a view referencing a table in a different database.
+CREATE VIEW table_from_other_db AS SELECT x FROM db1.foo WHERE x IS NOT NULL;
+DROP VIEW table_from_other_db;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Switch into the other database and drop the view.
+USE db2;
+DROP VIEW v1;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Drop a view in the other database.
+DROP VIEW db2.v1;
+</codeblock>
+ </p>
+
+ <p id="char_varchar_cast_from_string">
+ For <codeph>INSERT</codeph> operations into <codeph>CHAR</codeph> or <codeph>VARCHAR</codeph> columns, you
+ must cast all <codeph>STRING</codeph> literals or expressions returning <codeph>STRING</codeph> to a
+ <codeph>CHAR</codeph> or <codeph>VARCHAR</codeph> type with the appropriate length.
+ </p>
+
+ <p rev="2.0.0" id="subquery_no_limit">
+ Correlated subqueries used in <codeph>EXISTS</codeph> and <codeph>IN</codeph> operators cannot include a
+ <codeph>LIMIT</codeph> clause.
+ </p>
+
+ <p id="avro_no_timestamp">
+ Currently, Avro tables cannot contain <codeph>TIMESTAMP</codeph> columns. If you need to store date and
+ time values in Avro tables, as a workaround you can use a <codeph>STRING</codeph> representation of the
+ values, convert the values to <codeph>BIGINT</codeph> with the <codeph>UNIX_TIMESTAMP()</codeph> function,
+ or create separate numeric columns for individual date and time fields using the <codeph>EXTRACT()</codeph>
+ function.
+ </p>
+
+ <p id="zero_length_strings">
+ <b>Zero-length strings:</b> For purposes of clauses such as <codeph>DISTINCT</codeph> and <codeph>GROUP
+ BY</codeph>, Impala considers zero-length strings (<codeph>""</codeph>), <codeph>NULL</codeph>, and space
+ to all be different values.
+ </p>
+
+ <p rev="2.5.0 IMPALA-3054" id="spill_to_disk_vs_dynamic_partition_pruning">
+ When the spill-to-disk feature is activated for a join node within a query, Impala does not
+ produce any runtime filters for that join operation on that host. Other join nodes within
+ the query are not affected.
+ </p>
+
+<codeblock id="simple_dpp_example">
+create table yy (s string) partitioned by (year int) stored as parquet;
+insert into yy partition (year) values ('1999', 1999), ('2000', 2000),
+ ('2001', 2001), ('2010', 2010);
+compute stats yy;
+
+create table yy2 (s string) partitioned by (year int) stored as parquet;
+insert into yy2 partition (year) values ('1999', 1999), ('2000', 2000),
+ ('2001', 2001);
+compute stats yy2;
+
+-- The query reads an unknown number of partitions, whose key values are only
+-- known at run time. The 'runtime filters' lines show how the information about
+-- the partitions is calculated in query fragment 02, and then used in query
+-- fragment 00 to decide which partitions to skip.
+explain select s from yy2 where year in (select year from yy where year between 2000 and 2005); ++----------------------------------------------------------+ +| Explain String | ++----------------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=16.00MB VCores=2 | +| | +| 04:EXCHANGE [UNPARTITIONED] | +| | | +| 02:HASH JOIN [LEFT SEMI JOIN, BROADCAST] | +| | hash predicates: year = year | +| | <b>runtime filters: RF000 <- year</b> | +| | | +| |--03:EXCHANGE [BROADCAST] | +| | | | +| | 01:SCAN HDFS [dpp.yy] | +| | partitions=2/4 files=2 size=468B | +| | | +| 00:SCAN HDFS [dpp.yy2] | +| partitions=2/3 files=2 size=468B | +| <b>runtime filters: RF000 -> year</b> | ++----------------------------------------------------------+ +</codeblock> + <p id="order_by_scratch_dir"> + By default, intermediate files used during large sort, join, aggregation, or analytic function operations + are stored in the directory <filepath>/tmp/impala-scratch</filepath> . These files are removed when the + operation finishes. (Multiple concurrent queries can perform operations that use the <q>spill to disk</q> + technique, without any name conflicts for these temporary files.) You can specify a different location by + starting the <cmdname>impalad</cmdname> daemon with the + <codeph>--scratch_dirs="<varname>path_to_directory</varname>"</codeph> configuration option or the + equivalent configuration option in the Cloudera Manager user interface. You can specify a single directory, + or a comma-separated list of directories. The scratch directories must be on the local filesystem, not in + HDFS. You might specify different directory paths for different hosts, depending on the capacity and speed + of the available storage devices. In CDH 5.5 / Impala 2.3 or higher, Impala successfully starts (with a warning + written to the log) if it cannot create or read and write files in one of the scratch directories. + If there is less than 1 GB free on the filesystem where that directory resides, Impala still runs, but writes a + warning message to its log. If Impala encounters an error reading or writing files in a scratch directory during + a query, Impala logs the error and the query fails. + </p> + + <p id="order_by_view_restriction"> + An <codeph>ORDER BY</codeph> clause without an additional <codeph>LIMIT</codeph> clause is ignored in any + view definition. If you need to sort the entire result set from a view, use an <codeph>ORDER BY</codeph> + clause in the <codeph>SELECT</codeph> statement that queries the view. You can still make a simple <q>top + 10</q> report by combining the <codeph>ORDER BY</codeph> and <codeph>LIMIT</codeph> clauses in the same + view definition: +<codeblock xml:space="preserve">[localhost:21000] > create table unsorted (x bigint); +[localhost:21000] > insert into unsorted values (1), (9), (3), (7), (5), (8), (4), (6), (2); +[localhost:21000] > create view sorted_view as select x from unsorted order by x; +[localhost:21000] > select x from sorted_view; -- ORDER BY clause in view has no effect. ++---+ +| x | ++---+ +| 1 | +| 9 | +| 3 | +| 7 | +| 5 | +| 8 | +| 4 | +| 6 | +| 2 | ++---+ +[localhost:21000] > select x from sorted_view order by x; -- View query requires ORDER BY at outermost level. 
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
+| 6 |
+| 7 |
+| 8 |
+| 9 |
++---+
+[localhost:21000] > create view top_3_view as select x from unsorted order by x limit 3;
+[localhost:21000] > select x from top_3_view; -- ORDER BY and LIMIT together in view definition are preserved.
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
++---+
+</codeblock>
+ </p>
+
+ <p id="precision_scale_example">
+ The following examples demonstrate how to check the precision and scale of numeric literals or other
+ numeric expressions. Impala represents numeric literals in the smallest appropriate type. 5 is a
+ <codeph>TINYINT</codeph> value, which ranges from -128 to 127; therefore, 3 decimal digits are needed to
+ represent the entire range, and because it is an integer value there are no fractional digits. 1.333 is
+ interpreted as a <codeph>DECIMAL</codeph> value, with 4 digits total and 3 digits after the decimal point.
+<codeblock xml:space="preserve">[localhost:21000] > select precision(5), scale(5);
++--------------+----------+
+| precision(5) | scale(5) |
++--------------+----------+
+| 3            | 0        |
++--------------+----------+
+[localhost:21000] > select precision(1.333), scale(1.333);
++------------------+--------------+
+| precision(1.333) | scale(1.333) |
++------------------+--------------+
+| 4                | 3            |
++------------------+--------------+
+[localhost:21000] > with t1 as
+ ( select cast(12.34 as decimal(20,2)) x union select cast(1 as decimal(8,6)) x )
+ select precision(x), scale(x) from t1 limit 1;
++--------------+----------+
+| precision(x) | scale(x) |
++--------------+----------+
+| 24           | 6        |
++--------------+----------+
+</codeblock>
+ </p>
+
+<!-- These 'type_' entries are for query options, where the type doesn't match up exactly with an Impala data type. -->
+
+ <p id="type_boolean">
+ <b>Type:</b> Boolean; recognized values are 1 and 0, or <codeph>true</codeph> and <codeph>false</codeph>;
+ any other value is interpreted as <codeph>false</codeph>
+ </p>
+
+ <p id="type_string">
+ <b>Type:</b> string
+ </p>
+
+ <p id="type_integer">
+ <b>Type:</b> integer
+ </p>
+
+ <p id="default_false">
+ <b>Default:</b> <codeph>false</codeph>
+ </p>
+
+ <p id="default_false_0">
+ <b>Default:</b> <codeph>false</codeph> (shown as 0 in output of <codeph>SET</codeph> statement)
+ </p>
+
+ <p id="default_true_1">
+ <b>Default:</b> <codeph>true</codeph> (shown as 1 in output of <codeph>SET</codeph> statement)
+ </p>
+
+ <p id="odd_return_type_string">
+ Currently, the return value is always a <codeph>STRING</codeph>. The return type is subject to change in
+ future releases. Always use <codeph>CAST()</codeph> to convert the result to whichever data type is
+ appropriate for your computations.
+ </p>
+
+ <p rev="2.0.0" id="former_odd_return_type_string">
+ <b>Return type:</b> <codeph>DOUBLE</codeph> in Impala 2.0 and higher; <codeph>STRING</codeph> in earlier
+ releases
+ </p>
+
+ <p id="for_compatibility_only">
+ <b>Usage notes:</b> Primarily for compatibility with code containing industry extensions to SQL.
+ </p> + + <p id="return_type_boolean"> + <b>Return type:</b> <codeph>BOOLEAN</codeph> + </p> + + <p id="return_type_double"> + <b>Return type:</b> <codeph>DOUBLE</codeph> + </p> + + <p id="return_type_same"> + <b>Return type:</b> Same as the input value + </p> + + <p id="return_type_same_except_string"> + <b>Return type:</b> Same as the input value, except for <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> + arguments which produce a <codeph>STRING</codeph> result + </p> + + <p id="builtins_db"> + Impala includes another predefined database, <codeph>_impala_builtins</codeph>, that serves as the location + for the <xref href="../topics/impala_functions.xml#builtins">built-in functions</xref>. To see the built-in + functions, use a statement like the following: +<codeblock xml:space="preserve">show functions in _impala_builtins; +show functions in _impala_builtins like '*<varname>substring</varname>*'; +</codeblock> + </p> + + <p id="sum_double"> + Due to the way arithmetic on <codeph>FLOAT</codeph> and <codeph>DOUBLE</codeph> columns uses + high-performance hardware instructions, and distributed queries can perform these operations in different + order for each query, results can vary slightly for aggregate function calls such as <codeph>SUM()</codeph> + and <codeph>AVG()</codeph> for <codeph>FLOAT</codeph> and <codeph>DOUBLE</codeph> columns, particularly on + large data sets where millions or billions of values are summed or averaged. For perfect consistency and + repeatability, use the <codeph>DECIMAL</codeph> data type for such operations instead of + <codeph>FLOAT</codeph> or <codeph>DOUBLE</codeph>. + </p> + + <p id="float_double_decimal_caveat"> + The inability to exactly represent certain floating-point values means that + <codeph>DECIMAL</codeph> is sometimes a better choice than <codeph>DOUBLE</codeph> + or <codeph>FLOAT</codeph> when precision is critical, particularly when + transferring data from other database systems that use different representations + or file formats. + </p> + + <p rev="1.4.0" id="decimal_no_stats"> + Currently, the <codeph>COMPUTE STATS</codeph> statement under CDH 4 does not store any statistics for + <codeph>DECIMAL</codeph> columns. When Impala runs under CDH 5, which has better support for + <codeph>DECIMAL</codeph> in the metastore database, <codeph>COMPUTE STATS</codeph> does collect statistics + for <codeph>DECIMAL</codeph> columns and Impala uses the statistics to optimize query performance. + </p> + + <p rev="CDH-35866" id="hive_column_stats_caveat"> + If you run the Hive statement <codeph>ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</codeph>, + Impala can only use the resulting column statistics if the table is unpartitioned. + Impala cannot use Hive-generated column statistics for a partitioned table. + </p> + + <p id="datetime_function_chaining"> + <codeph>unix_timestamp()</codeph> and <codeph>from_unixtime()</codeph> are often used in combination to + convert a <codeph>TIMESTAMP</codeph> value into a particular string format. For example: +<codeblock xml:space="preserve">select from_unixtime(unix_timestamp(now() + interval 3 days), + 'yyyy/MM/dd HH:mm') as yyyy_mm_dd_hh_mm; ++------------------+ +| yyyy_mm_dd_hh_mm | ++------------------+ +| 2016/06/03 11:38 | ++------------------+ +</codeblock> + </p> + + <p rev="1.4.0 obwl" id="insert_sort_blurb"> + <b>Sorting considerations:</b> Although you can specify an <codeph>ORDER BY</codeph> clause in an + <codeph>INSERT ... 
SELECT</codeph> statement, any <codeph>ORDER BY</codeph> clause is ignored and the + results are not necessarily sorted. An <codeph>INSERT ... SELECT</codeph> operation potentially creates + many different data files, prepared on different data nodes, and therefore the notion of the data being + stored in sorted order is impractical. + </p> + + <p rev="1.4.0" id="create_table_like_view"> + Prior to Impala 1.4.0, it was not possible to use the <codeph>CREATE TABLE LIKE + <varname>view_name</varname></codeph> syntax. In Impala 1.4.0 and higher, you can create a table with the + same column definitions as a view using the <codeph>CREATE TABLE LIKE</codeph> technique. Although + <codeph>CREATE TABLE LIKE</codeph> normally inherits the file format of the original table, a view has no + underlying file format, so <codeph>CREATE TABLE LIKE <varname>view_name</varname></codeph> produces a text + table by default. To specify a different file format, include a <codeph>STORED AS + <varname>file_format</varname></codeph> clause at the end of the <codeph>CREATE TABLE LIKE</codeph> + statement. + </p> + + <note rev="1.4.0" id="compute_stats_nulls"> Prior to Impala 1.4.0, + <codeph>COMPUTE STATS</codeph> counted the number of + <codeph>NULL</codeph> values in each column and recorded that figure + in the metastore database. Because Impala does not currently use the + <codeph>NULL</codeph> count during query planning, Impala 1.4.0 and + higher speeds up the <codeph>COMPUTE STATS</codeph> statement by + skipping this <codeph>NULL</codeph> counting. </note> + + <p id="regular_expression_whole_string"> + The regular expression must match the entire value, not just occur somewhere inside it. Use <codeph>.*</codeph> at the beginning, + the end, or both if you only need to match characters anywhere in the middle. Thus, the <codeph>^</codeph> and <codeph>$</codeph> + atoms are often redundant, although you might already have them in your expression strings that you reuse from elsewhere. + </p> + + <p rev="1.3.1" id="regexp_matching"> + In Impala 1.3.1 and higher, the <codeph>REGEXP</codeph> and <codeph>RLIKE</codeph> operators now match a + regular expression string that occurs anywhere inside the target string, the same as if the regular + expression was enclosed on each side by <codeph>.*</codeph>. See + <xref href="../topics/impala_operators.xml#regexp"/> for examples. Previously, these operators only + succeeded when the regular expression matched the entire target string. This change improves compatibility + with the regular expression support for popular database systems. There is no change to the behavior of the + <codeph>regexp_extract()</codeph> and <codeph>regexp_replace()</codeph> built-in functions. + </p> + + <p rev="1.3.1" id="insert_inherit_permissions"> + By default, if an <codeph>INSERT</codeph> statement creates any new subdirectories underneath a partitioned + table, those subdirectories are assigned default HDFS permissions for the <codeph>impala</codeph> user. To + make each subdirectory have the same permissions as its parent directory in HDFS, specify the + <codeph>--insert_inherit_permissions</codeph> startup option for the <cmdname>impalad</cmdname> daemon. + </p> + + <note id="multiple_count_distinct"> + <p> + By default, Impala only allows a single <codeph>COUNT(DISTINCT <varname>columns</varname>)</codeph> + expression in each query. 
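+ For example, a query such as the following, which uses two different
+ <codeph>COUNT(DISTINCT)</codeph> expressions, is not allowed with the default settings.
+ (The table and column names here are arbitrary placeholders, the same ones used in the
+ workaround example below.)
+<codeblock xml:space="preserve">-- Not allowed by default: more than one COUNT(DISTINCT) expression,
+-- over different columns, in the same query.
+select count(distinct col1), count(distinct col2) from t1;
+</codeblock>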
+ </p> + <p> + If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by + specifying <codeph>NDV(<varname>column</varname>)</codeph>; a query can contain multiple instances of + <codeph>NDV(<varname>column</varname>)</codeph>. To make Impala automatically rewrite + <codeph>COUNT(DISTINCT)</codeph> expressions to <codeph>NDV()</codeph>, enable the + <codeph>APPX_COUNT_DISTINCT</codeph> query option. + </p> + <p> + To produce the same result as multiple <codeph>COUNT(DISTINCT)</codeph> expressions, you can use the + following technique for queries involving a single table: + </p> +<codeblock xml:space="preserve">select v1.c1 result1, v2.c1 result2 from + (select count(distinct col1) as c1 from t1) v1 + cross join + (select count(distinct col2) as c1 from t1) v2; +</codeblock> + <p> + Because <codeph>CROSS JOIN</codeph> is an expensive operation, prefer to use the <codeph>NDV()</codeph> + technique wherever practical. + </p> + </note> + + <p> + <ph id="union_all_vs_union">Prefer <codeph>UNION ALL</codeph> over <codeph>UNION</codeph> when you know the + data sets are disjoint or duplicate values are not a problem; <codeph>UNION ALL</codeph> is more efficient + because it avoids materializing and sorting the entire result set to eliminate duplicate values.</ph> + </p> + + <note id="thorn"> + The <codeph>CREATE TABLE</codeph> clauses <codeph>FIELDS TERMINATED BY</codeph>, <codeph>ESCAPED + BY</codeph>, and <codeph>LINES TERMINATED BY</codeph> have special rules for the string literal used for + their argument, because they all require a single character. You can use a regular character surrounded by + single or double quotation marks, an octal sequence such as <codeph>'\054'</codeph> (representing a comma), + or an integer in the range '-127'..'128' (with quotation marks but no backslash), which is interpreted as a + single-byte ASCII character. Negative values are subtracted from 256; for example, <codeph>FIELDS + TERMINATED BY '-2'</codeph> sets the field delimiter to ASCII code 254, the <q>Icelandic Thorn</q> + character used as a delimiter by some data formats. + </note> + + <p id="sqoop_blurb"> + <b>Sqoop considerations:</b> + </p> + + <p id="sqoop_timestamp_caveat" rev="IMPALA-2111 CDH-37399"> If you use Sqoop to + convert RDBMS data to Parquet, be careful with interpreting any + resulting values from <codeph>DATE</codeph>, <codeph>DATETIME</codeph>, + or <codeph>TIMESTAMP</codeph> columns. The underlying values are + represented as the Parquet <codeph>INT64</codeph> type, which is + represented as <codeph>BIGINT</codeph> in the Impala table. The Parquet + values represent the time in milliseconds, while Impala interprets + <codeph>BIGINT</codeph> as the time in seconds. Therefore, if you have + a <codeph>BIGINT</codeph> column in a Parquet table that was imported + this way from Sqoop, divide the values by 1000 when interpreting as the + <codeph>TIMESTAMP</codeph> type.</p> + + <p id="command_line_blurb"> + <b>Command-line equivalent:</b> + </p> + + <p rev="2.3.0" id="complex_types_blurb"> + <b>Complex type considerations:</b> + </p> + + <p id="complex_types_combo"> + Because complex types are often used in combination, + for example an <codeph>ARRAY</codeph> of <codeph>STRUCT</codeph> + elements, if you are unfamiliar with the Impala complex types, + start with <xref href="../topics/impala_complex_types.xml#complex_types"/> for + background information and usage examples. 
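+ As a brief sketch (the table and column names here are hypothetical, for illustration only),
+ an <codeph>ARRAY</codeph> of <codeph>STRUCT</codeph> elements might be declared and queried
+ like this:
+<codeblock xml:space="preserve">-- Hypothetical table; complex types currently require the Parquet file format.
+CREATE TABLE contacts
+(
+  id BIGINT,
+  phones ARRAY&lt;STRUCT&lt;category: STRING, number: STRING&gt;&gt;
+)
+STORED AS PARQUET;
+
+-- Reference the array column like a joined table; the fields of each
+-- STRUCT element are available through the ITEM pseudocolumn.
+SELECT c.id, p.item.category, p.item.number
+  FROM contacts c, c.phones p;
+</codeblock>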
+ </p> + + <ul id="complex_types_restrictions"> + <li> + <p> + Columns with this da
<TRUNCATED>
