http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_proxy.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_proxy.xml b/docs/topics/impala_proxy.xml new file mode 100644 index 0000000..fc2e27c --- /dev/null +++ b/docs/topics/impala_proxy.xml @@ -0,0 +1,635 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="proxy"> + + <title>Using Impala through a Proxy for High Availability</title> + <titlealts audience="PDF"><navtitle>Load-Balancing Proxy for HA</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="High Availability"/> + <data name="Category" value="Impala"/> + <data name="Category" value="Network"/> + <data name="Category" value="Proxy"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + For most clusters that have multiple users and production availability requirements, you might set up a proxy + server to relay requests to and from Impala. + </p> + + <p> + Currently, the Impala statestore mechanism does not include such proxying and load-balancing features. Set up + a software package of your choice to perform these functions. + </p> + + <note> + <p conref="../shared/impala_common.xml#common/statestored_catalogd_ha_blurb"/> + </note> + + <p outputclass="toc inpage"/> + + </conbody> + + <concept id="proxy_overview"> + + <title>Overview of Proxy Usage and Load Balancing for Impala</title> + <prolog> + <metadata> + <data name="Category" value="Concepts"/> + </metadata> + </prolog> + + <conbody> + + <p> + Using a load-balancing proxy server for Impala has the following advantages: + </p> + + <ul> + <li> + Applications connect to a single well-known host and port, rather than keeping track of the hosts where + the <cmdname>impalad</cmdname> daemon is running. + </li> + + <li> + If any host running the <cmdname>impalad</cmdname> daemon becomes unavailable, application connection + requests still succeed because you always connect to the proxy server rather than a specific host running + the <cmdname>impalad</cmdname> daemon. + </li> + + <li> + The coordinator node for each Impala query potentially requires more memory and CPU cycles than the other + nodes that process the query. The proxy server can issue queries using round-robin scheduling, so that + each connection uses a different coordinator node. This load-balancing technique lets the Impala nodes + share this additional work, rather than concentrating it on a single machine. + </li> + </ul> + + <p> + The following setup steps are a general outline that apply to any load-balancing proxy software: + </p> + + <ol> + <li> + Download the load-balancing proxy software. It should only need to be installed and configured on a + single host. Pick a host other than the DataNodes where <cmdname>impalad</cmdname> is running, + because the intention is to protect against the possibility of one or more of these DataNodes becoming unavailable. + </li> + + <li> + Configure the load balancer (typically by editing a configuration file). + In particular: + <ul> + <li> + <p> + Set up a port that the load balancer will listen on to relay Impala requests back and forth. + </p> + </li> + <li> + <p rev="DOCS-690"> + Consider enabling <q>sticky sessions</q>. 
<ph rev="upstream">Cloudera</ph> recommends enabling this setting + so that stateless client applications such as <cmdname>impalad</cmdname> and Hue + are not disconnected from long-running queries. Evaluate whether this setting is + appropriate for your combination of workload and client applications. + </p> + </li> + <li> + <p> + For Kerberized clusters, follow the instructions in <xref href="impala_proxy.xml#proxy_kerberos"/>. + </p> + </li> + </ul> + </li> + + <li> + Specify the host and port settings for each Impala node. These are the hosts that the load balancer will + choose from when relaying each Impala query. See <xref href="impala_ports.xml#ports"/> for when to use + port 21000, 21050, or another value depending on what type of connections you are load balancing. + <note rev="CDH-30399"> + <p rev="CDH-30399"> + In particular, if you are using Hue or JDBC-based applications, + you typically set up load balancing for both ports 21000 and 21050, because + these client applications connect through port 21050 while the <cmdname>impala-shell</cmdname> + command connects through port 21000. + </p> + </note> + </li> + + <li> + Run the load-balancing proxy server, pointing it at the configuration file that you set up. + </li> + + <li> + On systems managed by Cloudera Manager, on the page + <menucascade><uicontrol>Impala</uicontrol><uicontrol>Configuration</uicontrol><uicontrol>Impala Daemon + Default Group</uicontrol></menucascade>, specify a value for the <uicontrol>Impala Daemons Load + Balancer</uicontrol> field. Specify the address of the load balancer in + <codeph><varname>host</varname>:<varname>port</varname></codeph> format. This setting lets Cloudera + Manager route all appropriate Impala-related operations through the proxy server. + </li> + + <li> + For any scripts, jobs, or configuration settings for applications that formerly connected to a specific + datanode to run Impala SQL statements, change the connection information (such as the <codeph>-i</codeph> + option in <cmdname>impala-shell</cmdname>) to point to the load balancer instead. + </li> + </ol> + + <note> + The following sections use the HAProxy software as a representative example of a load balancer + that you can use with Impala. + For information specifically about using Impala with the F5 BIG-IP load balancer, see + <xref href="http://www.cloudera.com/documentation/other/reference-architecture/PDF/Impala-HA-with-F5-BIG-IP.pdf" scope="external" format="html">Impala HA with F5 BIG-IP</xref>. + </note> + + </conbody> + + </concept> + + <concept id="proxy_balancing" rev="CDH-33836 DOCS-349 CDH-39925 CDH-36812" audience="Cloudera"> + <title>Choosing the Load-Balancing Algorithm</title> + <conbody> + <p> + Load-balancing software offers a number of algorithms to distribute requests. + Each algorithm has its own characteristics that make it suitable in some situations + but not others. + </p> + + <dl> + <dlentry> + <dt>leastconn</dt> + <dd> + Connects sessions to the coordinator with the fewest connections, to balance the load evenly. + Typically used for workloads consisting of many independent, short-running queries. + In configurations with only a few client machines, this setting can avoid having all + requests go to only a small set of coordinators. + </dd> + </dlentry> + <dlentry> + <dt>source affinity</dt> + <dd> + Sessions from the same IP address always go to the same coordinator. 
+    <note>
+      The following sections use the HAProxy software as a representative example of a load balancer
+      that you can use with Impala.
+      For information specifically about using Impala with the F5 BIG-IP load balancer, see
+      <xref href="http://www.cloudera.com/documentation/other/reference-architecture/PDF/Impala-HA-with-F5-BIG-IP.pdf" scope="external" format="html">Impala HA with F5 BIG-IP</xref>.
+    </note>
+
+  </conbody>
+
+  </concept>
+
+  <concept id="proxy_balancing" rev="CDH-33836 DOCS-349 CDH-39925 CDH-36812" audience="Cloudera">
+    <title>Choosing the Load-Balancing Algorithm</title>
+    <conbody>
+      <p>
+        Load-balancing software offers a number of algorithms to distribute requests.
+        Each algorithm has its own characteristics that make it suitable in some situations
+        but not others. A brief configuration sketch follows the list.
+      </p>
+
+      <dl>
+        <dlentry>
+          <dt>leastconn</dt>
+          <dd>
+            Connects sessions to the coordinator with the fewest connections, to balance the load evenly.
+            Typically used for workloads consisting of many independent, short-running queries.
+            In configurations with only a few client machines, this setting can avoid having all
+            requests go to only a small set of coordinators.
+          </dd>
+        </dlentry>
+        <dlentry>
+          <dt>source affinity</dt>
+          <dd>
+            Sessions from the same IP address always go to the same coordinator.
+            A good choice for Impala workloads containing a mix of queries and
+            DDL statements, such as <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph>.
+            Because the metadata changes from a DDL statement take time to propagate across the cluster,
+            prefer to use source affinity in this case. If necessary, run the DDL and subsequent
+            queries that depend on the results of the DDL through the same session, for example
+            by running <codeph>impala-shell -f <varname>script_file</varname></codeph> to submit
+            several statements through a single session.
+            An alternative is to set the query option <codeph>SYNC_DDL=1</codeph>
+            to hold back subsequent queries until the results of a DDL operation have propagated
+            throughout the cluster, but that is a relatively expensive setting.
+            Recommended for use with Hue.
+          </dd>
+        </dlentry>
+        <dlentry>
+          <dt>sticky</dt>
+          <dd>
+            Similar to source affinity. Sessions from the same IP address always go to the same coordinator.
+            However, the maintenance overhead for the <q>stick tables</q> can cause long-running Hue sessions
+            to disconnect; source affinity is often a better choice.
+          </dd>
+        </dlentry>
+        <dlentry>
+          <dt>round-robin</dt>
+          <dd>
+            Distributes connections to all coordinator nodes.
+            Typically not recommended for Impala.
+          </dd>
+        </dlentry>
+      </dl>
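+      <p>
+        As a sketch of how the choice of algorithm appears in an HAProxy configuration, the following
+        fragment (with hypothetical host names; the complete sample file appears in
+        <xref href="impala_proxy.xml#tut_proxy"/>) applies <codeph>leastconn</codeph> to
+        <cmdname>impala-shell</cmdname> traffic and source affinity to Hue and JDBC traffic:
+      </p>
+
+<codeblock>listen impala :25003
+    mode tcp
+    balance leastconn
+    server coordinator1 impala-host-1.example.com:21000
+    server coordinator2 impala-host-2.example.com:21000
+
+listen impalajdbc :21051
+    mode tcp
+    balance source
+    server coordinator1j impala-host-1.example.com:21050
+    server coordinator2j impala-host-2.example.com:21050</codeblock>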
+      <p>
+        You might need to perform benchmarks and load testing to determine which setting is optimal for your
+        use case. If some client applications have special characteristics, such as long-running Hue queries
+        working best with source affinity, you might configure multiple virtual IP addresses with a
+        different load-balancing algorithm for each.
+      </p>
+
+    </conbody>
+  </concept>
+
+  <concept id="proxy_kerberos">
+
+    <title>Special Proxy Considerations for Clusters Using Kerberos</title>
+    <prolog>
+      <metadata>
+        <data name="Category" value="Security"/>
+        <data name="Category" value="Kerberos"/>
+        <data name="Category" value="Authentication"/>
+        <data name="Category" value="Proxy"/>
+      </metadata>
+    </prolog>
+
+    <conbody>
+
+      <p>
+        In a cluster using Kerberos, applications check host credentials to verify that the host they are
+        connecting to is the same one that is actually processing the request, to prevent man-in-the-middle
+        attacks. To confirm that the load-balancing proxy server is legitimate, perform these extra Kerberos
+        setup steps:
+      </p>
+
+      <ol>
+        <li>
+          This section assumes you are starting with a Kerberos-enabled cluster. See
+          <xref href="impala_kerberos.xml#kerberos"/> for instructions for setting up Impala with Kerberos. See the
+          <cite>CDH Security Guide</cite> for
+          <xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_sg_kerberos_prin_keytab_deploy.html" scope="external" format="html">general steps to set up Kerberos</xref>.
+        </li>
+
+        <li>
+          Choose the host you will use for the proxy server. Based on the Kerberos setup procedure, it should
+          already have an entry <codeph>impala/<varname>proxy_host</varname>@<varname>realm</varname></codeph> in
+          its keytab. If not, go back over the initial Kerberos configuration steps for the keytab on each host
+          running the <cmdname>impalad</cmdname> daemon.
+        </li>
+
+        <li rev="CDH-40363">
+          For a cluster managed by Cloudera Manager (5.4.2 or higher), fill in the Impala configuration setting
+          <uicontrol>Impala Daemons Load Balancer</uicontrol> with the appropriate host:port combination.
+          Then restart the Impala service.
+          For systems using a recent level of Cloudera Manager, this is all the configuration you need; you can
+          skip the remaining steps in this procedure.
+        </li>
+
+        <li>
+          On systems not managed by Cloudera Manager, or systems using Cloudera Manager earlier than 5.4.2:
+
+          <ol>
+            <li>
+              Copy the keytab file from the proxy host to all other hosts in the cluster that run the
+              <cmdname>impalad</cmdname> daemon. (For optimal performance, <cmdname>impalad</cmdname> should be
+              running on all DataNodes in the cluster.) Put the keytab file in a secure location on each of these
+              other hosts.
+            </li>
+
+            <li>
+              Add an entry <codeph>impala/<varname>actual_hostname</varname>@<varname>realm</varname></codeph> to the keytab on each
+              host running the <cmdname>impalad</cmdname> daemon.
+            </li>
+
+            <li>
+              For each impalad node, merge the existing keytab with the proxy's keytab using
+              <cmdname>ktutil</cmdname>, producing a new keytab file. For example:
+<codeblock>$ ktutil
+ktutil: read_kt proxy.keytab
+ktutil: read_kt impala.keytab
+ktutil: write_kt proxy_impala.keytab
+ktutil: quit</codeblock>
+              <note>
+                On systems managed by Cloudera Manager 5.1.0 and later, the keytab merging happens automatically. To
+                verify that Cloudera Manager has merged the keytabs, run the command:
+<codeblock>klist -k <varname>keytabfile</varname></codeblock>
+                which lists the credentials for both <codeph>principal</codeph> and <codeph>be_principal</codeph> on
+                all nodes.
+              </note>
+            </li>
+
+            <li>
+              Make sure that the <codeph>impala</codeph> user has permission to read this merged keytab file.
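+              For example (with an illustrative path; adjust it to wherever you placed the merged keytab):
+<codeblock>$ chown impala:impala /etc/impala/conf/proxy_impala.keytab
+$ chmod 400 /etc/impala/conf/proxy_impala.keytab</codeblock>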
+            </li>
+
+            <li>
+              Change some configuration settings for each host in the cluster that participates in the load balancing.
+              Follow the appropriate steps depending on whether you use Cloudera Manager or not:
+              <ul>
+                <li> In the <cmdname>impalad</cmdname> option definition, or the advanced
+                  configuration snippet, add: <codeblock>--principal=impala/<varname>proxy_host</varname>@<varname>realm</varname>
+--be_principal=impala/<varname>actual_host</varname>@<varname>realm</varname>
+--keytab_file=<varname>path_to_merged_keytab</varname></codeblock>
+                  <note>
+                    <p>On a cluster managed by Cloudera Manager 5.1 (or higher),
+                    when you set up Kerberos authentication using the wizard, you
+                    can choose to allow Cloudera Manager to deploy the
+                    <systemoutput>krb5.conf</systemoutput> on your cluster. In
+                    such a case, you do not need to explicitly modify safety valve
+                    parameters as directed above.</p>
+                    <p>Every host has a different <codeph>--be_principal</codeph>
+                    because the actual hostname is different on each host.</p>
+                    <p>Specify the fully qualified domain name (FQDN) for the proxy
+                    host, not the IP address. Use the exact FQDN as returned by a
+                    reverse DNS lookup for the associated IP address.</p>
+                  </note>
+                </li>
+
+                <li>
+                  On a cluster managed by Cloudera Manager, create a role group to set the configuration values from
+                  the preceding step on a per-host basis.
+                </li>
+
+                <li>
+                  On a cluster not managed by Cloudera Manager, see
+                  <xref href="impala_config_options.xml#config_options"/> for the procedure to modify the startup
+                  options.
+                </li>
+              </ul>
+            </li>
+
+            <li>
+              Restart Impala to make the changes take effect. Follow the appropriate steps depending on whether you use
+              Cloudera Manager or not:
+              <ul>
+                <li>
+                  On a cluster managed by Cloudera Manager, restart the Impala service.
+                </li>
+
+                <li>
+                  On a cluster not managed by Cloudera Manager, restart the <cmdname>impalad</cmdname> daemons on all
+                  hosts in the cluster, as well as the <cmdname>statestored</cmdname> and <cmdname>catalogd</cmdname>
+                  daemons.
+                </li>
+              </ul>
+            </li>
+          </ol>
+        </li>
+      </ol>
+
+<!--
+We basically want to merge the keytab from the proxy host to all the impalad host's keytab file. To merge two keytab files, we first need to ship the proxy keytab to all the impalad node, then merge keytab files using MIT Kerberos "ktutil" command line tool.
+
+<codeblock>$ ktutil
+ktutil: read_kt krb5.keytab
+ktutil: read_kt proxy-host.keytab
+ktutil: write_kt krb5.keytab
+ktutil: quit</codeblock>
+
+The setup of the -principal and -be_principal has to be set through safety valve.
+-->
+
+    </conbody>
+
+  </concept>
+
+  <concept id="tut_proxy">
+
+    <title>Example of Configuring HAProxy Load Balancer for Impala</title>
+    <prolog>
+      <metadata>
+        <data name="Category" value="Configuring"/>
+      </metadata>
+    </prolog>
+
+    <conbody>
+
+      <p>
+        If you are not already using a load-balancing proxy, you can experiment with
+        <xref href="http://haproxy.1wt.eu/" scope="external" format="html">HAProxy</xref>, a free, open source load
+        balancer. This example shows how you might install and configure that load balancer on a Red Hat Enterprise
+        Linux system.
+      </p>
+
+      <ul>
+        <li>
+          <p>
+            Install the load balancer: <codeph>yum install haproxy</codeph>
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Set up the configuration file: <filepath>/etc/haproxy/haproxy.cfg</filepath>. See the following section
+            for a sample configuration file.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Run the load balancer (on a single host, preferably one not running <cmdname>impalad</cmdname>):
+          </p>
+<codeblock>/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg</codeblock>
+        </li>
+
+        <li>
+          <p>
+            In <cmdname>impala-shell</cmdname>, JDBC applications, or ODBC applications, connect to the listener
+            port of the proxy host, rather than port 21000 or 21050 on a host actually running <cmdname>impalad</cmdname>.
+            The sample configuration file sets HAProxy to listen on port 25003; therefore, you would send all
+            requests to <codeph><varname>haproxy_host</varname>:25003</codeph>.
+          </p>
+        </li>
+      </ul>
+
+      <p>
+        This is the sample <filepath>haproxy.cfg</filepath> used in this example:
+      </p>
+
+<codeblock>global
+    # To have these messages end up in /var/log/haproxy.log you will
+    # need to:
+    #
+    # 1) configure syslog to accept network log events. This is done
+    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
+    #    /etc/sysconfig/syslog
+    #
+    # 2) configure local2 events to go to the /var/log/haproxy.log
+    #    file. A line like the following can be added to
+    #    /etc/sysconfig/syslog
+    #
+    #    local2.*                       /var/log/haproxy.log
+    #
+    log         127.0.0.1 local0
+    log         127.0.0.1 local1 notice
+    chroot      /var/lib/haproxy
+    pidfile     /var/run/haproxy.pid
+    maxconn     4000
+    user        haproxy
+    group       haproxy
+    daemon
+
+    # turn on stats unix socket
+    #stats socket /var/lib/haproxy/stats
+
+#---------------------------------------------------------------------
+# common defaults that all the 'listen' and 'backend' sections will
+# use if not designated in their block
+#
+# You might need to adjust timing values to prevent timeouts.
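+#
+# For example, if long-running queries are being disconnected, you might
+# raise 'clitimeout' and 'srvtimeout' below from 50000 (50 seconds; the
+# values are in milliseconds) to something like 3600000 (1 hour).
+# These numbers are only illustrative; test against your own workload.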
+#---------------------------------------------------------------------
+defaults
+    mode                    http
+    log                     global
+    option                  httplog
+    option                  dontlognull
+    option http-server-close
+    option forwardfor       except 127.0.0.0/8
+    option                  redispatch
+    retries                 3
+    maxconn                 3000
+    contimeout 5000
+    clitimeout 50000
+    srvtimeout 50000
+
+#
+# This sets up the admin page for HAProxy at port 25002.
+#
+listen stats :25002
+    balance
+    mode http
+    stats enable
+    stats auth <varname>username</varname>:<varname>password</varname>
+
+# This is the setup for Impala. Impala clients connect to load_balancer_host:25003.
+# HAProxy balances connections among the list of servers listed below.
+# The impalad servers listen on port 21000 for Beeswax (impala-shell) or the original ODBC driver.
+# For the JDBC or ODBC version 2.x driver, use port 21050 instead of 21000.
+listen impala :25003
+    mode tcp
+    option tcplog
+    balance leastconn
+
+    server <varname>symbolic_name_1</varname> impala-host-1.example.com:21000
+    server <varname>symbolic_name_2</varname> impala-host-2.example.com:21000
+    server <varname>symbolic_name_3</varname> impala-host-3.example.com:21000
+    server <varname>symbolic_name_4</varname> impala-host-4.example.com:21000
+
+# Setup for Hue or other JDBC-enabled applications.
+# In particular, Hue requires sticky sessions.
+# The application connects to load_balancer_host:21051, and HAProxy balances
+# connections to the associated hosts, where Impala listens for JDBC
+# requests on port 21050.
+listen impalajdbc :21051
+    mode tcp
+    option tcplog
+    balance source
+    server <varname>symbolic_name_5</varname> impala-host-1.example.com:21050
+    server <varname>symbolic_name_6</varname> impala-host-2.example.com:21050
+    server <varname>symbolic_name_7</varname> impala-host-3.example.com:21050
+    server <varname>symbolic_name_8</varname> impala-host-4.example.com:21050
+</codeblock>
+
+    <note conref="../shared/impala_common.xml#common/proxy_jdbc_caveat"/>
+
+    <p audience="Cloudera">
+      The following example shows extra steps needed for a cluster using Kerberos authentication:
+    </p>
+
+<codeblock audience="Cloudera">$ klist
+$ impala-shell -k
+$ kinit -r 1d -kt /systest/keytabs/hdfs.keytab hdfs
+$ impala-shell -i c2104.hal.cloudera.com:21000
+$ impala-shell -i c2104.hal.cloudera.com:25003
+[root@c2104 alan]# ps -ef |grep impalad
+root      6442  6428  0 12:21 pts/0    00:00:00 grep impalad
+impala   30577 22192 99 Nov14 ?
3-16:42:32 /usr/lib/impala/sbin-debug/impalad --flagfile=/var/run/cloudera-scm-agent/process/10342-impala-IMPALAD/impala-conf/impalad_flags +[root@c2104 alan]# vi /var/run/cloudera-scm-agent/process/10342-impala-IMPALAD/impala-conf/impalad_flags +$ klist -k /var/run/cloudera-scm-agent/process/10342-impala-IMPALAD/impala.keytab +Keytab name: FILE:/var/run/cloudera-scm-agent/process/10342-impala-IMPALAD/impala.keytab +KVNO Principal +---- -------------------------------------------------------------------------- + 2 impala/[email protected] + 2 impala/[email protected] + 2 impala/[email protected] + 2 impala/[email protected] + 2 HTTP/[email protected] + 2 HTTP/[email protected] + 2 HTTP/[email protected] + 2 HTTP/[email protected] +$ klist +Ticket cache: FILE:/tmp/krb5cc_4028 +Default principal: [email protected] + +Valid starting Expires Service principal +11/15/13 12:17:17 11/15/13 12:32:17 krbtgt/[email protected] + renew until 11/16/13 12:17:17 +11/15/13 12:17:21 11/15/13 12:32:17 impala/[email protected] + renew until 11/16/13 12:17:17 +$ kinit -r 1d -kt /systest/keytabs/hdfs.keytab hdfs +$ kinit -R +$ impala-shell -k -i c2106.hal.cloudera.com:21000 +Starting Impala Shell using Kerberos authentication +Using service name 'impala' +Connected to c2106.hal.cloudera.com:21000 +$ impala-shell -i c2104.hal.cloudera.com:25003 +$ impala-shell -k -i c2104.hal.cloudera.com:25003 +Starting Impala Shell using Kerberos authentication +Using service name 'impala' +Connected to c2104.hal.cloudera.com:25003 +[c2104.hal.cloudera.com:25003] > create table alan_tmp(a int); +Query: create table alan_tmp(a int) +ERROR: InternalException: Got exception: org.apache.hadoop.ipc.RemoteException User: hive/[email protected] is not allowed to impersonate impala/[email protected] +$ kdestroy +$ kinit -r 1d -kt /systest/keytabs/hdfs.keytab hdfs +$ impala-shell -k -i c2104.hal.cloudera.com:25003 +# klist -k c2104.keytab +Keytab name: FILE:c2104.keytab +KVNO Principal +---- -------------------------------------------------------------------------- + 2 impala/[email protected] + 2 impala/[email protected] + 2 impala/[email protected] + 2 impala/[email protected] + 2 HTTP/[email protected] + 2 HTTP/[email protected] + 2 HTTP/[email protected] + 2 HTTP/[email protected] +$ klist -k -t c2106.keytab +Keytab name: FILE:c2106.keytab +KVNO Timestamp Principal +---- ----------------- -------------------------------------------------------- + 2 02/14/13 12:12:22 HTTP/[email protected] + 2 02/14/13 12:12:22 HTTP/[email protected] + 2 02/14/13 12:12:22 HTTP/[email protected] + 2 02/14/13 12:12:22 HTTP/[email protected] + 2 02/14/13 12:12:22 impala/[email protected] + 2 02/14/13 12:12:22 impala/[email protected] + 2 02/14/13 12:12:22 impala/[email protected] + 2 02/14/13 12:12:22 impala/[email protected] +$ ktutil +ktutil: rkt c2104.keytab +ktutil: rkt c2106.keytab +ktutil: wkt my_test.keytab +ktutil: q +$ klist -k -t my_test.keytab +Keytab name: FILE:my_test.keytab +KVNO Timestamp Principal +---- ----------------- -------------------------------------------------------- + 2 11/21/13 16:22:40 impala/[email protected] + 2 11/21/13 16:22:40 impala/[email protected] + 2 11/21/13 16:22:40 impala/[email protected] + 2 11/21/13 16:22:40 impala/[email protected] + 2 11/21/13 16:22:40 HTTP/[email protected] + 2 11/21/13 16:22:40 HTTP/[email protected] + 2 11/21/13 16:22:40 HTTP/[email protected] + 2 11/21/13 16:22:40 HTTP/[email protected] + 2 11/21/13 16:22:40 HTTP/[email protected] + 2 11/21/13 16:22:41 HTTP/[email protected] + 2 
11/21/13 16:22:41 HTTP/[email protected] + 2 11/21/13 16:22:41 HTTP/[email protected] + 2 11/21/13 16:22:41 impala/[email protected] + 2 11/21/13 16:22:41 impala/[email protected] + 2 11/21/13 16:22:41 impala/[email protected] + 2 11/21/13 16:22:41 impala/[email protected] +$ kdestroy +$ kinit -r 1d -kt /systest/keytabs/hdfs.keytab hdfs +$ vi README +$ kinit -R +$ impala-shell -k -i c2104.hal.cloudera.com:25003 +Starting Impala Shell using Kerberos authentication +Using service name 'impala' +Connected to c2104.hal.cloudera.com:25003 +<ph conref="../shared/ImpalaVariables.xml#impala_vars/ImpaladBanner"/> +Welcome to the Impala shell. Press TAB twice to see a list of available commands. + +Copyright (c) 2012 Cloudera, Inc. All rights reserved. + +<ph conref="../shared/ImpalaVariables.xml#impala_vars/ShellBanner"/> +[c2104.hal.cloudera.com:25003] > show tables; +Query: show tables +ERROR: AnalysisException: This Impala daemon is not ready to accept user requests. Status: Waiting for catalog update from the StateStore. +[c2104.hal.cloudera.com:25003] > quit;</codeblock> + + <!-- + At that point in the walkthrough with Alan Choi, we could never get Impala to accept any requests through the catalog server. + So I have not seen a 100% successful proxy setup process to verify all the details. + --> + + </conbody> + + </concept> + +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_query_lifetime.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_query_lifetime.xml b/docs/topics/impala_query_lifetime.xml new file mode 100644 index 0000000..2f46d21 --- /dev/null +++ b/docs/topics/impala_query_lifetime.xml @@ -0,0 +1,31 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="query_lifetime"> + + <title>Impala Query Lifetime</title> + + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Concepts"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + Impala queries progress through a series of stages from the time they are initiated to the time + they are completed. A query can also be cancelled before it is entirely finished, either + because of an explicit cancellation, or because of a timeout, out-of-memory, or other error condition. + Understanding the query lifecycle can help you manage the throughput and resource usage of Impala + queries, especially in a high-concurrency or multi-workload environment. + </p> + + <p outputclass="toc"/> + </conbody> + + +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_query_options.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_query_options.xml b/docs/topics/impala_query_options.xml new file mode 100644 index 0000000..7ed6ac2 --- /dev/null +++ b/docs/topics/impala_query_options.xml @@ -0,0 +1,77 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="query_options"> + + <title>Query Options for the SET Statement</title> + <prolog> + <metadata> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Impala"/> + <data name="Category" value="impala-shell"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <p> + You can specify the following options using the <codeph>SET</codeph> statement, and those settings affect all + queries issued from that session. + </p> + + <p> + Some query options are useful in day-to-day operations for improving usability, performance, or flexibility. + </p> + + <p> + Other query options control special-purpose aspects of Impala operation and are intended primarily for + advanced debugging or troubleshooting. + </p> + + <p> + Options with Boolean parameters can be set to 1 or <codeph>true</codeph> to enable, or 0 or <codeph>false</codeph> + to turn off. + </p> + + <note rev="2.0.0"> + <p rev="2.0.0"> + In Impala 2.0 and later, you can set query options directly through the JDBC and ODBC interfaces by using the + <codeph>SET</codeph> statement. Formerly, <codeph>SET</codeph> was only available as a command within the + <cmdname>impala-shell</cmdname> interpreter. 
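+        For example, an application connected through JDBC or ODBC can issue statements such as the
+        following before running its queries (a sketch; the option values are only illustrative):
+<codeblock>-- Issued through the same JDBC/ODBC connection as the subsequent queries.
+SET MEM_LIMIT=2g;
+SET SYNC_DDL=1;</codeblock>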
+ </p> + </note> + +<!-- This is the list including defaults from the pre-release 1.2 impala-shell: + ABORT_ON_DEFAULT_LIMIT_EXCEEDED: 0 + ABORT_ON_ERROR: 0 + ALLOW_UNSUPPORTED_FORMATS: 0 + BATCH_SIZE: 0 + DEBUG_ACTION: + DEFAULT_ORDER_BY_LIMIT: -1 + DISABLE_CODEGEN: 0 + HBASE_CACHE_BLOCKS: 0 + HBASE_CACHING: 0 + MAX_ERRORS: 0 + MAX_IO_BUFFERS: 0 + MAX_SCAN_RANGE_LENGTH: 0 + MEM_LIMIT: 0 + NUM_NODES: 0 + NUM_SCANNER_THREADS: 0 + PARQUET_COMPRESSION_CODEC: SNAPPY + PARQUET_FILE_SIZE: 0 + SUPPORT_START_OVER: false +--> + + <p outputclass="toc"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_set.xml#set"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_query_timeout_s.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_query_timeout_s.xml b/docs/topics/impala_query_timeout_s.xml new file mode 100644 index 0000000..0486e01 --- /dev/null +++ b/docs/topics/impala_query_timeout_s.xml @@ -0,0 +1,56 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="2.0.0" id="query_timeout_s"> + + <title>QUERY_TIMEOUT_S Query Option (<keyword keyref="impala20"/> or higher only)</title> + <titlealts audience="PDF"><navtitle>QUERY_TIMEOUT_S</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p rev="2.0.0"> + <indexterm audience="Cloudera">QUERY_TIMEOUT_S query option</indexterm> + Sets the idle query timeout value for the session, in seconds. Queries that sit idle for longer than the + timeout value are automatically cancelled. If the system administrator specified the + <codeph>--idle_query_timeout</codeph> startup option, <codeph>QUERY_TIMEOUT_S</codeph> must be smaller than + or equal to the <codeph>--idle_query_timeout</codeph> value. + </p> + + <note conref="../shared/impala_common.xml#common/timeout_clock_blurb"/> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>SET QUERY_TIMEOUT_S=<varname>seconds</varname>;</codeblock> + +<!-- Don't have a compelling example to show at this time because the 'idle' aspect only applies + when the client is careless and leaves the query open. Can't easily demonstrate in impala-shell. 
+ + <p conref="../shared/impala_common.xml#common/example_blurb"/> +--> + + <p> + <b>Type:</b> numeric + </p> + + <p> + <b>Default:</b> 0 (no timeout if <codeph>--idle_query_timeout</codeph> not in effect; otherwise, use + <codeph>--idle_query_timeout</codeph> value) + </p> + + <p conref="../shared/impala_common.xml#common/added_in_20"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_timeouts.xml#timeouts"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_rcfile.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_rcfile.xml b/docs/topics/impala_rcfile.xml new file mode 100644 index 0000000..1bfab8c --- /dev/null +++ b/docs/topics/impala_rcfile.xml @@ -0,0 +1,244 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="rcfile"> + + <title>Using the RCFile File Format with Impala Tables</title> + <titlealts audience="PDF"><navtitle>RCFile Data Files</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <!-- <data name="Category" value="RCFile"/> --> + <data name="Category" value="File Formats"/> + <data name="Category" value="Tables"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">RCFile support in Impala</indexterm> + Impala supports using RCFile data files. + </p> + + <table> + <title>RCFile Format Support in Impala</title> + <tgroup cols="5"> + <colspec colname="1" colwidth="10*"/> + <colspec colname="2" colwidth="10*"/> + <colspec colname="3" colwidth="20*"/> + <colspec colname="4" colwidth="30*"/> + <colspec colname="5" colwidth="30*"/> + <thead> + <row> + <entry> + File Type + </entry> + <entry> + Format + </entry> + <entry> + Compression Codecs + </entry> + <entry> + Impala Can CREATE? + </entry> + <entry> + Impala Can INSERT? + </entry> + </row> + </thead> + <tbody> + <row conref="impala_file_formats.xml#file_formats/rcfile_support"> + <entry/> + </row> + </tbody> + </tgroup> + </table> + + <p outputclass="toc inpage"/> + </conbody> + + <concept id="rcfile_create"> + + <title>Creating RCFile Tables and Loading Data</title> + <prolog> + <metadata> + <data name="Category" value="ETL"/> + </metadata> + </prolog> + + <conbody> + + <p> + If you do not have an existing data file to use, begin by creating one in the appropriate format. + </p> + + <p> + <b>To create an RCFile table:</b> + </p> + + <p> + In the <codeph>impala-shell</codeph> interpreter, issue a command similar to: + </p> + +<codeblock>create table rcfile_table (<varname>column_specs</varname>) stored as rcfile;</codeblock> + + <p> + Because Impala can query some kinds of tables that it cannot currently write to, after creating tables of + certain file formats, you might use the Hive shell to load the data. See + <xref href="impala_file_formats.xml#file_formats"/> for details. After loading data into a table through + Hive or other mechanism outside of Impala, issue a <codeph>REFRESH <varname>table_name</varname></codeph> + statement the next time you connect to the Impala node, before querying the table, to make Impala recognize + the new data. 
+ </p> + + <note type="important"> + See <xref href="impala_known_issues.xml#known_issues"/> for potential compatibility issues with + RCFile tables created in Hive 0.12, due to a change in the default RCFile SerDe for Hive. + </note> + + <p> + For example, here is how you might create some RCFile tables in Impala (by specifying the columns + explicitly, or cloning the structure of another table), load data through Hive, and query them through + Impala: + </p> + +<codeblock>$ impala-shell -i localhost +[localhost:21000] > create table rcfile_table (x int) stored as rcfile; +[localhost:21000] > create table rcfile_clone like some_other_table stored as rcfile; +[localhost:21000] > quit; + +$ hive +hive> insert into table rcfile_table select x from some_other_table; +3 Rows loaded to rcfile_table +Time taken: 19.015 seconds +hive> quit; + +$ impala-shell -i localhost +[localhost:21000] > select * from rcfile_table; +Returned 0 row(s) in 0.23s +[localhost:21000] > -- Make Impala recognize the data loaded through Hive; +[localhost:21000] > refresh rcfile_table; +[localhost:21000] > select * from rcfile_table; ++---+ +| x | ++---+ +| 1 | +| 2 | +| 3 | ++---+ +Returned 3 row(s) in 0.23s</codeblock> + + <p conref="../shared/impala_common.xml#common/complex_types_unsupported_filetype"/> + + </conbody> + </concept> + + <concept id="rcfile_compression"> + + <title>Enabling Compression for RCFile Tables</title> + <prolog> + <metadata> + <data name="Category" value="Snappy"/> + <data name="Category" value="Compression"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">compression</indexterm> + You may want to enable compression on existing tables. Enabling compression provides performance gains in + most cases and is supported for RCFile tables. For example, to enable Snappy compression, you would specify + the following additional settings when loading data through the Hive shell: + </p> + +<codeblock>hive> SET hive.exec.compress.output=true; +hive> SET mapred.max.split.size=256000000; +hive> SET mapred.output.compression.type=BLOCK; +hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; +hive> INSERT OVERWRITE TABLE <varname>new_table</varname> SELECT * FROM <varname>old_table</varname>;</codeblock> + + <p> + If you are converting partitioned tables, you must complete additional steps. In such a case, specify + additional settings similar to the following: + </p> + +<codeblock>hive> CREATE TABLE <varname>new_table</varname> (<varname>your_cols</varname>) PARTITIONED BY (<varname>partition_cols</varname>) STORED AS <varname>new_format</varname>; +hive> SET hive.exec.dynamic.partition.mode=nonstrict; +hive> SET hive.exec.dynamic.partition=true; +hive> INSERT OVERWRITE TABLE <varname>new_table</varname> PARTITION(<varname>comma_separated_partition_cols</varname>) SELECT * FROM <varname>old_table</varname>;</codeblock> + + <p> + Remember that Hive does not require that you specify a source format for it. Consider the case of + converting a table with two partition columns called <codeph>year</codeph> and <codeph>month</codeph> to a + Snappy compressed RCFile. 
Combining the components outlined previously to complete this table conversion, + you would specify settings similar to the following: + </p> + +<codeblock>hive> CREATE TABLE tbl_rc (int_col INT, string_col STRING) STORED AS RCFILE; +hive> SET hive.exec.compress.output=true; +hive> SET mapred.max.split.size=256000000; +hive> SET mapred.output.compression.type=BLOCK; +hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; +hive> SET hive.exec.dynamic.partition.mode=nonstrict; +hive> SET hive.exec.dynamic.partition=true; +hive> INSERT OVERWRITE TABLE tbl_rc SELECT * FROM tbl;</codeblock> + + <p> + To complete a similar process for a table that includes partitions, you would specify settings similar to + the following: + </p> + +<codeblock>hive> CREATE TABLE tbl_rc (int_col INT, string_col STRING) PARTITIONED BY (year INT) STORED AS RCFILE; +hive> SET hive.exec.compress.output=true; +hive> SET mapred.max.split.size=256000000; +hive> SET mapred.output.compression.type=BLOCK; +hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; +hive> SET hive.exec.dynamic.partition.mode=nonstrict; +hive> SET hive.exec.dynamic.partition=true; +hive> INSERT OVERWRITE TABLE tbl_rc PARTITION(year) SELECT * FROM tbl;</codeblock> + + <note> + <p> + The compression type is specified in the following command: + </p> +<codeblock>SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;</codeblock> + <p> + You could elect to specify alternative codecs such as <codeph>GzipCodec</codeph> here. + </p> + </note> + </conbody> + </concept> + + <concept id="rcfile_performance"> + + <title>Query Performance for Impala RCFile Tables</title> + + <conbody> + + <p> + In general, expect query performance with RCFile tables to be + faster than with tables using text data, but slower than with + Parquet tables. See <xref href="impala_parquet.xml#parquet"/> + for information about using the Parquet file format for + high-performance analytic queries. + </p> + + <p conref="../shared/impala_common.xml#common/s3_block_splitting"/> + + </conbody> + </concept> + + <concept audience="Cloudera" id="rcfile_data_types"> + + <title>Data Type Considerations for RCFile Tables</title> + + <conbody> + + <p></p> + </conbody> + </concept> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_real.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_real.xml b/docs/topics/impala_real.xml new file mode 100644 index 0000000..12ef5aa --- /dev/null +++ b/docs/topics/impala_real.xml @@ -0,0 +1,46 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="real"> + + <title>REAL Data Type</title> + <titlealts audience="PDF"><navtitle>REAL</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Schemas"/> + </metadata> + </prolog> + + <conbody> + + <p> + An alias for the <codeph>DOUBLE</codeph> data type. See <xref href="impala_double.xml#double"/> for details. 
+ </p> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + These examples show how you can use the type names <codeph>REAL</codeph> and <codeph>DOUBLE</codeph> + interchangeably, and behind the scenes Impala treats them always as <codeph>DOUBLE</codeph>. + </p> + +<codeblock>[localhost:21000] > create table r1 (x real); +[localhost:21000] > describe r1; ++------+--------+---------+ +| name | type | comment | ++------+--------+---------+ +| x | double | | ++------+--------+---------+ +[localhost:21000] > insert into r1 values (1.5), (cast (2.2 as double)); +[localhost:21000] > select cast (1e6 as real); ++---------------------------+ +| cast(1000000.0 as double) | ++---------------------------+ +| 1000000 | ++---------------------------+</codeblock> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_refresh.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_refresh.xml b/docs/topics/impala_refresh.xml new file mode 100644 index 0000000..4b038ff --- /dev/null +++ b/docs/topics/impala_refresh.xml @@ -0,0 +1,324 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="refresh"> + + <title>REFRESH Statement</title> + <titlealts audience="PDF"><navtitle>REFRESH</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="DDL"/> + <data name="Category" value="Tables"/> + <data name="Category" value="Hive"/> + <data name="Category" value="Metastore"/> + <data name="Category" value="ETL"/> + <data name="Category" value="Ingest"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">REFRESH statement</indexterm> + To accurately respond to queries, the Impala node that acts as the coordinator (the node to which you are + connected through <cmdname>impala-shell</cmdname>, JDBC, or ODBC) must have current metadata about those + databases and tables that are referenced in Impala queries. If you are not familiar with the way Impala uses + metadata and how it shares the same metastore database as Hive, see + <xref href="impala_hadoop.xml#intro_metastore"/> for background information. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock rev="IMPALA-1683 CDH-43732">REFRESH [<varname>db_name</varname>.]<varname>table_name</varname> [PARTITION (<varname>key_col1</varname>=<varname>val1</varname> [, <varname>key_col2</varname>=<varname>val2</varname>...])]</codeblock> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + Use the <codeph>REFRESH</codeph> statement to load the latest metastore metadata and block location data for + a particular table in these scenarios: + </p> + + <ul> + <li> + After loading new data files into the HDFS data directory for the table. (Once you have set up an ETL + pipeline to bring data into Impala on a regular basis, this is typically the most frequent reason why + metadata needs to be refreshed.) + </li> + + <li> + After issuing <codeph>ALTER TABLE</codeph>, <codeph>INSERT</codeph>, <codeph>LOAD DATA</codeph>, or other + table-modifying SQL statement in Hive. 
+ </li> + </ul> + + <note rev="2.3.0"> + <p rev="2.3.0"> + In <keyword keyref="impala23_full"/> and higher, the syntax <codeph>ALTER TABLE <varname>table_name</varname> RECOVER PARTITIONS</codeph> + is a faster alternative to <codeph>REFRESH</codeph> when the only change to the table data is the addition of + new partition directories through Hive or manual HDFS operations. + See <xref href="impala_alter_table.xml#alter_table"/> for details. + </p> + </note> + + <p> + You only need to issue the <codeph>REFRESH</codeph> statement on the node to which you connect to issue + queries. The coordinator node divides the work among all the Impala nodes in a cluster, and sends read + requests for the correct HDFS blocks without relying on the metadata on the other nodes. + </p> + + <p> + <codeph>REFRESH</codeph> reloads the metadata for the table from the metastore database, and does an + incremental reload of the low-level block location data to account for any new data files added to the HDFS + data directory for the table. It is a low-overhead, single-table operation, specifically tuned for the common + scenario where new data files are added to HDFS. + </p> + + <p> + Only the metadata for the specified table is flushed. The table must already exist and be known to Impala, + either because the <codeph>CREATE TABLE</codeph> statement was run in Impala rather than Hive, or because a + previous <codeph>INVALIDATE METADATA</codeph> statement caused Impala to reload its entire metadata catalog. + </p> + + <note> + <p rev="1.2"> + The catalog service broadcasts any changed metadata as a result of Impala + <codeph>ALTER TABLE</codeph>, <codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph> statements to all + Impala nodes. Thus, the <codeph>REFRESH</codeph> statement is only required if you load data through Hive + or by manipulating data files in HDFS directly. See <xref href="impala_components.xml#intro_catalogd"/> for + more information on the catalog service. + </p> + <p rev="1.2.1"> + Another way to avoid inconsistency across nodes is to enable the + <codeph>SYNC_DDL</codeph> query option before performing a DDL statement or an <codeph>INSERT</codeph> or + <codeph>LOAD DATA</codeph>. + </p> + <p rev="1.1"> + The table name is a required parameter. To flush the metadata for all tables, use the + <codeph><xref href="impala_invalidate_metadata.xml#invalidate_metadata">INVALIDATE METADATA</xref></codeph> + command. + </p> + <p conref="../shared/impala_common.xml#common/invalidate_then_refresh"/> + </note> + + <p conref="../shared/impala_common.xml#common/refresh_vs_invalidate"/> + + <p> + A metadata update for an <codeph>impalad</codeph> instance <b>is</b> required if: + </p> + + <ul> + <li> + A metadata change occurs. + </li> + + <li> + <b>and</b> the change is made through Hive. + </li> + + <li> + <b>and</b> the change is made to a metastore database to which clients such as the Impala shell or ODBC directly + connect. + </li> + </ul> + + <p rev="1.2"> + A metadata update for an Impala node is <b>not</b> required after you run <codeph>ALTER TABLE</codeph>, + <codeph>INSERT</codeph>, or other table-modifying statement in Impala rather than Hive. Impala handles the + metadata synchronization automatically through the catalog service. + </p> + + <p> + Database and table metadata is typically modified by: + </p> + + <ul> + <li> + Hive - through <codeph>ALTER</codeph>, <codeph>CREATE</codeph>, <codeph>DROP</codeph> or + <codeph>INSERT</codeph> operations. 
+ </li> + + <li> + Impalad - through <codeph>CREATE TABLE</codeph>, <codeph>ALTER TABLE</codeph>, and <codeph>INSERT</codeph> + operations. <ph rev="1.2">Such changes are propagated to all Impala nodes by the + Impala catalog service.</ph> + </li> + </ul> + + <p> + <codeph>REFRESH</codeph> causes the metadata for that table to be immediately reloaded. For a huge table, + that process could take a noticeable amount of time; but doing the refresh up front avoids an unpredictable + delay later, for example if the next reference to the table is during a benchmark test. + </p> + + <p rev="IMPALA-1683 CDH-43732"> + <b>Refreshing a single partition:</b> + </p> + + <p rev="IMPALA-1683 CDH-43732"> + In <keyword keyref="impala27_full"/> and higher, the <codeph>REFRESH</codeph> statement can apply to a single partition at a time, + rather than the whole table. Include the optional <codeph>PARTITION (<varname>partition_spec</varname>)</codeph> + clause and specify values for each of the partition key columns. + </p> + + <p rev="IMPALA-1683 CDH-43732"> + The following examples show how to make Impala aware of data added to a single partition, after data is loaded into + a partition's data directory using some mechanism outside Impala, such as Hive or Spark. The partition can be one that + Impala created and is already aware of, or a new partition created through Hive. + </p> + +<codeblock rev="IMPALA-1683 CDH-43732"><![CDATA[ +impala> create table p (x int) partitioned by (y int); +impala> insert into p (x,y) values (1,2), (2,2), (2,1); +impala> show partitions p; ++-------+-------+--------+------+... +| y | #Rows | #Files | Size |... ++-------+-------+--------+------+... +| 1 | -1 | 1 | 2B |... +| 2 | -1 | 1 | 4B |... +| Total | -1 | 2 | 6B |... ++-------+-------+--------+------+... + +-- ... Data is inserted into one of the partitions by some external mechanism ... +beeline> insert into p partition (y = 1) values(1000); + +impala> refresh p partition (y=1); +impala> select x from p where y=1; ++------+ +| x | ++------+ +| 2 | <- Original data created by Impala +| 1000 | <- Additional data inserted through Beeline ++------+ +]]> +</codeblock> + + <p rev="IMPALA-1683 CDH-43732"> + The same applies for tables with more than one partition key column. + The <codeph>PARTITION</codeph> clause of the <codeph>REFRESH</codeph> + statement must include all the partition key columns. + </p> + +<codeblock rev="IMPALA-1683 CDH-43732"><![CDATA[ +impala> create table p2 (x int) partitioned by (y int, z int); +impala> insert into p2 (x,y,z) values (0,0,0), (1,2,3), (2,2,3); +impala> show partitions p2; ++-------+---+-------+--------+------+... +| y | z | #Rows | #Files | Size |... ++-------+---+-------+--------+------+... +| 0 | 0 | -1 | 1 | 2B |... +| 2 | 3 | -1 | 1 | 4B |... +| Total | | -1 | 2 | 6B |... ++-------+---+-------+--------+------+... + +-- ... Data is inserted into one of the partitions by some external mechanism ... +beeline> insert into p2 partition (y = 2, z = 3) values(1000); + +impala> refresh p2 partition (y=2, z=3); +impala> select x from p where y=2 and z = 3; ++------+ +| x | ++------+ +| 1 | <- Original data created by Impala +| 2 | <- Original data created by Impala +| 1000 | <- Additional data inserted through Beeline ++------+ +]]> +</codeblock> + + <p rev="IMPALA-1683 CDH-43732"> + The following examples show how specifying a nonexistent partition does not cause any error, + and the order of the partition key columns does not have to match the column order in the table. 
+ The partition spec must include all the partition key columns; specifying an incomplete set of + columns does cause an error. + </p> + +<codeblock rev="IMPALA-1683 CDH-43732"><![CDATA[ +-- Partition doesn't exist. +refresh p2 partition (y=0, z=3); +refresh p2 partition (y=0, z=-1) +-- Key columns specified in a different order than the table definition. +refresh p2 partition (z=1, y=0) +-- Incomplete partition spec causes an error. +refresh p2 partition (y=0) +ERROR: AnalysisException: Items in partition spec must exactly match the partition columns in the table definition: default.p2 (1 vs 2) +]]> +</codeblock> + + <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following example shows how you might use the <codeph>REFRESH</codeph> statement after manually adding + new HDFS data files to the Impala data directory for a table: + </p> + +<codeblock>[impalad-host:21000] > refresh t1; +[impalad-host:21000] > refresh t2; +[impalad-host:21000] > select * from t1; +... +[impalad-host:21000] > select * from t2; +... </codeblock> + + <p> + For more examples of using <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> with a + combination of Impala and Hive operations, see <xref href="impala_tutorial.xml#tutorial_impala_hive"/>. + </p> + + <p> + <b>Related impala-shell options:</b> + </p> + + <p rev="1.1"> + The <cmdname>impala-shell</cmdname> option <codeph>-r</codeph> issues an <codeph>INVALIDATE METADATA</codeph> statement + when starting up the shell, effectively performing a <codeph>REFRESH</codeph> of all tables. + Due to the expense of reloading the metadata for all tables, the <cmdname>impala-shell</cmdname> <codeph>-r</codeph> + option is not recommended for day-to-day use in a production environment. (This option was mainly intended as a workaround + for synchronization issues in very old Impala versions.) + </p> + + <p conref="../shared/impala_common.xml#common/permissions_blurb"/> + <p rev="CDH-19187"> + The user ID that the <cmdname>impalad</cmdname> daemon runs under, + typically the <codeph>impala</codeph> user, must have execute + permissions for all the relevant directories holding table data. + (A table could have data spread across multiple directories, + or in unexpected paths, if it uses partitioning or + specifies a <codeph>LOCATION</codeph> attribute for + individual partitions or the entire table.) + Issues with permissions might not cause an immediate error for this statement, + but subsequent statements such as <codeph>SELECT</codeph> + or <codeph>SHOW TABLE STATS</codeph> could fail. + </p> + <p rev="IMPALA-1683 CDH-43732"> + All HDFS and Sentry permissions and privileges are the same whether you refresh the entire table + or a single partition. + </p> + + <p conref="../shared/impala_common.xml#common/hdfs_blurb"/> + + <p> + The <codeph>REFRESH</codeph> command checks HDFS permissions of the underlying data files and directories, + caching this information so that a statement can be cancelled immediately if for example the + <codeph>impala</codeph> user does not have permission to write to the data directory for the table. Impala + reports any lack of write permissions as an <codeph>INFO</codeph> message in the log file, in case that + represents an oversight. If you change HDFS permissions to make data readable or writeable by the Impala + user, issue another <codeph>REFRESH</codeph> to make Impala aware of the change. 
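+      For example, after opening up permissions on a table's data directory, you might refresh the
+      table again (a sketch; the path, host, and table name are illustrative):
+<codeblock>$ hdfs dfs -chmod -R 755 /user/hive/warehouse/t1
+$ impala-shell -i impalad-host -q 'refresh t1'</codeblock>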
+ </p> + + <note conref="../shared/impala_common.xml#common/compute_stats_next"/> + + <p conref="../shared/impala_common.xml#common/s3_blurb"/> + <p conref="../shared/impala_common.xml#common/s3_metadata"/> + + <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/> + <p conref="../shared/impala_common.xml#common/related_info"/> + <p> + <xref href="impala_hadoop.xml#intro_metastore"/>, + <xref href="impala_invalidate_metadata.xml#invalidate_metadata"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_release_notes.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_release_notes.xml b/docs/topics/impala_release_notes.xml new file mode 100644 index 0000000..65a3997 --- /dev/null +++ b/docs/topics/impala_release_notes.xml @@ -0,0 +1,17 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="impala_release_notes"> + + <title>Impala Release Notes</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Release Notes"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody conref="impala_relnotes.xml#relnotes/relnotes_intro"/> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_relnotes.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_relnotes.xml b/docs/topics/impala_relnotes.xml new file mode 100644 index 0000000..5c53a21 --- /dev/null +++ b/docs/topics/impala_relnotes.xml @@ -0,0 +1,34 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="relnotes" audience="standalone"> + + <title>Impala Release Notes</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Release Notes"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody id="relnotes_intro"> + + <p> + These release notes provide information on the <xref href="impala_new_features.xml#new_features">new + features</xref> and <xref href="impala_known_issues.xml#known_issues">known issues and limitations</xref> for + Impala versions up to <ph conref="../shared/ImpalaVariables.xml#impala_vars/ReleaseVersion"/>. For users + upgrading from earlier Impala releases, or using Impala in combination with specific versions of other + Cloudera software, <xref href="impala_incompatible_changes.xml#incompatible_changes"/> lists any changes to + file formats, SQL syntax, or software dependencies to take into account. + </p> + + <p> + Once you are finished reviewing these release notes, for more information about using Impala, see + <xref audience="integrated" href="impala.xml"/><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/impala.html" scope="external" format="html"/>. 
+    </p>
+
+    <p outputclass="toc"/>
+  </conbody>
+</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_replica_preference.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_replica_preference.xml b/docs/topics/impala_replica_preference.xml
new file mode 100644
index 0000000..6cf73da
--- /dev/null
+++ b/docs/topics/impala_replica_preference.xml
@@ -0,0 +1,48 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="replica_preference" rev="2.7.0">
+
+  <title>REPLICA_PREFERENCE Query Option (<keyword keyref="impala27"/> or higher only)</title>
+  <titlealts audience="PDF"><navtitle>REPLICA_PREFERENCE</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p rev="2.7.0">
+      <indexterm audience="Cloudera">REPLICA_PREFERENCE query option</indexterm>
+    </p>
+
+    <p>
+      The <codeph>REPLICA_PREFERENCE</codeph> query option
+      lets you spread the load more evenly if hotspots and bottlenecks persist. It allows hosts to do
+      local reads, or even remote reads, to retrieve the data for cached blocks when Impala determines
+      that doing all such processing on a particular host would be too expensive.
+    </p>
+
+    <p>
+      <b>Type:</b> numeric (0, 3, 5)
+      or corresponding mnemonic strings (<codeph>CACHE_LOCAL</codeph>, <codeph>DISK_LOCAL</codeph>, <codeph>REMOTE</codeph>).
+      The gaps in the numeric sequence are to accommodate other intermediate
+      values that might be added in the future.
+    </p>
+
+    <p>
+      <b>Default:</b> 0 (equivalent to <codeph>CACHE_LOCAL</codeph>)
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/added_in_270"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+    <p>
+      <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/>, <xref href="impala_schedule_random_replica.xml#schedule_random_replica"/>
+    </p>
+
+  </conbody>
+</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_request_pool.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_request_pool.xml b/docs/topics/impala_request_pool.xml
new file mode 100644
index 0000000..a820edd
--- /dev/null
+++ b/docs/topics/impala_request_pool.xml
@@ -0,0 +1,43 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.3.0" id="request_pool">
+
+  <title>REQUEST_POOL Query Option</title>
+  <titlealts audience="PDF"><navtitle>REQUEST_POOL</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Resource Management"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Admission Control"/>
+      <data name="Category" value="YARN"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">REQUEST_POOL query option</indexterm>
+      The pool or queue name that queries should be submitted to; that is, the name of the pool used by
+      requests from Impala to the resource manager. Only applies when you enable the Impala admission
+      control feature.
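+    </p>
+
+    <p>
+      As a brief illustration (the pool name <codeph>reporting</codeph> and the table name are
+      hypothetical examples), a session can direct its queries to a specific admission control pool:
+    </p>
+
+<codeblock>-- Route subsequent queries from this session to the 'reporting' pool.
+set request_pool=reporting;
+select count(*) from web_logs;
+
+-- An empty value reverts to the default user-to-pool mapping.
+set request_pool="";</codeblock>
+
+    <p>
+      Whether the specified pool name takes effect depends on how admission control is configured;
+      see the related information below.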
+    </p>
+
+    <p>
+      <b>Type:</b> <codeph>STRING</codeph>
+    </p>
+
+    <p>
+      <b>Default:</b> empty (use the user-to-pool mapping defined by an <cmdname>impalad</cmdname> startup option
+      in the Impala configuration file)
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+    <p>
+      <xref href="impala_admission.xml"/>
+    </p>
+
+  </conbody>
+</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_reservation_request_timeout.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_reservation_request_timeout.xml b/docs/topics/impala_reservation_request_timeout.xml
new file mode 100644
index 0000000..0e01f83
--- /dev/null
+++ b/docs/topics/impala_reservation_request_timeout.xml
@@ -0,0 +1,38 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.2" id="reservation_request_timeout">
+
+  <title>RESERVATION_REQUEST_TIMEOUT Query Option (CDH 5 only)</title>
+  <titlealts audience="PDF"><navtitle>RESERVATION_REQUEST_TIMEOUT</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Resource Management"/>
+      <data name="Category" value="YARN"/>
+      <data name="Category" value="Llama"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <note conref="../shared/impala_common.xml#common/llama_query_options_obsolete"/>
+
+    <p>
+      <indexterm audience="Cloudera">RESERVATION_REQUEST_TIMEOUT query option</indexterm>
+      Maximum number of milliseconds Impala will wait for a reservation to be completely granted or denied. Used in
+      conjunction with the Impala resource management feature in Impala 1.2 and higher with CDH 5.
+    </p>
+
+    <p>
+      <b>Type:</b> numeric
+    </p>
+
+    <p>
+      <b>Default:</b> 300000 (5 minutes)
+    </p>
+
+  </conbody>
+</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_reserved_words.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_reserved_words.xml b/docs/topics/impala_reserved_words.xml
new file mode 100644
index 0000000..79dfb5c
--- /dev/null
+++ b/docs/topics/impala_reserved_words.xml
@@ -0,0 +1,365 @@
+<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="reserved_words">
+
+  <title>Impala Reserved Words</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Troubleshooting"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Planning"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">reserved words</indexterm>
+      The following are the reserved words for the current release of Impala. A reserved word is one that
+      cannot be used directly as an identifier; you must quote it with backticks. For example, the statement
+      <codeph>CREATE TABLE select (x INT)</codeph> fails, while <codeph>CREATE TABLE `select` (x INT)</codeph>
+      succeeds. Impala does not reserve the names of aggregate or scalar built-in functions. (Formerly, Impala did
+      reserve the names of some aggregate functions.)
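+    </p>
+
+    <p>
+      The following sketch (the table and column names are arbitrary examples) shows the backtick-quoting
+      technique applied to both a table name and a column name that collide with reserved words:
+    </p>
+
+<codeblock>-- Fails: SELECT and TABLE are reserved words.
+-- create table select (table int);
+
+-- Succeeds: each reserved word is quoted with backticks.
+create table `select` (`table` int);
+
+-- The backticks are also required in every later reference.
+select `table` from `select`;</codeblock>
+
+    <p>
+      The same quoting technique applies to database names and other identifiers.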
+ </p> + + <p> + Because different database systems have different sets of reserved words, and the reserved words change from + release to release, carefully consider database, table, and column names to ensure maximum compatibility + between products and versions. + </p> + + <p> + Because you might switch between Impala and Hive when doing analytics and ETL, also consider whether + your object names are the same as any Hive keywords, and rename or quote any that conflict. Consult the + <xref href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords" scope="external" format="html">list of Hive keywords</xref>. + </p> + + <p outputclass="toc inpage"/> + + </conbody> + +<concept id="reserved_words_current"> +<title>List of Current Reserved Words</title> +<conbody> +<!-- This list is derived from the source code at: + https://github.com/cloudera/Impala/blob/master/fe/src/main/jflex/sql-scanner.flex + +See the history, any recent changes, here: + https://github.com/cloudera/Impala/commits/master/fe/src/main/jflex/sql-scanner.flex +--> + +<codeblock rev="ver">add +aggregate +all +alter +<ph rev="2.0.0">analytic</ph> +and +<ph rev="2.0.0">anti</ph> +<ph rev="1.4.0">api_version</ph> +as +asc +avro +between +bigint +<ph rev="1.4.0">binary</ph> +boolean +<ph rev="2.6.0">buckets</ph> +by +<ph rev="1.4.0">cached</ph> +<ph rev="2.3.0">cascade</ph> +case +cast +change +<ph rev="2.0.0">char</ph> +<ph rev="1.4.0">class</ph> +<ph rev="1.2.1">close_fn</ph> +column +columns +comment +compute +create +cross +<ph rev="2.0.0">current</ph> +data +database +databases +date +datetime +decimal +<ph rev="2.6.0">delete</ph> +delimited +desc +describe +distinct +<ph rev="2.6.0">distribute</ph> +div +double +drop +else +end +escaped +exists +explain +<ph rev="2.5.0">extended</ph> +external +false +fields +fileformat +<ph rev="1.2.1">finalize_fn</ph> +first +float +<ph rev="2.0.0">following</ph> +<ph rev="2.1.0">for</ph> +format +formatted +from +full +function +functions +<ph rev="2.1.0">grant</ph> +group +<ph rev="2.6.0">hash</ph> +having +if +<ph rev="2.6.0">ignore</ph> +<ph rev="2.5.0">ilike</ph> +in +<ph rev="2.1.0">incremental</ph> +<ph rev="1.2.1">init_fn</ph> +inner +inpath +insert +int +integer +intermediate +interval +into +invalidate +<ph rev="2.5.0">iregexp</ph> +is +join +last +left +like +limit +lines +load +location +<ph rev="1.2.1">merge_fn</ph> +metadata +not +null +nulls +offset +on +or +order +outer +<ph rev="2.0.0">over</ph> +overwrite +parquet +parquetfile +partition +partitioned +<ph rev="1.4.0">partitions</ph> +<ph rev="2.0.0">preceding</ph> +<ph rev="1.2.1">prepare_fn</ph> +<ph rev="1.4.0">produced</ph> +<ph rev="2.3.0">purge</ph> +<ph rev="2.0.0">range</ph> +rcfile +real +refresh +regexp +rename +replace +<ph rev="2.3.0">restrict</ph> +returns +<ph rev="2.1.0">revoke</ph> +right +rlike +<ph rev="2.1.0">role</ph> +<ph rev="2.1.0">roles</ph> +row +<ph rev="2.0.0">rows</ph> +schema +schemas +select +semi +sequencefile +serdeproperties +<ph rev="2.0.0">serialize_fn</ph> +set +show +smallint +<ph rev="2.6.0">split</ph> +stats +stored +straight_join +string +symbol +table +tables +tblproperties +terminated +textfile +then +timestamp +tinyint +to +true +<ph rev="2.0.0">truncate</ph> +<ph rev="2.0.0">unbounded</ph> +<ph rev="1.4.0">uncached</ph> +union +<ph rev="2.6.0">update</ph> +<ph rev="1.2.1">update_fn</ph> +use +using +values +<ph rev="2.0.0">varchar</ph> +view +when +where +with</codeblock> +</conbody> +</concept> 
+
+<concept id="reserved_words_planning">
+<title>Planning for Future Reserved Words</title>
+<conbody>
+<p>
+The preceding list of reserved words includes all the keywords
+used in the current level of Impala SQL syntax.
+To future-proof your code, avoid using certain additional words
+as identifiers, in case they become reserved words
+when Impala adds features in later releases.
+This kind of planning also helps you avoid
+name conflicts if you port SQL from other systems that
+have different sets of reserved words.
+</p>
+
+<p>
+The following list contains additional words that Cloudera
+recommends avoiding for table, column, or other object names,
+even though they are not currently reserved by Impala.
+</p>
+
+<codeblock>any
+authorization
+backup
+begin
+break
+browse
+bulk
+cascade
+check
+checkpoint
+close
+clustered
+coalesce
+collate
+commit
+constraint
+contains
+continue
+convert
+current
+current_date
+current_time
+current_timestamp
+current_user
+cursor
+dbcc
+deallocate
+declare
+default
+deny
+disk
+distributed
+dump
+errlvl
+escape
+except
+exec
+execute
+exit
+fetch
+file
+fillfactor
+for
+foreign
+freetext
+goto
+holdlock
+identity
+index
+intersect
+key
+kill
+lineno
+merge
+national
+nocheck
+nonclustered
+nullif
+of
+off
+offsets
+open
+option
+percent
+pivot
+plan
+precision
+primary
+print
+proc
+procedure
+public
+raiserror
+read
+readtext
+reconfigure
+references
+replication
+restore
+restrict
+return
+revert
+rollback
+rowcount
+rule
+save
+securityaudit
+session_user
+setuser
+shutdown
+some
+statistics
+system_user
+tablesample
+textsize
+then
+top
+tran
+transaction
+trigger
+try_convert
+unique
+unpivot
+updatetext
+user
+varying
+waitfor
+while
+within
+writetext
+</codeblock>
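+
+<p>
+As a sketch (the table and column names below are hypothetical), the first
+table uses two words from this list as column names and quotes them
+defensively; the second renames them so that no quoting is ever needed, even
+if those words become reserved in a later release:
+</p>
+
+<codeblock>-- 'user' and 'default' are not reserved today, but appear on the avoid list.
+-- Quoting keeps these statements safe if the words become reserved later:
+create table accounts (`user` string, `default` string);
+select `user`, `default` from accounts;
+
+-- Renaming avoids the problem entirely:
+create table accounts_safe (user_name string, default_role string);
+select user_name, default_role from accounts_safe;</codeblock>
+
+<p>
+Renaming such columns up front is usually less error-prone than quoting,
+because every later reference must repeat the backticks.
+</p>
+</conbody>
+</concept>
+
+</concept>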
