http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_ldap.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_ldap.xml b/docs/topics/impala_ldap.xml new file mode 100644 index 0000000..f2ef523 --- /dev/null +++ b/docs/topics/impala_ldap.xml @@ -0,0 +1,354 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="ldap"> + + <title>Enabling LDAP Authentication for Impala</title> + <prolog> + <metadata> + <data name="Category" value="Security"/> + <data name="Category" value="LDAP"/> + <data name="Category" value="Authentication"/> + <data name="Category" value="Impala"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Starting and Stopping"/> + <data name="Category" value="Administrators"/> + </metadata> + </prolog> + + <conbody> + +<!-- Similar discussion under 'Authentication' parent topic. Maybe do some conref'ing or linking upward. --> + + <p> Authentication is the process of allowing only specified named users to + access the server (in this case, the Impala server). This feature is + crucial for any production deployment, to prevent misuse, tampering, or + excessive load on the server. Impala uses LDAP for authentication, + verifying the credentials of each user who connects through + <cmdname>impala-shell</cmdname>, Hue, a Business Intelligence tool, JDBC + or ODBC application, and so on. </p> + + <note conref="../shared/impala_common.xml#common/authentication_vs_authorization"/> + + <p outputclass="toc inpage"/> + + <p> + An alternative form of authentication you can use is Kerberos, described in + <xref href="impala_kerberos.xml#kerberos"/>. + </p> + </conbody> + + <concept id="ldap_prereqs"> + + <title>Requirements for Using Impala with LDAP</title> + <prolog> + <metadata> + <data name="Category" value="Requirements"/> + <data name="Category" value="Planning"/> + </metadata> + </prolog> + + <conbody> + + <p rev="1.4.0"> + Authentication against LDAP servers is available in Impala 1.2.2 and higher. Impala 1.4.0 adds support for + secure LDAP authentication through SSL and TLS. + </p> + + <p> + The Impala LDAP support lets you use Impala with systems such as Active Directory that use LDAP behind the + scenes. + </p> + </conbody> + </concept> + + <concept id="ldap_client_server"> + + <title>Client-Server Considerations for LDAP</title> + + <conbody> + + <p> + Only client->Impala connections can be authenticated by LDAP. + </p> + + <p> You must use the Kerberos authentication mechanism for connections + between internal Impala components, such as between the + <cmdname>impalad</cmdname>, <cmdname>statestored</cmdname>, and + <cmdname>catalogd</cmdname> daemons. See <xref + href="impala_kerberos.xml#kerberos" /> on how to set up Kerberos for + Impala. </p> + </conbody> + </concept> + + <concept id="ldap_config"> + + <title>Server-Side LDAP Setup</title> + + <conbody> + + <p> + These requirements apply on the server side when configuring and starting Impala: + </p> + + <p> + To enable LDAP authentication, set the following startup options for <cmdname>impalad</cmdname>: + </p> + + <ul> + <li> + <codeph>--enable_ldap_auth</codeph> enables LDAP-based authentication between the client and Impala. + </li> + + <li rev="1.4.0"> + <codeph>--ldap_uri</codeph> sets the URI of the LDAP server to use. Typically, the URI is prefixed with + <codeph>ldap://</codeph>. 
In Impala 1.4.0 and higher, you can specify secure SSL-based LDAP transport by + using the prefix <codeph>ldaps://</codeph>. The URI can optionally specify the port, for example: + <codeph>ldap://ldap_server.cloudera.com:389</codeph> or + <codeph>ldaps://ldap_server.cloudera.com:636</codeph>. (389 and 636 are the default ports for non-SSL and + SSL LDAP connections, respectively.) + </li> + +<!-- Some amount of this bullet could be conref'ed. Similar but not identical bullet occurs later under TLS. --> + + <li rev="1.4.0"> + For <codeph>ldaps://</codeph> connections secured by SSL, + <codeph>--ldap_ca_certificate="<varname>/path/to/certificate/pem</varname>"</codeph> specifies the + location of the certificate in standard <codeph>.PEM</codeph> format. Store this certificate on the local + filesystem, in a location that only the <codeph>impala</codeph> user and other trusted users can read. + </li> + +<!-- Per Henry: not for public consumption. +<li> + If you need to provide a custom SASL configuration, + set <codeph>- -ldap_manual_config</codeph> to bypass all the automatic configuration. +</li> +--> + </ul> + </conbody> + </concept> + + <concept id="ldap_bind_strings"> + + <title>Support for Custom Bind Strings</title> + + <conbody> + + <p> + When Impala connects to LDAP it issues a bind call to the LDAP server to authenticate as the connected + user. Impala clients, including the Impala shell, provide the short name of the user to Impala. This is + necessary so that Impala can use Sentry for role-based access, which uses short names. + </p> + + <p> + However, LDAP servers often require more complex, structured usernames for authentication. Impala supports + three ways of transforming the short name (for example, <codeph>'henry'</codeph>) to a more complicated + string. If necessary, specify one of the following configuration options when starting the + <cmdname>impalad</cmdname> daemon on each DataNode: + </p> + + <ul> + <li> + <codeph>--ldap_domain</codeph>: Replaces the username with a string + <codeph><varname>username</varname>@<varname>ldap_domain</varname></codeph>. + </li> + + <li> + <codeph>--ldap_baseDN</codeph>: Replaces the username with a <q>distinguished name</q> (DN) of the form: + <codeph>uid=<varname>userid</varname>,ldap_baseDN</codeph>. (This is equivalent to a Hive option). + </li> + + <li> + <codeph>--ldap_bind_pattern</codeph>: This is the most general option, and replaces the username with the + string <varname>ldap_bind_pattern</varname> where all instances of the string <codeph>#UID</codeph> are + replaced with <varname>userid</varname>. For example, an <codeph>ldap_bind_pattern</codeph> of + <codeph>"user=#UID,OU=foo,CN=bar"</codeph> with a username of <codeph>henry</codeph> will construct a + bind name of <codeph>"user=henry,OU=foo,CN=bar"</codeph>. + </li> + </ul> + + <p rev="CDH-26854"> + For clusters not managed by Cloudera Manager, + specify the option on the <cmdname>impalad</cmdname> command line. + For clusters managed by Cloudera Manager 5.4.0 and higher, + search for the configuration field names <codeph>ldap_domain</codeph>, + <codeph>ldap_basedn</codeph>, or <codeph>ldap_bind_pattern</codeph>, + fill in and save the appropriate field values, and restart the Impala service. + Prior to Cloudera Manager 5.4.0, these values were filled in using the + <uicontrol>Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve)</uicontrol> + field. 
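 </p>
+
+ <p>
+ For example, the following command line sketches how the bind pattern option might look on a
+ cluster not managed by Cloudera Manager. The LDAP URI and the DN components are hypothetical
+ placeholders; substitute the values for your own directory service:
+ </p>
+
+<codeblock># Hypothetical example values; substitute your own LDAP URI and DN components.
+impalad --enable_ldap_auth=true \
+    --ldap_uri=ldap://ldap_server.example.com:389 \
+    --ldap_bind_pattern="uid=#UID,ou=people,dc=example,dc=com" \
+    <varname>other_startup_options</varname></codeblock>
+
+ <p>
+ With this pattern, a user who connects as <codeph>henry</codeph> is authenticated with the
+ bind name <codeph>uid=henry,ou=people,dc=example,dc=com</codeph>.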
+ </p>
+
+ <p>
+ These options are mutually exclusive; Impala does not start if more than one of these options is specified.
+ </p>
+ </conbody>
+ </concept>
+
+ <concept id="ldap_security">
+
+ <title>Secure LDAP Connections</title>
+
+ <conbody>
+
+ <p>
+ To avoid sending credentials over the wire in cleartext, you must configure a secure connection between
+ both the client and Impala, and between Impala and the LDAP server. The secure connection could use SSL or
+ TLS.
+ </p>
+
+ <p>
+ <b>Secure LDAP connections through SSL:</b>
+ </p>
+
+ <p>
+ For SSL-enabled LDAP connections, specify a prefix of <codeph>ldaps://</codeph> instead of
+ <codeph>ldap://</codeph>. Also, the default port for SSL-enabled LDAP connections is 636 instead of 389.
+ </p>
+
+ <p rev="1.4.0">
+ <b>Secure LDAP connections through TLS:</b>
+ </p>
+
+ <p>
+ <xref href="http://en.wikipedia.org/wiki/Transport_Layer_Security" scope="external" format="html">TLS</xref>,
+ the successor to the SSL protocol, is supported by most modern LDAP servers. Unlike SSL connections, TLS
+ connections can be made on the same server port as non-TLS connections. To secure all connections using
+ TLS, specify the following flags as startup options to the <cmdname>impalad</cmdname> daemon:
+ </p>
+
+ <ul>
+ <li>
+ <codeph>--ldap_tls</codeph> tells Impala to start a TLS connection to the LDAP server, and to fail
+ authentication if it cannot be done.
+ </li>
+
+ <li rev="1.4.0">
+ <codeph>--ldap_ca_certificate="<varname>/path/to/certificate/pem</varname>"</codeph> specifies the
+ location of the certificate in standard <codeph>.PEM</codeph> format. Store this certificate on the local
+ filesystem, in a location that only the <codeph>impala</codeph> user and other trusted users can read.
+ </li>
+ </ul>
+ </conbody>
+ </concept>
+
+ <concept id="ldap_impala_shell">
+
+ <title>LDAP Authentication for impala-shell Interpreter</title>
+
+ <conbody>
+
+ <p>
+ To connect to Impala using LDAP authentication, you specify command-line options to the
+ <cmdname>impala-shell</cmdname> command interpreter and enter the password when prompted:
+ </p>
+
+ <ul>
+ <li>
+ <codeph>-l</codeph> enables LDAP authentication.
+ </li>
+
+ <li>
+ <codeph>-u</codeph> sets the user. As in Active Directory, the user is the short username, not the full
+ LDAP distinguished name. If your LDAP settings include a search base, use the
+ <codeph>--ldap_bind_pattern</codeph> option on the <cmdname>impalad</cmdname> daemon to translate the
+ short user name from <cmdname>impala-shell</cmdname> automatically to the fully qualified name.
+<!--
+include that as part of the
+username, for example <codeph>[email protected]</codeph>.
+-->
+ </li>
+
+ <li>
+ <cmdname>impala-shell</cmdname> automatically prompts for the password.
+ </li>
+ </ul>
+
+ <p>
+ For the full list of available <cmdname>impala-shell</cmdname> options, see
+ <xref href="impala_shell_options.xml#shell_options"/>.
+ </p>
+
+ <p>
+ <b>LDAP authentication for JDBC applications:</b> See <xref href="impala_jdbc.xml#impala_jdbc"/> for the
+ format to use with the JDBC connection string for servers using LDAP authentication.
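 </p>
+
+ <p>
+ For example, the following hypothetical <cmdname>impala-shell</cmdname> session (the hostname and
+ username are placeholders) shows how the shell prompts for the LDAP password rather than accepting
+ it on the command line:
+ </p>
+
+<codeblock>$ impala-shell -l -u henry -i impala_host.example.com
+Starting Impala Shell using LDAP-based authentication
+LDAP password for henry:</codeblock>
+
+ <p>
+ ODBC applications supply the equivalent user name and password through their own driver-specific
+ connection properties.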
+ </p> + </conbody> + </concept> + <concept id="ldap_impala_hue"> + <title>Enabling LDAP for Impala in Hue</title> + <prolog> + <metadata> + <data name="Category" value="Hue"/> + </metadata> + </prolog> + <conbody> + <section id="ldap_impala_hue_cm"> + <title>Enabling LDAP for Impala in Hue Using Cloudera Manager</title> + <p> + <ol> + <li>Go to the Hue service.</li> + <li>Click the Configuration tab.</li> + <li>Select <menucascade><uicontrol>Scope</uicontrol><uicontrol>Hue + Server</uicontrol></menucascade>.</li> + <li>Select + <menucascade><uicontrol>Category</uicontrol><uicontrol>Advanced</uicontrol></menucascade>.</li> + <li>Add the following properties to the <b>Hue Server Advanced + Configuration Snippet (Safety Valve) for + hue_safety_valve_server.ini</b> + property.<codeblock>[impala] +auth_username=<LDAP username of Hue user to be authenticated> +auth_password=<LDAP password of Hue user to be authenticated></codeblock></li> + <li>Click <b>Save Changes</b>.</li> + </ol> + </p> + </section> + <section id="ldap_impala_hue_cmdline"> + <title>Enabling LDAP for Impala in Hue Using the Command Line</title> + <p>LDAP authentication for the Impala app in Hue can be enabled by + setting the following properties under the <codeph>[impala]</codeph> + section in <codeph>hue.ini</codeph>. <table id="ldap_impala_hue_configs"> + <tgroup cols="2"> + <colspec colname="1" colwidth="1*" /> + <colspec colname="2" colwidth="2*" /> + <tbody> + <row> + <entry><codeph>auth_username</codeph></entry> + <entry>LDAP username of Hue user to be authenticated.</entry> + </row> + <row> + <entry><codeph>auth_password</codeph></entry> + <entry> + <p>LDAP password of Hue user to be authenticated.</p> + </entry> + </row> + </tbody> + </tgroup> + </table>These login details are only used by Impala to authenticate to + LDAP. The Impala service trusts Hue to have already validated the user + being impersonated, rather than simply passing on the credentials.</p> + </section> + </conbody> + </concept> + + <concept id="ldap_delegation"> + <title>Enabling Impala Delegation for LDAP Users</title> + <conbody> + <p> + See <xref href="impala_delegation.xml#delegation"/> for details about the delegation feature + that lets certain users submit queries using the credentials of other users. + </p> + </conbody> + </concept> + + <concept id="ldap_restrictions"> + + <title>LDAP Restrictions for Impala</title> + + <conbody> + + <p> + The LDAP support is preliminary. It currently has only been tested against Active Directory. + </p> + </conbody> + </concept> +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_limit.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_limit.xml b/docs/topics/impala_limit.xml new file mode 100644 index 0000000..ec12271 --- /dev/null +++ b/docs/topics/impala_limit.xml @@ -0,0 +1,151 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="limit"> + + <title>LIMIT Clause</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Reports"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + The <codeph>LIMIT</codeph> clause in a <codeph>SELECT</codeph> query sets a maximum number of rows for the + result set. Pre-selecting the maximum size of the result set helps Impala to optimize memory usage while + processing a distributed query. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>LIMIT <varname>constant_integer_expression</varname></codeblock> + + <p> + The argument to the <codeph>LIMIT</codeph> clause must evaluate to a constant value. It can be a numeric + literal, or another kind of numeric expression involving operators, casts, and function return values. You + cannot refer to a column or use a subquery. + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + This clause is useful in contexts such as: + </p> + + <ul> + <li> + To return exactly N items from a top-N query, such as the 10 highest-rated items in a shopping category or + the 50 hostnames that refer the most traffic to a web site. + </li> + + <li> + To demonstrate some sample values from a table or a particular query. (To display some arbitrary items, use + a query with no <codeph>ORDER BY</codeph> clause. An <codeph>ORDER BY</codeph> clause causes additional + memory and/or disk usage during the query.) + </li> + + <li> + To keep queries from returning huge result sets by accident if a table is larger than expected, or a + <codeph>WHERE</codeph> clause matches more rows than expected. + </li> + </ul> + + <p rev="1.2.1"> + Originally, the value for the <codeph>LIMIT</codeph> clause had to be a numeric literal. In Impala 1.2.1 and + higher, it can be a numeric expression. + </p> + + <p rev="obwl" conref="../shared/impala_common.xml#common/order_by_limit"/> + + <p> + See <xref href="impala_order_by.xml#order_by"/> for details. + </p> + + <p conref="../shared/impala_common.xml#common/limit_and_offset"/> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <p conref="../shared/impala_common.xml#common/subquery_no_limit"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following example shows how the <codeph>LIMIT</codeph> clause caps the size of the result set, with the + limit being applied after any other clauses such as <codeph>WHERE</codeph>. 
+ </p> + +<codeblock>[localhost:21000] > create database limits; +[localhost:21000] > use limits; +[localhost:21000] > create table numbers (x int); +[localhost:21000] > insert into numbers values (1), (3), (4), (5), (2); +Inserted 5 rows in 1.34s +[localhost:21000] > select x from numbers limit 100; ++---+ +| x | ++---+ +| 1 | +| 3 | +| 4 | +| 5 | +| 2 | ++---+ +Returned 5 row(s) in 0.26s +[localhost:21000] > select x from numbers limit 3; ++---+ +| x | ++---+ +| 1 | +| 3 | +| 4 | ++---+ +Returned 3 row(s) in 0.27s +[localhost:21000] > select x from numbers where x > 2 limit 2; ++---+ +| x | ++---+ +| 3 | +| 4 | ++---+ +Returned 2 row(s) in 0.27s</codeblock> + + <p> + For top-N and bottom-N queries, you use the <codeph>ORDER BY</codeph> and <codeph>LIMIT</codeph> clauses + together: + </p> + +<codeblock rev="obwl">[localhost:21000] > select x as "Top 3" from numbers order by x desc limit 3; ++-------+ +| top 3 | ++-------+ +| 5 | +| 4 | +| 3 | ++-------+ +[localhost:21000] > select x as "Bottom 3" from numbers order by x limit 3; ++----------+ +| bottom 3 | ++----------+ +| 1 | +| 2 | +| 3 | ++----------+ +</codeblock> + + <p> + You can use constant values besides integer literals as the <codeph>LIMIT</codeph> argument: + </p> + +<codeblock>-- Other expressions that yield constant integer values work too. +SELECT x FROM t1 LIMIT 1e6; -- Limit is one million. +SELECT x FROM t1 LIMIT length('hello world'); -- Limit is 11. +SELECT x FROM t1 LIMIT 2+2; -- Limit is 4. +SELECT x FROM t1 LIMIT cast(truncate(9.9) AS INT); -- Limit is 9. +</codeblock> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_lineage.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_lineage.xml b/docs/topics/impala_lineage.xml new file mode 100644 index 0000000..677fc3d --- /dev/null +++ b/docs/topics/impala_lineage.xml @@ -0,0 +1,113 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="lineage" rev="2.2.0"> + + <title>Viewing Lineage Information for Impala Data</title> + <titlealts audience="PDF"><navtitle>Viewing Lineage Info</navtitle></titlealts> + <prolog> + + <metadata> + + <data name="Category" value="Impala"/> + <data name="Category" value="Lineage"/> + <data name="Category" value="Governance"/> + <data name="Category" value="Data Management"/> + <data name="Category" value="Navigator"/> + <data name="Category" value="Administrators"/> + + </metadata> + + </prolog> + + <conbody> + + <p rev="2.2.0"> + <indexterm audience="Cloudera">lineage</indexterm> + <indexterm audience="Cloudera">column lineage</indexterm> + <term>Lineage</term> is a feature in the Cloudera Navigator data + management component that helps you track where data originated, and how + data propagates through the system through SQL statements such as + <codeph>SELECT</codeph>, <codeph>INSERT</codeph>, and <codeph>CREATE + TABLE AS SELECT</codeph>. Impala is covered by the Cloudera Navigator + lineage features in <keyword keyref="impala22_full"/> and higher. </p> + + <p> + This type of tracking is important in high-security configurations, especially in highly regulated industries + such as healthcare, pharmaceuticals, financial services and intelligence. 
For sensitive data of this kind, it is important to know all
+ the places in the system that contain that data or other data derived from it; to verify who has accessed
+ that data; and to be able to double-check that the data used to make a decision was processed correctly and
+ not tampered with.
+ </p>
+
+ <p>
+ You interact with this feature through <term>lineage diagrams</term> showing relationships between tables and
+ columns. For instructions about interpreting lineage diagrams, see
+ <xref audience="integrated" href="cn_iu_lineage.xml" /><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/cn_iu_lineage.html" scope="external" format="html"/>.
+ </p>
+
+ <section id="column_lineage">
+
+ <title>Column Lineage</title>
+
+ <p>
+ <term>Column lineage</term> tracks information in fine detail, at the level of
+ particular columns rather than entire tables.
+ </p>
+
+ <p>
+ For example, if you have a table with information derived from web logs, you might copy that data into
+ other tables as part of the ETL process. The ETL operations might involve transformations through
+ expressions and function calls, and rearranging the columns into more or fewer tables
+ (<term>normalizing</term> or <term>denormalizing</term> the data). Then for reporting, you might issue
+ queries against multiple tables and views. In this example, column lineage helps you determine that data
+ that entered the system as <codeph>RAW_LOGS.FIELD1</codeph> was then turned into
+ <codeph>WEBSITE_REPORTS.IP_ADDRESS</codeph> through an <codeph>INSERT ... SELECT</codeph> statement. Or,
+ conversely, you could start with a reporting query against a view, and trace the origin of the data in a
+ field such as <codeph>TOP_10_VISITORS.USER_ID</codeph> back to the underlying table and even further back
+ to the point where the data was first loaded into Impala.
+ </p>
+
+ <p>
+ When you have tables where you need to track or control access to sensitive information at the column
+ level, see <xref href="impala_authorization.xml#authorization"/> for how to implement column-level
+ security. You set up authorization using the Sentry framework, create views that refer to specific sets of
+ columns, and then assign authorization privileges to those views rather than the underlying tables.
+ </p>
+
+ </section>
+
+ <section id="lineage_data">
+
+ <title>Lineage Data for Impala</title>
+
+ <p>
+ The lineage feature is enabled by default. When lineage logging is enabled, the serialized column lineage
+ graph is computed for each query and stored in a specialized log file in JSON format.
+ </p>
+
+ <p>
+ Impala records queries in the lineage log if they complete successfully, or fail due to authorization
+ errors. For write operations such as <codeph>INSERT</codeph> and <codeph>CREATE TABLE AS SELECT</codeph>,
+ the statement is recorded in the lineage log only if it successfully completes. Therefore, the lineage
+ feature tracks data that was accessed by successful queries, or that unsuccessful queries attempted to
+ access before being blocked by an authorization failure. These kinds of queries represent data
+ that really was accessed, or where the attempted access could represent malicious activity.
+ </p>
+
+ <p>
+ Impala does not record in the lineage log queries that fail due to syntax errors, or queries that fail or
+ are cancelled before they reach the stage of requesting rows from the result set.
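 </p>
+
+ <p>
+ As a minimal sketch of the server-side setup (the directory path here is an assumption, not a
+ default), the log location is controlled by the <codeph>-lineage_event_log_dir</codeph> startup
+ option discussed below:
+ </p>
+
+<codeblock># The directory path is a hypothetical example, not a built-in default.
+impalad -lineage_event_log_dir=/var/log/impala/lineage <varname>other_startup_options</varname></codeblock>
+
+ <p>
+ Impala then writes the lineage record for each eligible query to the log file in that directory,
+ in JSON format.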
+ </p> + + <p> + To enable or disable this feature on a system not managed by Cloudera Manager, set or remove the + <codeph>-lineage_event_log_dir</codeph> configuration option for the <cmdname>impalad</cmdname> daemon. For + information about turning the lineage feature on and off through Cloudera Manager, see + <xref audience="integrated" href="datamgmt_impala_lineage_log.xml"/><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/datamgmt_impala_lineage_log.html" scope="external" format="html"/>. + </p> + + </section> + + </conbody> + +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_literals.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_literals.xml b/docs/topics/impala_literals.xml new file mode 100644 index 0000000..d84d84c --- /dev/null +++ b/docs/topics/impala_literals.xml @@ -0,0 +1,384 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="literals"> + + <title>Literals</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">literals</indexterm> + Each of the Impala data types has corresponding notation for literal values of that type. You specify literal + values in SQL statements, such as in the <codeph>SELECT</codeph> list or <codeph>WHERE</codeph> clause of a + query, or as an argument to a function call. See <xref href="impala_datatypes.xml#datatypes"/> for a complete + list of types, ranges, and conversion rules. + </p> + + <p outputclass="toc inpage"/> + </conbody> + + <concept id="numeric_literals"> + + <title>Numeric Literals</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">numeric literals</indexterm> + To write literals for the integer types (<codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, + <codeph>INT</codeph>, and <codeph>BIGINT</codeph>), use a sequence of digits with optional leading zeros. + </p> + + <p rev="1.4.0"> + To write literals for the floating-point types (<codeph rev="1.4.0">DECIMAL</codeph>, + <codeph>FLOAT</codeph>, and <codeph>DOUBLE</codeph>), use a sequence of digits with an optional decimal + point (<codeph>.</codeph> character). To preserve accuracy during arithmetic expressions, Impala interprets + floating-point literals as the <codeph>DECIMAL</codeph> type with the smallest appropriate precision and + scale, until required by the context to convert the result to <codeph>FLOAT</codeph> or + <codeph>DOUBLE</codeph>. + </p> + + <p> + Integer values are promoted to floating-point when necessary, based on the context. + </p> + + <p> + You can also use exponential notation by including an <codeph>e</codeph> character. For example, + <codeph>1e6</codeph> is 1 times 10 to the power of 6 (1 million). A number in exponential notation is + always interpreted as floating-point. + </p> + + <p rev="tk"> + When Impala encounters a numeric literal, it considers the type to be the <q>smallest</q> that can + accurately represent the value. The type is promoted to larger or more accurate types if necessary, based + on subsequent parts of an expression. 
+ </p> + <p> + For example, you can see by the types Impala defines for the following table columns + how it interprets the corresponding numeric literals: + </p> +<codeblock>[localhost:21000] > create table ten as select 10 as x; ++-------------------+ +| summary | ++-------------------+ +| Inserted 1 row(s) | ++-------------------+ +[localhost:21000] > desc ten; ++------+---------+---------+ +| name | type | comment | ++------+---------+---------+ +| x | tinyint | | ++------+---------+---------+ + +[localhost:21000] > create table four_k as select 4096 as x; ++-------------------+ +| summary | ++-------------------+ +| Inserted 1 row(s) | ++-------------------+ +[localhost:21000] > desc four_k; ++------+----------+---------+ +| name | type | comment | ++------+----------+---------+ +| x | smallint | | ++------+----------+---------+ + +[localhost:21000] > create table one_point_five as select 1.5 as x; ++-------------------+ +| summary | ++-------------------+ +| Inserted 1 row(s) | ++-------------------+ +[localhost:21000] > desc one_point_five; ++------+--------------+---------+ +| name | type | comment | ++------+--------------+---------+ +| x | decimal(2,1) | | ++------+--------------+---------+ + +[localhost:21000] > create table one_point_three_three_three as select 1.333 as x; ++-------------------+ +| summary | ++-------------------+ +| Inserted 1 row(s) | ++-------------------+ +[localhost:21000] > desc one_point_three_three_three; ++------+--------------+---------+ +| name | type | comment | ++------+--------------+---------+ +| x | decimal(4,3) | | ++------+--------------+---------+ +</codeblock> + </conbody> + </concept> + + <concept id="string_literals"> + + <title>String Literals</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">string literals</indexterm> + String literals are quoted using either single or double quotation marks. You can use either kind of quotes + for string literals, even both kinds for different literals within the same statement. + </p> + + <p rev="2.0.0"> + Quoted literals are considered to be of type <codeph>STRING</codeph>. To use quoted literals in contexts + requiring a <codeph>CHAR</codeph> or <codeph>VARCHAR</codeph> value, <codeph>CAST()</codeph> the literal to + a <codeph>CHAR</codeph> or <codeph>VARCHAR</codeph> of the appropriate length. + </p> + + <p> + <b>Escaping special characters:</b> + </p> + + <p> + To encode special characters within a string literal, precede them with the backslash (<codeph>\</codeph>) + escape character: + </p> + + <ul> + <li> + <codeph>\t</codeph> represents a tab. + </li> + + <li> + <codeph>\n</codeph> represents a newline or linefeed. This might cause extra line breaks in + <cmdname>impala-shell</cmdname> output. + </li> + + <li> + <codeph>\r</codeph> represents a carriage return. This might cause unusual formatting (making it appear + that some content is overwritten) in <cmdname>impala-shell</cmdname> output. + </li> + + <li> + <codeph>\b</codeph> represents a backspace. This might cause unusual formatting (making it appear that + some content is overwritten) in <cmdname>impala-shell</cmdname> output. + </li> + + <li> + <codeph>\0</codeph> represents an ASCII <codeph>nul</codeph> character (not the same as a SQL + <codeph>NULL</codeph>). This might not be visible in <cmdname>impala-shell</cmdname> output. + </li> + + <li> + <codeph>\Z</codeph> represents a DOS end-of-file character. This might not be visible in + <cmdname>impala-shell</cmdname> output. 
+ </li>
+
+ <li>
+ <codeph>\%</codeph> and <codeph>\_</codeph> can be used to escape wildcard characters within the string
+ passed to the <codeph>LIKE</codeph> operator.
+ </li>
+
+ <li>
+ <codeph>\</codeph> followed by 3 octal digits represents the ASCII code of a single character; for
+ example, <codeph>\101</codeph> is ASCII 65, the character <codeph>A</codeph>.
+ </li>
+
+ <li>
+ Use two consecutive backslashes (<codeph>\\</codeph>) to prevent the backslash from being interpreted as
+ an escape character.
+ </li>
+
+ <li>
+ Use the backslash to escape single or double quotation mark characters within a string literal, if the
+ literal is enclosed by the same type of quotation mark.
+ </li>
+
+ <li>
+ If the character following the <codeph>\</codeph> does not represent the start of a recognized escape
+ sequence, the character is passed through unchanged.
+ </li>
+ </ul>
+
+ <p>
+ <b>Quotes within quotes:</b>
+ </p>
+
+ <p>
+ To include a single quotation character within a string value, enclose the literal with either single or
+ double quotation marks, and optionally escape the single quote as a <codeph>\'</codeph> sequence. Earlier
+ releases required escaping a single quote inside double quotes. Continue using escape sequences in this
+ case if you also need to run your SQL code on older versions of Impala.
+ </p>
+
+ <p>
+ To include a double quotation character within a string value, enclose the literal with single quotation
+ marks; no escaping is necessary in this case. Or, enclose the literal with double quotation marks and
+ escape the double quote as a <codeph>\"</codeph> sequence.
+ </p>
+
+<codeblock>[localhost:21000] > select "What\'s happening?" as single_within_double,
+                  > 'I\'m not sure.' as single_within_single,
+                  > "Homer wrote \"The Iliad\"." as double_within_double,
+                  > 'Homer also wrote "The Odyssey".' as double_within_single;
++----------------------+----------------------+--------------------------+---------------------------------+
+| single_within_double | single_within_single | double_within_double     | double_within_single            |
++----------------------+----------------------+--------------------------+---------------------------------+
+| What's happening?    | I'm not sure.        | Homer wrote "The Iliad". | Homer also wrote "The Odyssey". |
++----------------------+----------------------+--------------------------+---------------------------------+
+</codeblock>
+
+ <p>
+ <b>Field terminator character in CREATE TABLE:</b>
+ </p>
+
+ <note conref="../shared/impala_common.xml#common/thorn"/>
+
+ <p>
+ <b>impala-shell considerations:</b>
+ </p>
+
+ <p>
+ When dealing with output that includes non-ASCII or non-printable characters such as linefeeds and
+ backspaces, use the <cmdname>impala-shell</cmdname> options to save to a file, turn off pretty printing, or
+ both, rather than relying on how the output appears visually. See
+ <xref href="impala_shell_options.xml#shell_options"/> for a list of <cmdname>impala-shell</cmdname>
+ options.
+ </p>
+ </conbody>
+ </concept>
+
+ <concept id="boolean_literals">
+
+ <title>Boolean Literals</title>
+
+ <conbody>
+
+ <p>
+ For <codeph>BOOLEAN</codeph> values, the literals are <codeph>TRUE</codeph> and <codeph>FALSE</codeph>,
+ written without quotation marks; the keywords are case-insensitive.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+<codeblock>select true;
+select * from t1 where assertion = false;
+select case bool_col when true then 'yes' when false then 'no' else 'null' end from t1;</codeblock>
+ </conbody>
+ </concept>
+
+ <concept id="timestamp_literals">
+
+ <title>Timestamp Literals</title>
+
+ <conbody>
+
+ <p conref="../shared/impala_common.xml#common/timestamp_conversions"/>
+
+ <p>
+ You can also use <codeph>INTERVAL</codeph> expressions to add or subtract from timestamp literal values,
+ such as <codeph>'1966-07-30' + INTERVAL 5 YEARS + INTERVAL 3 DAYS</codeph>. See
+ <xref href="impala_timestamp.xml#timestamp"/> for details.
+ </p>
+
+ <p>
+ Depending on your data pipeline, you might receive date and time data as text, in notation that does not
+ exactly match the format for Impala <codeph>TIMESTAMP</codeph> literals.
+ See <xref href="impala_datetime_functions.xml#datetime_functions"/> for functions that can convert
+ between a variety of string literals (including different field order, separators, and timezone notation)
+ and equivalent <codeph>TIMESTAMP</codeph> or numeric values.
+ </p>
+ </conbody>
+ </concept>
+
+ <concept id="null">
+
+ <title>NULL</title>
+
+ <conbody>
+
+ <p>
+ <indexterm audience="Cloudera">NULL</indexterm>
+ The notion of <codeph>NULL</codeph> values is familiar from all kinds of database systems, but each SQL
+ dialect can have its own behavior and restrictions on <codeph>NULL</codeph> values. For Big Data
+ processing, the precise semantics of <codeph>NULL</codeph> values are significant: any misunderstanding
+ could lead to inaccurate results or misformatted data that could be time-consuming to correct for large
+ data sets.
+ </p>
+
+ <ul>
+ <li>
+ <codeph>NULL</codeph> is a different value from an empty string. The empty string is represented by a
+ string literal with nothing inside, <codeph>""</codeph> or <codeph>''</codeph>.
+ </li>
+
+ <li>
+ In a delimited text file, the <codeph>NULL</codeph> value is represented by the special token
+ <codeph>\N</codeph>.
+ </li>
+
+ <li>
+ When Impala inserts data into a partitioned table, and the value of one of the partitioning columns is
+ <codeph>NULL</codeph> or the empty string, the data is placed in a special partition that holds only
+ these two kinds of values. When these values are returned in a query, the result is <codeph>NULL</codeph>
+ whether the value was originally <codeph>NULL</codeph> or an empty string. This behavior is compatible
+ with the way Hive treats <codeph>NULL</codeph> values in partitioned tables. Hive does not allow empty
+ strings as partition keys, and it returns a string value such as
+ <codeph>__HIVE_DEFAULT_PARTITION__</codeph> instead of <codeph>NULL</codeph> when such values are
+ returned from a query. For example:
+<codeblock>create table t1 (i int) partitioned by (x int, y string);
+-- Select an INT column from another table, with all rows going into a special HDFS subdirectory
+-- named __HIVE_DEFAULT_PARTITION__. Depending on whether one or both of the partitioning keys
+-- are null, this special directory name occurs at different levels of the physical data directory
+-- for the table.
+insert into t1 partition(x=NULL, y=NULL) select c1 from some_other_table; +insert into t1 partition(x, y=NULL) select c1, c2 from some_other_table; +insert into t1 partition(x=NULL, y) select c1, c3 from some_other_table;</codeblock> + </li> + + <li> + There is no <codeph>NOT NULL</codeph> clause when defining a column to prevent <codeph>NULL</codeph> + values in that column. + </li> + + <li> + There is no <codeph>DEFAULT</codeph> clause to specify a non-<codeph>NULL</codeph> default value. + </li> + + <li> + If an <codeph>INSERT</codeph> operation mentions some columns but not others, the unmentioned columns + contain <codeph>NULL</codeph> for all inserted rows. + </li> + + <li rev="1.2.1"> + <p conref="../shared/impala_common.xml#common/null_sorting_change"/> + <note> + <!-- To do: Probably a bunch of similar view-related restrictions like this that should be collected, reused, or cross-referenced under the Views topic. --> + Because the <codeph>NULLS FIRST</codeph> and <codeph>NULLS LAST</codeph> keywords are not currently + available in Hive queries, any views you create using those keywords will not be available through + Hive. + </note> + </li> + + <li> + In all other contexts besides sorting with <codeph>ORDER BY</codeph>, comparing a <codeph>NULL</codeph> + to anything else returns <codeph>NULL</codeph>, making the comparison meaningless. For example, + <codeph>10 > NULL</codeph> produces <codeph>NULL</codeph>, <codeph>10 < NULL</codeph> also produces + <codeph>NULL</codeph>, <codeph>5 BETWEEN 1 AND NULL</codeph> produces <codeph>NULL</codeph>, and so on. + </li> + </ul> + + <p> + Several built-in functions serve as shorthand for evaluating expressions and returning + <codeph>NULL</codeph>, 0, or some other substitution value depending on the expression result: + <codeph>ifnull()</codeph>, <codeph>isnull()</codeph>, <codeph>nvl()</codeph>, <codeph>nullif()</codeph>, + <codeph>nullifzero()</codeph>, and <codeph>zeroifnull()</codeph>. See + <xref href="impala_conditional_functions.xml#conditional_functions"/> for details. + </p> + </conbody> + </concept> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_live_progress.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_live_progress.xml b/docs/topics/impala_live_progress.xml new file mode 100644 index 0000000..f99e4b1 --- /dev/null +++ b/docs/topics/impala_live_progress.xml @@ -0,0 +1,86 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="2.3.0" id="live_progress"> + + <title>LIVE_PROGRESS Query Option (<keyword keyref="impala23"/> or higher only)</title> + <titlealts audience="PDF"><navtitle>LIVE_PROGRESS</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Performance"/> + <data name="Category" value="Reports"/> + <data name="Category" value="impala-shell"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p rev="2.3.0"> + <indexterm audience="Cloudera">LIVE_PROGRESS query option</indexterm> + For queries submitted through the <cmdname>impala-shell</cmdname> command, + displays an interactive progress bar showing roughly what percentage of + processing has been completed. 
When the query finishes, the progress bar is erased + from the <cmdname>impala-shell</cmdname> console output. + </p> + + <p> + </p> + + <p conref="../shared/impala_common.xml#common/type_boolean"/> + <p conref="../shared/impala_common.xml#common/default_false_0"/> + + <p conref="../shared/impala_common.xml#common/command_line_blurb"/> + <p> + You can enable this query option within <cmdname>impala-shell</cmdname> + by starting the shell with the <codeph>--live_progress</codeph> + command-line option. + You can still turn this setting off and on again within the shell through the + <codeph>SET</codeph> command. + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p conref="../shared/impala_common.xml#common/live_reporting_details"/> + <p> + For a more detailed way of tracking the progress of an interactive query through + all phases of processing, see <xref href="impala_live_summary.xml#live_summary"/>. + </p> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + <p> + Because the percentage complete figure is calculated using the number of + issued and completed <q>scan ranges</q>, which occur while reading the table + data, the progress bar might reach 100% before the query is entirely finished. + For example, the query might do work to perform aggregations after all the + table data has been read. If many of your queries fall into this category, + consider using the <codeph>LIVE_SUMMARY</codeph> option instead for + more granular progress reporting. + </p> + <p conref="../shared/impala_common.xml#common/impala_shell_progress_reports_compute_stats_caveat"/> + <p conref="../shared/impala_common.xml#common/impala_shell_progress_reports_shell_only_caveat"/> + + <p conref="../shared/impala_common.xml#common/added_in_230"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> +<codeblock><![CDATA[[localhost:21000] > set live_progress=true; +LIVE_PROGRESS set to true +[localhost:21000] > select count(*) from customer; ++----------+ +| count(*) | ++----------+ +| 150000 | ++----------+ +[localhost:21000] > select count(*) from customer t1 cross join customer t2; +[################################### ] 50% +[######################################################################] 100% + +]]> +</codeblock> + + <p conref="../shared/impala_common.xml#common/live_progress_live_summary_asciinema"/> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_live_summary.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_live_summary.xml b/docs/topics/impala_live_summary.xml new file mode 100644 index 0000000..9c7329b --- /dev/null +++ b/docs/topics/impala_live_summary.xml @@ -0,0 +1,211 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="2.3.0" id="live_summary"> + + <title>LIVE_SUMMARY Query Option (<keyword keyref="impala23"/> or higher only)</title> + <titlealts audience="PDF"><navtitle>LIVE_SUMMARY</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Performance"/> + <data name="Category" value="Reports"/> + <data name="Category" value="impala-shell"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + 
<p rev="2.3.0"> + <indexterm audience="Cloudera">LIVE_SUMMARY query option</indexterm> + For queries submitted through the <cmdname>impala-shell</cmdname> command, + displays the same output as the <codeph>SUMMARY</codeph> command, + with the measurements updated in real time as the query progresses. + When the query finishes, the final <codeph>SUMMARY</codeph> output remains + visible in the <cmdname>impala-shell</cmdname> console output. + </p> + + <p> + </p> + + <p conref="../shared/impala_common.xml#common/type_boolean"/> + <p conref="../shared/impala_common.xml#common/default_false_0"/> + + <p conref="../shared/impala_common.xml#common/command_line_blurb"/> + <p> + You can enable this query option within <cmdname>impala-shell</cmdname> + by starting the shell with the <codeph>--live_summary</codeph> + command-line option. + You can still turn this setting off and on again within the shell through the + <codeph>SET</codeph> command. + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + The live summary output can be useful for evaluating long-running queries, + to evaluate which phase of execution takes up the most time, or if some hosts + take much longer than others for certain operations, dragging overall performance down. + By making the information available in real time, this feature lets you decide what + action to take even before you cancel a query that is taking much longer than normal. + </p> + <p> + For example, you might see the HDFS scan phase taking a long time, and therefore revisit + performance-related aspects of your schema design such as constructing a partitioned table, + switching to the Parquet file format, running the <codeph>COMPUTE STATS</codeph> statement + for the table, and so on. + Or you might see a wide variation between the average and maximum times for all hosts to + perform some phase of the query, and therefore investigate if one particular host + needed more memory or was experiencing a network problem. + </p> + <p conref="../shared/impala_common.xml#common/live_reporting_details"/> + <p> + For a simple and concise way of tracking the progress of an interactive query, see + <xref href="impala_live_progress.xml#live_progress"/>. + </p> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + <p conref="../shared/impala_common.xml#common/impala_shell_progress_reports_compute_stats_caveat"/> + <p conref="../shared/impala_common.xml#common/impala_shell_progress_reports_shell_only_caveat"/> + + <p conref="../shared/impala_common.xml#common/added_in_230"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following example shows a series of <codeph>LIVE_SUMMARY</codeph> reports that + are displayed during the course of a query, showing how the numbers increase to + show the progress of different phases of the distributed query. When you do the same + in <cmdname>impala-shell</cmdname>, only a single report is displayed at any one time, + with each update overwriting the previous numbers. + </p> + +<codeblock><![CDATA[[localhost:21000] > set live_summary=true; +LIVE_SUMMARY set to true +[localhost:21000] > select count(*) from customer t1 cross join customer t2; ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. 
Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | 10.00 MB | | +| 02:NESTED LOOP JOIN | 0 | 0ns | 0ns | 0 | 22.50B | 0 B | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 0 | 0ns | 0ns | 0 | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 0 | 0ns | 0ns | 0 | 150.00K | 0 B | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 17.62s | 17.62s | 81.14M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 247.53ms | 247.53ms | 1.02K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 61.85s | 61.85s | 283.43M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 247.59ms | 247.59ms | 2.05K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +]]> +</codeblock> + +<!-- Keeping this sample output that illustrates a couple of glitches in the LIVE_SUMMARY display, hidden, to help filing JIRAs. --> +<codeblock audience="Cloudera"><![CDATA[[ ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. 
Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 91.34s | 91.34s | 419.48M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 247.63ms | 247.63ms | 3.07K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 140.49s | 140.49s | 646.82M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 247.73ms | 247.73ms | 5.12K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 228.96s | 228.96s | 1.06B | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 247.83ms | 247.83ms | 7.17K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. 
Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 563.11s | 563.11s | 2.59B | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 248.11ms | 248.11ms | 17.41K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 985.71s | 985.71s | 4.54B | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 248.49ms | 248.49ms | 30.72K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | None | None | 5.42B | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 248.66ms | 248.66ms | 36.86K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + +[localhost:21000] > select count(*) from customer t1 cross join customer t2; +Query: select count(*) from customer t1 cross join customer t2 +[####################################################################################################] 100% ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. 
+| Operator            | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                |
+]]>
+</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/live_progress_live_summary_asciinema"/>
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_load_data.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_load_data.xml b/docs/topics/impala_load_data.xml
new file mode 100644
index 0000000..0652914
--- /dev/null
+++ b/docs/topics/impala_load_data.xml
@@ -0,0 +1,256 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.1" id="load_data">
+
+  <title>LOAD DATA Statement</title>
+  <titlealts audience="PDF"><navtitle>LOAD DATA</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="ETL"/>
+      <data name="Category" value="Ingest"/>
+      <data name="Category" value="DML"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="HDFS"/>
+      <data name="Category" value="Tables"/>
+      <data name="Category" value="S3"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">LOAD DATA statement</indexterm>
+      The <codeph>LOAD DATA</codeph> statement streamlines the ETL process for an internal Impala table by moving a
+      data file or all the data files in a directory from an HDFS location into the Impala data directory for that
+      table.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>LOAD DATA INPATH '<varname>hdfs_file_or_directory_path</varname>' [OVERWRITE] INTO TABLE <varname>tablename</varname>
+  [PARTITION (<varname>partcol1</varname>=<varname>val1</varname>, <varname>partcol2</varname>=<varname>val2</varname> ...)]</codeblock>
+
+    <p>
+      When the <codeph>LOAD DATA</codeph> statement operates on a partitioned table,
+      it always operates on one partition at a time. Specify the <codeph>PARTITION</codeph> clause
+      and list all the partition key columns, with a constant value specified for each.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/dml_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <ul>
+      <li>
+        The loaded data files are moved, not copied, into the Impala data directory.
+      </li>
+
+      <li>
+        You can specify the HDFS path of a single file to be moved, or the HDFS path of a directory to move all the
+        files inside that directory. You cannot specify any sort of wildcard to take only some of the files from a
+        directory. When loading a directory full of data files, keep all the data files at the top level, with no
+        nested directories underneath.
+      </li>
+
+      <li>
+        Currently, the Impala <codeph>LOAD DATA</codeph> statement only imports files from HDFS, not from the local
+        filesystem. It does not support the <codeph>LOCAL</codeph> keyword of the Hive <codeph>LOAD DATA</codeph>
+        statement. You must specify a path, not an <codeph>hdfs://</codeph> URI.
+      </li>
+
+      <li>
+        In the interest of speed, only limited error checking is done. If the loaded files have the wrong file
+        format, different columns than the destination table, or some other kind of mismatch, Impala does not raise
+        any error for the <codeph>LOAD DATA</codeph> statement. Querying the table afterward could produce a runtime
+        error or unexpected results. Currently, the only checking the <codeph>LOAD DATA</codeph> statement does is
+        to avoid mixing together uncompressed and LZO-compressed text files in the same table.
+      </li>
+
+      <li>
+        When you specify an HDFS directory name as the <codeph>LOAD DATA</codeph> argument, any hidden files in
+        that directory (files whose names start with a <codeph>.</codeph>) are not moved to the Impala data
+        directory.
+      </li>
+
+      <li rev="2.5.0 IMPALA-2867">
+        The operation fails if the source directory contains any non-hidden subdirectories.
+        Prior to <keyword keyref="impala25_full"/>, if the source directory contained any subdirectory, even a
+        hidden one such as <filepath>_impala_insert_staging</filepath>, the <codeph>LOAD DATA</codeph> statement
+        would fail. In <keyword keyref="impala25_full"/> and higher, <codeph>LOAD DATA</codeph> ignores hidden
+        subdirectories in the source directory, and only fails if any of the subdirectories are non-hidden.
+      </li>
+
+      <li>
+        The loaded data files retain their original names in the new location, unless a name conflicts with an
+        existing data file, in which case the name of the new file is modified slightly to be unique. (The
+        name-mangling is a slight difference from the Hive <codeph>LOAD DATA</codeph> statement, which replaces
+        identically named files.)
+      </li>
+
+      <li>
+        By providing an easy way to transport files from known locations in HDFS into the Impala data directory
+        structure, the <codeph>LOAD DATA</codeph> statement lets you avoid memorizing the locations and layout of
+        the HDFS directory tree containing the Impala databases and tables. (For a quick way to check the location
+        of the data files for an Impala table, issue the statement <codeph>DESCRIBE FORMATTED
+        <varname>table_name</varname></codeph>.)
+      </li>
+
+      <li>
+        The <codeph>PARTITION</codeph> clause is especially convenient for ingesting new data for a partitioned
+        table. As you receive new data for a time period, geographic region, or other division that corresponds to
+        one or more partitioning columns, you can load that data straight into the appropriate Impala data
+        directory, which might be nested several levels down if the table is partitioned by multiple columns. When
+        the table is partitioned, you must specify constant values for all the partitioning columns, as shown in
+        the sketch following this list.
+      </li>
+    </ul>
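+
+    <p>
+      As a quick sketch of how the <codeph>PARTITION</codeph> clause fits together with the rest of the statement
+      (the <codeph>sales</codeph> table, its partitioning columns, and the HDFS path here are hypothetical, for
+      illustration only):
+    </p>
+
+<codeblock>-- Hypothetical table partitioned by two columns.
+CREATE TABLE sales (id BIGINT, amount DOUBLE) PARTITIONED BY (year INT, month INT);
+
+-- Each LOAD DATA statement targets a single partition, with a
+-- constant value specified for every partitioning column.
+LOAD DATA INPATH '/user/cloudera/sales_2013_06'
+  INTO TABLE sales PARTITION (year=2013, month=6);</codeblock>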
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+    <p rev="2.3.0">
+      Because Impala currently cannot create Parquet data files containing complex types
+      (<codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>), the
+      <codeph>LOAD DATA</codeph> statement is especially important when working with
+      tables containing complex type columns. You create the Parquet data files outside
+      Impala, then use either <codeph>LOAD DATA</codeph>, an external table, or HDFS-level
+      file operations followed by <codeph>REFRESH</codeph> to associate the data files with
+      the corresponding table.
+      See <xref href="impala_complex_types.xml#complex_types"/> for details about using complex types.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>
+
+    <note conref="../shared/impala_common.xml#common/compute_stats_next"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      First, we use a trivial Python script to write different numbers of strings (one per line) into files stored
+      in the <codeph>cloudera</codeph> HDFS user account. (Substitute the path for your own HDFS user account when
+      doing <cmdname>hdfs dfs</cmdname> operations like these.)
+    </p>
+
+<codeblock>$ random_strings.py 1000 | hdfs dfs -put - /user/cloudera/thousand_strings.txt
+$ random_strings.py 100 | hdfs dfs -put - /user/cloudera/hundred_strings.txt
+$ random_strings.py 10 | hdfs dfs -put - /user/cloudera/ten_strings.txt</codeblock>
+
+    <p>
+      Next, we create a table and load an initial set of data into it. Remember, unless you specify a
+      <codeph>STORED AS</codeph> clause, Impala tables default to <codeph>TEXTFILE</codeph> format with Ctrl-A (hex
+      01) as the field delimiter. This example uses a single-column table, so the delimiter is not significant. For
+      large-scale ETL jobs, you would typically use binary format data files such as Parquet or Avro, and load them
+      into Impala tables that use the corresponding file format.
+    </p>
+
+<codeblock>[localhost:21000] > create table t1 (s string);
+[localhost:21000] > load data inpath '/user/cloudera/thousand_strings.txt' into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary                                                  |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.61s
+[localhost:21000] > select count(*) from t1;
+Query finished, fetching results ...
++------+
+| _c0  |
++------+
+| 1000 |
++------+
+Returned 1 row(s) in 0.67s
+[localhost:21000] > load data inpath '/user/cloudera/thousand_strings.txt' into table t1;
+ERROR: AnalysisException: INPATH location '/user/cloudera/thousand_strings.txt' does not exist.</codeblock>
+
+    <p>
+      As indicated by the error message at the end of the previous example, the data file was moved from its
+      original location. The following example illustrates how the data file was moved into the Impala data
+      directory for the destination table, keeping its original filename:
+    </p>
+
+<codeblock>$ hdfs dfs -ls /user/hive/warehouse/load_data_testing.db/t1
+Found 1 items
+-rw-r--r--   1 cloudera cloudera      13926 2013-06-26 15:40 /user/hive/warehouse/load_data_testing.db/t1/thousand_strings.txt</codeblock>
+
+    <p>
+      The following example demonstrates the difference between the <codeph>INTO TABLE</codeph> and
+      <codeph>OVERWRITE INTO TABLE</codeph> clauses. The table already contains 1000 rows. After issuing the
+      <codeph>LOAD DATA</codeph> statement with the <codeph>INTO TABLE</codeph> clause, the table contains 100 more
+      rows, for a total of 1100. After issuing the <codeph>LOAD DATA</codeph> statement with the <codeph>OVERWRITE
+      INTO TABLE</codeph> clause, the former contents are gone, and now the table only contains the 10 rows from
+      the just-loaded data file.
+    </p>
+
+<codeblock>[localhost:21000] > load data inpath '/user/cloudera/hundred_strings.txt' into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary                                                  |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 2 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.24s
+[localhost:21000] > select count(*) from t1;
+Query finished, fetching results ...
++------+
+| _c0  |
++------+
+| 1100 |
++------+
+Returned 1 row(s) in 0.55s
+[localhost:21000] > load data inpath '/user/cloudera/ten_strings.txt' overwrite into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary                                                  |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.26s
+[localhost:21000] > select count(*) from t1;
+Query finished, fetching results ...
++-----+
+| _c0 |
++-----+
+| 10  |
++-----+
+Returned 1 row(s) in 0.62s</codeblock>
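+
+    <p>
+      For comparison, the following sketch shows the HDFS-level alternative mentioned earlier: copying a file
+      directly into the table data directory, then issuing a <codeph>REFRESH</codeph> statement so that Impala
+      recognizes the new file. (The paths here are hypothetical; check the actual data directory with
+      <codeph>DESCRIBE FORMATTED</codeph> first. Also note that <cmdname>hdfs dfs -put</cmdname> copies a local
+      file into HDFS, whereas <codeph>LOAD DATA</codeph> moves files already in HDFS.)
+    </p>
+
+<codeblock>$ hdfs dfs -put extra_strings.txt /user/hive/warehouse/load_data_testing.db/t1
+
+[localhost:21000] > refresh t1;</codeblock>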
+
+    <p conref="../shared/impala_common.xml#common/s3_blurb"/>
+    <p conref="../shared/impala_common.xml#common/s3_dml"/>
+    <p conref="../shared/impala_common.xml#common/s3_dml_performance"/>
+    <p>See <xref href="../topics/impala_s3.xml#s3"/> for details about reading and writing S3 data with Impala.</p>
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
+    <p rev="CDH-19187">
+      The user ID that the <cmdname>impalad</cmdname> daemon runs under,
+      typically the <codeph>impala</codeph> user, must have read and write
+      permissions for the files in the source directory, and write
+      permission for the destination directory.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_no_load_data"/>
+
+    <p conref="../shared/impala_common.xml#common/hbase_blurb"/>
+    <p conref="../shared/impala_common.xml#common/hbase_no_load_data"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+    <p>
+      The <codeph>LOAD DATA</codeph> statement is an alternative to the
+      <codeph><xref href="impala_insert.xml#insert">INSERT</xref></codeph> statement.
+      Use <codeph>LOAD DATA</codeph> when you have the data files in HDFS
+      but outside of any Impala table.
+    </p>
+    <p>
+      The <codeph>LOAD DATA</codeph> statement is also an alternative
+      to the <codeph>CREATE EXTERNAL TABLE</codeph> statement. Use
+      <codeph>LOAD DATA</codeph> when it is appropriate to move the
+      data files under Impala control rather than querying them
+      from their original location. See <xref href="impala_tables.xml#external_tables"/>
+      for information about working with external tables.
+    </p>
+  </conbody>
+</concept>
