http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_sync_ddl.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_sync_ddl.xml b/docs/topics/impala_sync_ddl.xml new file mode 100644 index 0000000..7453e94 --- /dev/null +++ b/docs/topics/impala_sync_ddl.xml @@ -0,0 +1,59 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="1.2.1" id="sync_ddl"> + + <title>SYNC_DDL Query Option</title> + <titlealts audience="PDF"><navtitle>SYNC_DDL</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="DDL"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">SYNC_DDL query option</indexterm> + When enabled, causes any DDL operation such as <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> + to return only when the changes have been propagated to all other Impala nodes in the cluster by the Impala + catalog service. That way, if you issue a subsequent <codeph>CONNECT</codeph> statement in + <cmdname>impala-shell</cmdname> to connect to a different node in the cluster, you can be sure that other + node will already recognize any added or changed tables. (The catalog service automatically broadcasts the + DDL changes to all nodes automatically, but without this option there could be a period of inconsistency if + you quickly switched to another node, such as by issuing a subsequent query through a load-balancing proxy.) + </p> + + <p> + Although <codeph>INSERT</codeph> is classified as a DML statement, when the <codeph>SYNC_DDL</codeph> option + is enabled, <codeph>INSERT</codeph> statements also delay their completion until all the underlying data and + metadata changes are propagated to all Impala nodes. Internally, Impala inserts have similarities with DDL + statements in traditional database systems, because they create metadata needed to track HDFS block locations + for new files and they potentially add new partitions to partitioned tables. + </p> + + <note> + Because this option can introduce a delay after each write operation, if you are running a sequence of + <codeph>CREATE DATABASE</codeph>, <codeph>CREATE TABLE</codeph>, <codeph>ALTER TABLE</codeph>, + <codeph>INSERT</codeph>, and similar statements within a setup script, to minimize the overall delay you can + enable the <codeph>SYNC_DDL</codeph> query option only near the end, before the final DDL statement. + </note> + + <p conref="../shared/impala_common.xml#common/type_boolean"/> + <p conref="../shared/impala_common.xml#common/default_false_0"/> + + <!-- To do: + Example could be useful here. + --> + + <p conref="../shared/impala_common.xml#common/related_info"/> + <p> + <xref href="impala_ddl.xml#ddl"/> + </p> + + </conbody> +</concept>
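<p>
  A minimal sketch of how this option might be used, following the note above about enabling it only
  near the end of a setup script (the database and table names here are made up for illustration and
  are not part of the shipped documentation):
</p>
<codeblock>-- Run the bulk of the setup without the extra synchronization delay.
CREATE DATABASE staging_db;
CREATE TABLE staging_db.t1 (x INT);
-- Enable SYNC_DDL just before the final DDL statement.
SET SYNC_DDL=1;
ALTER TABLE staging_db.t1 ADD COLUMNS (y STRING);
-- From this point, connecting to any other coordinator (for example through a
-- load-balancing proxy) already shows the new database, table, and column.
</codeblock>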
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_tables.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_tables.xml b/docs/topics/impala_tables.xml new file mode 100644 index 0000000..80b4000 --- /dev/null +++ b/docs/topics/impala_tables.xml @@ -0,0 +1,258 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="tables"> + + <title>Overview of Impala Tables</title> + <titlealts audience="PDF"><navtitle>Tables</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Databases"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Tables"/> + <data name="Category" value="Schemas"/> + </metadata> + </prolog> + + <conbody> + + <p/> + + <p> + Tables are the primary containers for data in Impala. They have the familiar row and column layout similar to + other database systems, plus some features such as partitioning often associated with higher-end data + warehouse systems. + </p> + + <p> + Logically, each table has a structure based on the definition of its columns, partitions, and other + properties. + </p> + + <p> + Physically, each table that uses HDFS storage is associated with a directory in HDFS. The table data consists of all the data files + underneath that directory: + </p> + + <ul> + <li> + <xref href="impala_tables.xml#internal_tables">Internal tables</xref> are managed by Impala, and use directories + inside the designated Impala work area. + </li> + + <li> + <xref href="impala_tables.xml#external_tables">External tables</xref> use arbitrary HDFS directories, where + the data files are typically shared between different Hadoop components. + </li> + + <li> + Large-scale data is usually handled by partitioned tables, where the data files are divided among different + HDFS subdirectories. + </li> + </ul> + + <p rev="2.2.0"> + Impala tables can also represent data that is stored in HBase, or in the Amazon S3 filesystem (CDH 5.4.0 or higher), + or on Isilon storage devices (CDH 5.4.3 or higher). See <xref href="impala_hbase.xml#impala_hbase"/>, + <xref href="impala_s3.xml#s3"/>, and <xref href="impala_isilon.xml#impala_isilon"/> + for details about those special kinds of tables. + </p> + + <p conref="../shared/impala_common.xml#common/ignore_file_extensions"/> + + <p> + <b>Related statements:</b> <xref href="impala_create_table.xml#create_table"/>, + <xref href="impala_drop_table.xml#drop_table"/>, <xref href="impala_alter_table.xml#alter_table"/> + <xref href="impala_insert.xml#insert"/>, <xref href="impala_load_data.xml#load_data"/>, + <xref href="impala_select.xml#select"/> + </p> + </conbody> + + <concept id="internal_tables"> + + <title>Internal Tables</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">internal tables</indexterm> + The default kind of table produced by the <codeph>CREATE TABLE</codeph> statement is known as an internal + table. (Its counterpart is the external table, produced by the <codeph>CREATE EXTERNAL TABLE</codeph> + syntax.) + </p> + + <ul> + <li> + <p> + Impala creates a directory in HDFS to hold the data files. + </p> + </li> + + <li> + <p> + You can create data in internal tables by issuing <codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph> + statements. 
+ </p> + </li> + + <li> + <p> + If you add or replace data using HDFS operations, issue the <codeph>REFRESH</codeph> command in + <cmdname>impala-shell</cmdname> so that Impala recognizes the changes in data files, block locations, + and so on. + </p> + </li> + + <li> + <p> + When you issue a <codeph>DROP TABLE</codeph> statement, Impala physically removes all the data files + from the directory. + </p> + </li> + + <li> + <p conref="../shared/impala_common.xml#common/check_internal_external_table"/> + </li> + + <li> + <p> + When you issue an <codeph>ALTER TABLE</codeph> statement to rename an internal table, all data files + are moved into the new HDFS directory for the table. The files are moved even if they were formerly in + a directory outside the Impala data directory, for example in an internal table with a + <codeph>LOCATION</codeph> attribute pointing to an outside HDFS directory. + </p> + </li> + </ul> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p conref="../shared/impala_common.xml#common/switch_internal_external_table"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_tables.xml#external_tables"/>, <xref href="impala_create_table.xml#create_table"/>, + <xref href="impala_drop_table.xml#drop_table"/>, <xref href="impala_alter_table.xml#alter_table"/>, + <xref href="impala_describe.xml#describe"/> + </p> + </conbody> + </concept> + + <concept id="external_tables"> + + <title>External Tables</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">external tables</indexterm> + The syntax <codeph>CREATE EXTERNAL TABLE</codeph> sets up an Impala table that points at existing data + files, potentially in HDFS locations outside the normal Impala data directories.. This operation saves the + expense of importing the data into a new table when you already have the data files in a known location in + HDFS, in the desired file format. + </p> + + <ul> + <li> + <p> + You can use Impala to query the data in this table. + </p> + </li> + + <li> + <p> + You can create data in external tables by issuing <codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph> + statements. + </p> + </li> + + <li> + <p> + If you add or replace data using HDFS operations, issue the <codeph>REFRESH</codeph> command in + <cmdname>impala-shell</cmdname> so that Impala recognizes the changes in data files, block locations, + and so on. + </p> + </li> + + <li> + <p> + When you issue a <codeph>DROP TABLE</codeph> statement in Impala, that removes the connection that + Impala has with the associated data files, but does not physically remove the underlying data. You can + continue to use the data files with other Hadoop components and HDFS operations. + </p> + </li> + + <li> + <p conref="../shared/impala_common.xml#common/check_internal_external_table"/> + </li> + + <li> + <p> + When you issue an <codeph>ALTER TABLE</codeph> statement to rename an external table, all data files + are left in their original locations. + </p> + </li> + + <li> + <p> + You can point multiple external tables at the same HDFS directory by using the same + <codeph>LOCATION</codeph> attribute for each one. The tables could have different column definitions, + as long as the number and types of columns are compatible with the schema evolution considerations for + the underlying file type. For example, for text data files, one table might define a certain column as + a <codeph>STRING</codeph> while another defines the same column as a <codeph>BIGINT</codeph>. 
+ </p> + </li> + </ul> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p conref="../shared/impala_common.xml#common/switch_internal_external_table"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_tables.xml#internal_tables"/>, <xref href="impala_create_table.xml#create_table"/>, + <xref href="impala_drop_table.xml#drop_table"/>, <xref href="impala_alter_table.xml#alter_table"/>, + <xref href="impala_describe.xml#describe"/> + </p> + </conbody> + </concept> + + <concept id="table_file_formats"> + <title>File Formats</title> + <conbody> + <p> + Each table has an associated file format, which determines how Impala interprets the + associated data files. See <xref href="impala_file_formats.xml#file_formats"/> for details. + </p> + <p> + You set the file format during the <codeph>CREATE TABLE</codeph> statement, + or change it later using the <codeph>ALTER TABLE</codeph> statement. + Partitioned tables can have a different file format for individual partitions, + allowing you to change the file format used in your ETL process for new data + without going back and reconverting all the existing data in the same table. + </p> + <p> + Any <codeph>INSERT</codeph> statements produce new data files with the current file format of the table. + For existing data files, changing the file format of the table does not automatically do any data conversion. + You must use <codeph>TRUNCATE TABLE</codeph> or <codeph>INSERT OVERWRITE</codeph> to remove any previous data + files that use the old file format. + Then you use the <codeph>LOAD DATA</codeph> statement, <codeph>INSERT ... SELECT</codeph>, or other mechanism + to put data files of the correct format into the table. + </p> + <p> + The default file format, text, is the most flexible and easy to produce when you are just getting started with + Impala. The Parquet file format offers the highest query performance and uses compression to reduce storage + requirements; therefore, <ph rev="upstream">Cloudera</ph> recommends using Parquet for Impala tables with substantial amounts of data. + <ph rev="2.3.0">Also, the complex types (<codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>) + available in <keyword keyref="impala23_full"/> and higher are currently only supported with the Parquet file type.</ph> + Based on your existing ETL workflow, you might use other file formats such as Avro, possibly doing a final + conversion step to Parquet to take advantage of its performance for analytic queries. 
+ </p> + </conbody> + </concept> + +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_timeouts.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_timeouts.xml b/docs/topics/impala_timeouts.xml new file mode 100644 index 0000000..03ad773 --- /dev/null +++ b/docs/topics/impala_timeouts.xml @@ -0,0 +1,172 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="1.2.1" id="timeouts"> + + <title>Setting Timeout Periods for Daemons, Queries, and Sessions</title> + + <titlealts audience="PDF"> + + <navtitle>Setting Timeouts</navtitle> + + </titlealts> + + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Scheduling"/> + <data name="Category" value="Scalability"/> + </metadata> + </prolog> + + <conbody> + + <p> + Depending on how busy your CDH cluster is, you might increase or decrease various timeout + values. Increase timeouts if Impala is cancelling operations prematurely, when the system + is responding slower than usual but the operations are still successful if given extra + time. Decrease timeouts if operations are idle or hanging for long periods, and the idle + or hung operations are consuming resources and reducing concurrency. + </p> + + <p outputclass="toc inpage"/> + + </conbody> + + <concept id="statestore_timeout"> + + <title>Increasing the Statestore Timeout</title> + + <conbody> + + <p rev="IMP-1210"> + If you have an extensive Impala schema, for example with hundreds of databases, tens of + thousands of tables, and so on, you might encounter timeout errors during startup as the + Impala catalog service broadcasts metadata to all the Impala nodes using the statestore + service. To avoid such timeout errors on startup, increase the statestore timeout value + from its default of 10 seconds. Specify the timeout value using the + <codeph>-statestore_subscriber_timeout_seconds</codeph> option for the statestore + service, using the configuration instructions in + <xref href="impala_config_options.xml#config_options"/>. The symptom of this problem is + messages in the <codeph>impalad</codeph> log such as: + </p> + +<codeblock>Connection with state-store lost +Trying to re-register with state-store</codeblock> + + <p> + See <xref href="impala_scalability.xml#statestore_scalability"/> for more details about + statestore operation and settings on clusters with a large number of Impala-related + objects such as tables and partitions. + </p> + + </conbody> + + </concept> + + <concept id="impalad_timeout"> + + <title>Setting the Idle Query and Idle Session Timeouts for impalad</title> + + <conbody> + + <p> + To keep long-running queries or idle sessions from tying up cluster resources, you can + set timeout intervals for both individual queries, and entire sessions. + </p> + + <note conref="../shared/impala_common.xml#common/timeout_clock_blurb"/> + + <p> + Specify the following startup options for the <cmdname>impalad</cmdname> daemon: + </p> + + <ul> + <li> + The <codeph>--idle_query_timeout</codeph> option specifies the time in seconds after + which an idle query is cancelled. This could be a query whose results were all fetched + but was never closed, or one whose results were partially fetched and then the client + program stopped requesting further results. 
This condition is most likely to occur in + a client program using the JDBC or ODBC interfaces, rather than in the interactive + <cmdname>impala-shell</cmdname> interpreter. Once the query is cancelled, the client + program cannot retrieve any further results. + </li> + + <li> + The <codeph>--idle_session_timeout</codeph> option specifies the time in seconds after + which an idle session is expired. A session is idle when no activity is occurring for + any of the queries in that session, and the session has not started any new queries. + Once a session is expired, you cannot issue any new query requests to it. The session + remains open, but the only operation you can perform is to close it. The default value + of 0 means that sessions never expire. + </li> + </ul> + + <p> + For instructions on changing <cmdname>impalad</cmdname> startup options, see + <xref href="impala_config_options.xml#config_options"/>. + </p> + + <p rev="2.0.0"> + You can reduce the idle query timeout by using the <codeph>QUERY_TIMEOUT_S</codeph> + query option. Any value specified for the <codeph>--idle_query_timeout</codeph> startup + option serves as an upper limit for the <codeph>QUERY_TIMEOUT_S</codeph> query option. + See <xref href="impala_query_timeout_s.xml#query_timeout_s"/> for details. + </p> + + </conbody> + + </concept> + <concept id="concept_rfy_jl1_rx"> + <title>Setting Timeout and Retries for Thrift Connections to the Backend + Client</title> + <conbody> + <p>Impala connections to the backend client are subject to failure in + cases when the network is momentarily overloaded. To avoid failed + queries due to transient network problems, you can configure the number + of Thrift connection retries using the following option: </p> + <ul id="ul_bj3_ql1_rx"> + <li>The <codeph>--backend_client_connection_num_retries</codeph> option + specifies the number of times Impala will try connecting to the + backend client after the first connection attempt fails. By default, + <cmdname>impalad</cmdname> will attempt three re-connections before + it returns a failure. </li> + </ul> + <p>You can configure timeouts for sending and receiving data from the + backend client. Therefore, if for some reason a query hangs, instead of + waiting indefinitely for a response, Impala will terminate the + connection after a configurable timeout.</p> + <ul id="ul_vm2_2v1_rx"> + <li>The <codeph>--backend_client_rpc_timeout_ms</codeph> option can be + used to specify the number of milliseconds Impala should wait for a + response from the backend client before it terminates the connection + and signals a failure. The default value for this property is 300000 + milliseconds, or 5 minutes. </li> + </ul> + </conbody> + </concept> + + <concept id="cancel_query"> + + <title>Cancelling a Query</title> + + <conbody> + + <p> + Sometimes, an Impala query might run for an unexpectedly long time, tying up resources + in the cluster. You can cancel the query explicitly, independent of the timeout period, + by going into the web UI for the <cmdname>impalad</cmdname> host (on port 25000 by + default), and using the link on the <codeph>/queries</codeph> tab to cancel the running + query. Various client applications let you interactively cancel queries submitted or + monitored through those applications. 
For example, by pressing <codeph>^C</codeph> in + <cmdname>impala-shell</cmdname>, clicking the <uicontrol>Cancel</uicontrol> button from + the <uicontrol>Watch</uicontrol> page in Hue, clicking <uicontrol>Actions > + Cancel</uicontrol> from the <uicontrol>Queries</uicontrol> list in Cloudera Manager, and + so on. + </p> + + </conbody> + + </concept> + +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_timestamp.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_timestamp.xml b/docs/topics/impala_timestamp.xml new file mode 100644 index 0000000..2f4fa40 --- /dev/null +++ b/docs/topics/impala_timestamp.xml @@ -0,0 +1,445 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="timestamp"> + + <title>TIMESTAMP Data Type</title> + <titlealts audience="PDF"><navtitle>TIMESTAMP</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Dates and Times"/> + </metadata> + </prolog> + + <conbody> + + <p> + A data type used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> statements, representing a + point in time. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + + <p> + In the column definition of a <codeph>CREATE TABLE</codeph> statement: + </p> + +<codeblock><varname>column_name</varname> TIMESTAMP</codeblock> + + <p> + <b>Range:</b> Allowed date values range from 1400-01-01 to 9999-12-31; this range is different from the Hive + <codeph>TIMESTAMP</codeph> type. Internally, the resolution of the time portion of a + <codeph>TIMESTAMP</codeph> value is in nanoseconds. + </p> + + <p> + <b>INTERVAL expressions:</b> + </p> + + <p> + You can perform date arithmetic by adding or subtracting a specified number of time units, using the + <codeph>INTERVAL</codeph> keyword and the <codeph>+</codeph> and <codeph>-</codeph> operators or + <codeph>date_add()</codeph> and <codeph>date_sub()</codeph> functions. You can specify units as + <codeph>YEAR[S]</codeph>, <codeph>MONTH[S]</codeph>, <codeph>WEEK[S]</codeph>, <codeph>DAY[S]</codeph>, + <codeph>HOUR[S]</codeph>, <codeph>MINUTE[S]</codeph>, <codeph>SECOND[S]</codeph>, + <codeph>MILLISECOND[S]</codeph>, <codeph>MICROSECOND[S]</codeph>, and <codeph>NANOSECOND[S]</codeph>. You can + only specify one time unit in each interval expression, for example <codeph>INTERVAL 3 DAYS</codeph> or + <codeph>INTERVAL 25 HOURS</codeph>, but you can produce any granularity by adding together successive + <codeph>INTERVAL</codeph> values, such as <codeph><varname>timestamp_value</varname> + INTERVAL 3 WEEKS - + INTERVAL 1 DAY + INTERVAL 10 MICROSECONDS</codeph>. + </p> + + <p> + For example: + </p> + +<codeblock>select now() + interval 1 day; +select date_sub(now(), interval 5 minutes); +insert into auction_details + select auction_id, auction_start_time, auction_start_time + interval 2 days + interval 12 hours + from new_auctions;</codeblock> + + <p> + <b>Time zones:</b> + </p> + + <p> + By default, Impala does not store timestamps using the local timezone, to avoid undesired results from + unexpected time zone issues. 
Timestamps are stored and interpreted relative to UTC, both when written to or + read from data files, or when converted to or from Unix time values through functions such as + <codeph>from_unixtime()</codeph> or <codeph>unix_timestamp()</codeph>. To convert such a + <codeph>TIMESTAMP</codeph> value to one that represents the date and time in a specific time zone, convert + the original value with the <codeph>from_utc_timestamp()</codeph> function. + </p> + + <p> + Because Impala does not assume that <codeph>TIMESTAMP</codeph> values are in any particular time zone, you + must be conscious of the time zone aspects of data that you query, insert, or convert. + </p> + + <p> + For consistency with Unix system calls, the <codeph>TIMESTAMP</codeph> returned by the <codeph>now()</codeph> + function represents the local time in the system time zone, rather than in UTC. To store values relative to + the current time in a portable way, convert any <codeph>now()</codeph> return values using the + <codeph>to_utc_timestamp()</codeph> function first. For example, the following example shows that the current + time in California (where this Impala cluster is located) is shortly after 2 PM. If that value was written to a data + file, and shipped off to a distant server to be analyzed alongside other data from far-flung locations, the + dates and times would not match up precisely because of time zone differences. Therefore, the + <codeph>to_utc_timestamp()</codeph> function converts it using a common reference point, the UTC time zone + (descended from the old Greenwich Mean Time standard). The <codeph>'PDT'</codeph> argument indicates that the + original value is from the Pacific time zone with Daylight Saving Time in effect. When servers in all + geographic locations run the same transformation on any local date and time values (with the appropriate time + zone argument), the stored data uses a consistent representation. Impala queries can use functions such as + <codeph>EXTRACT()</codeph>, <codeph>MIN()</codeph>, <codeph>AVG()</codeph>, and so on to do time-series + analysis on those timestamps. + </p> + +<codeblock>[localhost:21000] > select now(); ++-------------------------------+ +| now() | ++-------------------------------+ +| 2015-04-09 14:07:46.580465000 | ++-------------------------------+ +[localhost:21000] > select to_utc_timestamp(now(), 'PDT'); ++--------------------------------+ +| to_utc_timestamp(now(), 'pdt') | ++--------------------------------+ +| 2015-04-09 21:08:07.664547000 | ++--------------------------------+ +</codeblock> + + <p> + The converse function, <codeph>from_utc_timestamp()</codeph>, lets you take stored <codeph>TIMESTAMP</codeph> + data or calculated results and convert back to local date and time for processing on the application side. + The following example shows how you might represent some future date (such as the ending date and time of an + auction) in UTC, and then convert back to local time when convenient for reporting or other processing. The + final query in the example tests whether this arbitrary UTC date and time has passed yet, by converting it + back to the local time zone and comparing it against the current date and time. 
+ </p> + +<codeblock>[localhost:21000] > select to_utc_timestamp(now() + interval 2 weeks, 'PDT'); ++---------------------------------------------------+ +| to_utc_timestamp(now() + interval 2 weeks, 'pdt') | ++---------------------------------------------------+ +| 2015-04-23 21:08:34.152923000 | ++---------------------------------------------------+ +[localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT'); ++------------------------------------------------------------+ +| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') | ++------------------------------------------------------------+ +| 2015-04-23 14:08:34.152923000 | ++------------------------------------------------------------+ +[localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT') < now(); ++--------------------------------------------------------------------+ +| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') < now() | ++--------------------------------------------------------------------+ +| false | ++--------------------------------------------------------------------+ +</codeblock> + + <p rev="2.2.0"> + If you have data files written by Hive, those <codeph>TIMESTAMP</codeph> values represent the local timezone + of the host where the data was written, potentially leading to inconsistent results when processed by Impala. + To avoid compatibility problems or having to code workarounds, you can specify one or both of these + <cmdname>impalad</cmdname> startup flags: <codeph>-use_local_tz_for_unix_timestamp_conversions=true</codeph> + <codeph>-convert_legacy_hive_parquet_utc_timestamps=true</codeph>. Although + <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> is turned off by default to avoid performance overhead, <ph rev="upstream">Cloudera</ph> recommends + turning it on when processing <codeph>TIMESTAMP</codeph> columns in Parquet files written by Hive, to avoid unexpected behavior. + </p> + + <p rev="2.2.0"> + The <codeph>-use_local_tz_for_unix_timestamp_conversions</codeph> setting affects conversions from + <codeph>TIMESTAMP</codeph> to <codeph>BIGINT</codeph>, or from <codeph>BIGINT</codeph> + to <codeph>TIMESTAMP</codeph>. By default, Impala treats all <codeph>TIMESTAMP</codeph> values as UTC, + to simplify analysis of time-series data from different geographic regions. When you enable the + <codeph>-use_local_tz_for_unix_timestamp_conversions</codeph> setting, these operations + treat the input values as if they are in the local tie zone of the host doing the processing. + See <xref href="impala_datetime_functions.xml#datetime_functions"/> for the list of functions + affected by the <codeph>-use_local_tz_for_unix_timestamp_conversions</codeph> setting. + </p> + + <p> + The following sequence of examples shows how the interpretation of <codeph>TIMESTAMP</codeph> values in + Parquet tables is affected by the setting of the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> + setting. 
+ </p> + + <p> + Regardless of the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting, + <codeph>TIMESTAMP</codeph> columns in text tables can be written and read interchangeably by Impala and Hive: + </p> + +<codeblock>Impala DDL and queries for text table: + +[localhost:21000] > create table t1 (x timestamp); +[localhost:21000] > insert into t1 values (now()), (now() + interval 1 day); +[localhost:21000] > select x from t1; ++-------------------------------+ +| x | ++-------------------------------+ +| 2015-04-07 15:43:02.892403000 | +| 2015-04-08 15:43:02.892403000 | ++-------------------------------+ +[localhost:21000] > select to_utc_timestamp(x, 'PDT') from t1; ++-------------------------------+ +| to_utc_timestamp(x, 'pdt') | ++-------------------------------+ +| 2015-04-07 22:43:02.892403000 | +| 2015-04-08 22:43:02.892403000 | ++-------------------------------+ + +Hive query for text table: + +hive> select * from t1; +OK +2015-04-07 15:43:02.892403 +2015-04-08 15:43:02.892403 +Time taken: 1.245 seconds, Fetched: 2 row(s) +</codeblock> + + <p> + When the table uses Parquet format, Impala expects any time zone adjustment to be applied prior to writing, + while <codeph>TIMESTAMP</codeph> values written by Hive are adjusted to be in the UTC time zone. When Hive + queries Parquet data files that it wrote, it adjusts the <codeph>TIMESTAMP</codeph> values back to the local + time zone, while Impala does no conversion. Hive does no time zone conversion when it queries Impala-written + Parquet files. + </p> + +<codeblock>Impala DDL and queries for Parquet table: + +[localhost:21000] > create table p1 stored as parquet as select x from t1; ++-------------------+ +| summary | ++-------------------+ +| Inserted 2 row(s) | ++-------------------+ +[localhost:21000] > select x from p1; ++-------------------------------+ +| x | ++-------------------------------+ +| 2015-04-07 15:43:02.892403000 | +| 2015-04-08 15:43:02.892403000 | ++-------------------------------+ + +Hive DDL and queries for Parquet table: + +hive> create table h1 (x timestamp) stored as parquet; +OK +hive> insert into h1 select * from p1; +... +OK +Time taken: 35.573 seconds +hive> select x from p1; +OK +2015-04-07 15:43:02.892403 +2015-04-08 15:43:02.892403 +Time taken: 0.324 seconds, Fetched: 2 row(s) +hive> select x from h1; +OK +2015-04-07 15:43:02.892403 +2015-04-08 15:43:02.892403 +Time taken: 0.197 seconds, Fetched: 2 row(s) +</codeblock> + + <p> + The discrepancy arises when Impala queries the Hive-created Parquet table. The underlying values in the + <codeph>TIMESTAMP</codeph> column are different from the ones written by Impala, even though they were copied + from one table to another by an <codeph>INSERT ... SELECT</codeph> statement in Hive. Hive did an implicit + conversion from the local time zone to UTC as it wrote the values to Parquet. 
+ </p> + +<codeblock>Impala query for TIMESTAMP values from Impala-written and Hive-written data: + +[localhost:21000] > select * from p1; ++-------------------------------+ +| x | ++-------------------------------+ +| 2015-04-07 15:43:02.892403000 | +| 2015-04-08 15:43:02.892403000 | ++-------------------------------+ +Fetched 2 row(s) in 0.29s +[localhost:21000] > select * from h1; ++-------------------------------+ +| x | ++-------------------------------+ +| 2015-04-07 22:43:02.892403000 | +| 2015-04-08 22:43:02.892403000 | ++-------------------------------+ +Fetched 2 row(s) in 0.41s + +Underlying integer values for Impala-written and Hive-written data: + +[localhost:21000] > select cast(x as bigint) from p1; ++-------------------+ +| cast(x as bigint) | ++-------------------+ +| 1428421382 | +| 1428507782 | ++-------------------+ +Fetched 2 row(s) in 0.38s +[localhost:21000] > select cast(x as bigint) from h1; ++-------------------+ +| cast(x as bigint) | ++-------------------+ +| 1428446582 | +| 1428532982 | ++-------------------+ +Fetched 2 row(s) in 0.20s +</codeblock> + + <p> + When the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting is enabled, Impala recognizes + the Parquet data files written by Hive, and applies the same UTC-to-local-timezone conversion logic during + the query as Hive uses, making the contents of the Impala-written <codeph>P1</codeph> table and the + Hive-written <codeph>H1</codeph> table appear identical, whether represented as <codeph>TIMESTAMP</codeph> + values or the underlying <codeph>BIGINT</codeph> integers: + </p> + +<codeblock>[localhost:21000] > select x from p1; ++-------------------------------+ +| x | ++-------------------------------+ +| 2015-04-07 15:43:02.892403000 | +| 2015-04-08 15:43:02.892403000 | ++-------------------------------+ +Fetched 2 row(s) in 0.37s +[localhost:21000] > select x from h1; ++-------------------------------+ +| x | ++-------------------------------+ +| 2015-04-07 15:43:02.892403000 | +| 2015-04-08 15:43:02.892403000 | ++-------------------------------+ +Fetched 2 row(s) in 0.19s +[localhost:21000] > select cast(x as bigint) from p1; ++-------------------+ +| cast(x as bigint) | ++-------------------+ +| 1428446582 | +| 1428532982 | ++-------------------+ +Fetched 2 row(s) in 0.29s +[localhost:21000] > select cast(x as bigint) from h1; ++-------------------+ +| cast(x as bigint) | ++-------------------+ +| 1428446582 | +| 1428532982 | ++-------------------+ +Fetched 2 row(s) in 0.22s +</codeblock> + + <p> + <b>Conversions:</b> + </p> + + <p conref="../shared/impala_common.xml#common/timestamp_conversions"/> + + <p> + In Impala 1.3 and higher, the <codeph>FROM_UNIXTIME()</codeph> and <codeph>UNIX_TIMESTAMP()</codeph> + functions allow a wider range of format strings, with more flexibility in element order, repetition of letter + placeholders, and separator characters. In <keyword keyref="impala23_full"/> and higher, the <codeph>UNIX_TIMESTAMP()</codeph> + function also allows a numeric timezone offset to be specified as part of the input string. + See <xref href="impala_datetime_functions.xml#datetime_functions"/> for details. + </p> + + <p conref="../shared/impala_common.xml#common/y2k38"/> + + <p> + <b>Partitioning:</b> + </p> + + <p> + Although you cannot use a <codeph>TIMESTAMP</codeph> column as a partition key, you can extract the + individual years, months, days, hours, and so on and partition based on those columns. 
Because the partition + key column values are represented in HDFS directory names, rather than as fields in the data files + themselves, you can also keep the original <codeph>TIMESTAMP</codeph> values if desired, without duplicating + data or wasting storage space. See <xref href="impala_partitioning.xml#partition_key_columns"/> for more + details on partitioning with date and time values. + </p> + +<codeblock>[localhost:21000] > create table timeline (event string) partitioned by (happened timestamp); +ERROR: AnalysisException: Type 'TIMESTAMP' is not supported as partition-column type in column: happened +</codeblock> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>select cast('1966-07-30' as timestamp); +select cast('1985-09-25 17:45:30.005' as timestamp); +select cast('08:30:00' as timestamp); +select hour('1970-01-01 15:30:00'); -- Succeeds, returns 15. +select hour('1970-01-01 15:30'); -- Returns NULL because seconds field required. +select hour('1970-01-01 27:30:00'); -- Returns NULL because hour value out of range. +select dayofweek('2004-06-13'); -- Returns 1, representing Sunday. +select dayname('2004-06-13'); -- Returns 'Sunday'. +select date_add('2004-06-13', 365); -- Returns 2005-06-13 with zeros for hh:mm:ss fields. +select day('2004-06-13'); -- Returns 13. +select datediff('1989-12-31','1984-09-01'); -- How many days between these 2 dates? +select now(); -- Returns current date and time in local timezone. + +create table dates_and_times (t timestamp); +insert into dates_and_times values + ('1966-07-30'), ('1985-09-25 17:45:30.005'), ('08:30:00'), (now()); +</codeblock> + + <p conref="../shared/impala_common.xml#common/null_bad_timestamp_cast"/> + + <p conref="../shared/impala_common.xml#common/partitioning_worrisome"/> + + <p conref="../shared/impala_common.xml#common/hbase_ok"/> + + <p conref="../shared/impala_common.xml#common/parquet_ok"/> + + <p conref="../shared/impala_common.xml#common/text_bulky"/> + +<!-- <p conref="../shared/impala_common.xml#common/compatibility_blurb"/> --> + + <p conref="../shared/impala_common.xml#common/internals_16_bytes"/> + + <p conref="../shared/impala_common.xml#common/added_forever"/> + + <p conref="../shared/impala_common.xml#common/column_stats_constant"/> + + <p conref="../shared/impala_common.xml#common/sqoop_blurb"/> + + <p conref="../shared/impala_common.xml#common/sqoop_timestamp_caveat"/> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <p> + If you cast a <codeph>STRING</codeph> with an unrecognized format to a <codeph>TIMESTAMP</codeph>, the result + is <codeph>NULL</codeph> rather than an error. Make sure to test your data pipeline to be sure any textual + date and time values are in a format that Impala <codeph>TIMESTAMP</codeph> can recognize. + </p> + + <p conref="../shared/impala_common.xml#common/avro_no_timestamp"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <ul> + <li> +<!-- The Timestamp Literals topic is pretty brief. Consider adding more examples there. --> + <xref href="impala_literals.xml#timestamp_literals"/>. + </li> + + <li> + To convert to or from different date formats, or perform date arithmetic, use the date and time functions + described in <xref href="impala_datetime_functions.xml#datetime_functions"/>. 
In particular, the + <codeph>from_unixtime()</codeph> function requires a case-sensitive format string such as + <codeph>"yyyy-MM-dd HH:mm:ss.SSSS"</codeph>, matching one of the allowed variations of a + <codeph>TIMESTAMP</codeph> value (date plus time, only date, only time, optional fractional seconds). + </li> + + <li> + See <xref href="impala_langref_unsupported.xml#langref_hiveql_delta"/> for details about differences in + <codeph>TIMESTAMP</codeph> handling between Impala and Hive. + </li> + </ul> + + </conbody> + +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_tinyint.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_tinyint.xml b/docs/topics/impala_tinyint.xml new file mode 100644 index 0000000..2756cb8 --- /dev/null +++ b/docs/topics/impala_tinyint.xml @@ -0,0 +1,103 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="tinyint"> + + <title>TINYINT Data Type</title> + <titlealts audience="PDF"><navtitle>TINYINT</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Schemas"/> + </metadata> + </prolog> + + <conbody> + + <p> + A 1-byte integer data type used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> statements. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + + <p> + In the column definition of a <codeph>CREATE TABLE</codeph> statement: + </p> + +<codeblock><varname>column_name</varname> TINYINT</codeblock> + + <p> + <b>Range:</b> -128 .. 127. There is no <codeph>UNSIGNED</codeph> subtype. + </p> + + <p> + <b>Conversions:</b> Impala automatically converts to a larger integer type (<codeph>SMALLINT</codeph>, + <codeph>INT</codeph>, or <codeph>BIGINT</codeph>) or a floating-point type (<codeph>FLOAT</codeph> or + <codeph>DOUBLE</codeph>) automatically. Use <codeph>CAST()</codeph> to convert to <codeph>STRING</codeph> or + <codeph>TIMESTAMP</codeph>. + <ph conref="../shared/impala_common.xml#common/cast_int_to_timestamp"/> + </p> + + <p conref="../shared/impala_common.xml#common/int_overflow_behavior"/> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + For a convenient and automated way to check the bounds of the <codeph>TINYINT</codeph> type, call the + functions <codeph>MIN_TINYINT()</codeph> and <codeph>MAX_TINYINT()</codeph>. + </p> + + <p> + If an integer value is too large to be represented as a <codeph>TINYINT</codeph>, use a + <codeph>SMALLINT</codeph> instead. + </p> + + <p conref="../shared/impala_common.xml#common/null_bad_numeric_cast"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>CREATE TABLE t1 (x TINYINT); +SELECT CAST(100 AS TINYINT); +</codeblock> + + <p conref="../shared/impala_common.xml#common/parquet_blurb"/> + +<!-- Duplicated under TINYINT and SMALLINT. Turn into a conref in both places. --> + + <p rev="1.4.0"> + Physically, Parquet files represent <codeph>TINYINT</codeph> and <codeph>SMALLINT</codeph> values as 32-bit + integers. Although Impala rejects attempts to insert out-of-range values into such columns, if you create a + new table with the <codeph>CREATE TABLE ... 
LIKE PARQUET</codeph> syntax, any <codeph>TINYINT</codeph> or + <codeph>SMALLINT</codeph> columns in the original table turn into <codeph>INT</codeph> columns in the new + table. + </p> + +<!-- <p conref="../shared/impala_common.xml#common/partitioning_blurb"/> --> + + <p conref="../shared/impala_common.xml#common/hbase_ok"/> + + <p conref="../shared/impala_common.xml#common/text_bulky"/> + +<!-- <p conref="../shared/impala_common.xml#common/compatibility_blurb"/> --> + + <p conref="../shared/impala_common.xml#common/internals_1_bytes"/> + + <p conref="../shared/impala_common.xml#common/added_forever"/> + + <p conref="../shared/impala_common.xml#common/column_stats_constant"/> + +<!-- <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> --> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_literals.xml#numeric_literals"/>, <xref href="impala_tinyint.xml#tinyint"/>, + <xref href="impala_smallint.xml#smallint"/>, <xref href="impala_int.xml#int"/>, + <xref href="impala_bigint.xml#bigint"/>, <xref href="impala_decimal.xml#decimal"/>, + <xref href="impala_math_functions.xml#math_functions"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_trouble_bad_results.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_trouble_bad_results.xml b/docs/topics/impala_trouble_bad_results.xml new file mode 100644 index 0000000..dc15c5f --- /dev/null +++ b/docs/topics/impala_trouble_bad_results.xml @@ -0,0 +1,25 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="trouble_bad_results"> + + <title>Troubleshooting Incorrect Results</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Troubleshooting"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">incorrect results</indexterm> + <indexterm audience="Cloudera">bad results</indexterm> + <indexterm audience="Cloudera">wrong results</indexterm> + </p> + + <p outputclass="toc inpage"/> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_trouble_memory.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_trouble_memory.xml b/docs/topics/impala_trouble_memory.xml new file mode 100644 index 0000000..ae6ab2a --- /dev/null +++ b/docs/topics/impala_trouble_memory.xml @@ -0,0 +1,25 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="trouble_memory"> + + <title>Troubleshooting Out-of-Memory Issues</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Troubleshooting"/> + <data name="Category" value="Memory"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">memory errors</indexterm> + <indexterm audience="Cloudera">out-of-memory errors</indexterm> + </p> + + <p outputclass="toc inpage"/> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_trouble_query_fail.xml 
---------------------------------------------------------------------- diff --git a/docs/topics/impala_trouble_query_fail.xml b/docs/topics/impala_trouble_query_fail.xml new file mode 100644 index 0000000..6b1d1ee --- /dev/null +++ b/docs/topics/impala_trouble_query_fail.xml @@ -0,0 +1,24 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="trouble_query_fail"> + + <title>Troubleshooting Query Errors</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Troubleshooting"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">query errors</indexterm> + </p> + + <p outputclass="toc inpage"/> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_trouble_sql.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_trouble_sql.xml b/docs/topics/impala_trouble_sql.xml new file mode 100644 index 0000000..25d79f9 --- /dev/null +++ b/docs/topics/impala_trouble_sql.xml @@ -0,0 +1,25 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="trouble_sql"> + + <title>Troubleshooting SQL Syntax Errors</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Troubleshooting"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">SQL errors</indexterm> + <indexterm audience="Cloudera">syntax errors</indexterm> + </p> + + <p outputclass="toc inpage"/> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_trouble_startup.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_trouble_startup.xml b/docs/topics/impala_trouble_startup.xml new file mode 100644 index 0000000..ff4b49e --- /dev/null +++ b/docs/topics/impala_trouble_startup.xml @@ -0,0 +1,25 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="trouble_startup"> + + <title>Troubleshooting Startup Problems</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Troubleshooting"/> + <data name="Category" value="Starting and Stopping"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">startup errors</indexterm> + </p> + + <p outputclass="toc inpage"/> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_troubleshooting.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_troubleshooting.xml b/docs/topics/impala_troubleshooting.xml new file mode 100644 index 0000000..f7ebe4e --- /dev/null +++ b/docs/topics/impala_troubleshooting.xml @@ -0,0 +1,447 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> 
+<concept id="troubleshooting"> + + <title>Troubleshooting Impala</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Troubleshooting"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">troubleshooting</indexterm> + Troubleshooting for Impala requires being able to diagnose and debug problems + with performance, network connectivity, out-of-memory conditions, disk space usage, + and crash or hang conditions in any of the Impala-related daemons. + </p> + + <p outputclass="toc inpage" audience="PDF"> + The following sections describe the general troubleshooting procedures to diagnose + different kinds of problems: + </p> + + </conbody> + + <concept id="trouble_sql"> + + <title>Troubleshooting Impala SQL Syntax Issues</title> + + <conbody> + + <p> + In general, if queries issued against Impala fail, you can try running these same queries against Hive. + </p> + + <ul> + <li> + If a query fails against both Impala and Hive, it is likely that there is a problem with your query or + other elements of your CDH environment: + <ul> + <li> + Review the <xref href="impala_langref.xml#langref">Language Reference</xref> to ensure your query is + valid. + </li> + + <li> + Check <xref href="impala_reserved_words.xml#reserved_words"/> to see if any database, table, + column, or other object names in your query conflict with Impala reserved words. + Quote those names with backticks (<codeph>``</codeph>) if so. + </li> + + <li> + Check <xref href="impala_functions.xml#builtins"/> to confirm whether Impala supports all the + built-in functions being used by your query, and whether argument and return types are the + same as you expect. + </li> + + <li> + Review the <xref href="impala_logging.xml#logs_debug">contents of the Impala logs</xref> for any information that may be useful in identifying the + source of the problem. + </li> + </ul> + </li> + + <li> + If a query fails against Impala but not Hive, it is likely that there is a problem with your Impala + installation. + </li> + </ul> + </conbody> + </concept> + + <concept id="trouble_io" rev="CDH-19201"> + <title>Troubleshooting I/O Capacity Problems</title> + <conbody> + <p> + Impala queries are typically I/O-intensive. If there is an I/O problem with storage devices, + or with HDFS itself, Impala queries could show slow response times with no obvious cause + on the Impala side. Slow I/O on even a single DataNode could result in an overall slowdown, because + queries involving clauses such as <codeph>ORDER BY</codeph>, <codeph>GROUP BY</codeph>, or <codeph>JOIN</codeph> + do not start returning results until all DataNodes have finished their work. 
+ </p> + <p> + To test whether the Linux I/O system itself is performing as expected, run Linux commands like + the following on each DataNode: + </p> +<codeblock> +$ sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0 +vm.drop_caches = 3 +vm.drop_caches = 0 +$ sudo dd if=/dev/sda bs=1M of=/dev/null count=1k +1024+0 records in +1024+0 records out +1073741824 bytes (1.1 GB) copied, 5.60373 s, 192 MB/s +$ sudo dd if=/dev/sdb bs=1M of=/dev/null count=1k +1024+0 records in +1024+0 records out +1073741824 bytes (1.1 GB) copied, 5.51145 s, 195 MB/s +$ sudo dd if=/dev/sdc bs=1M of=/dev/null count=1k +1024+0 records in +1024+0 records out +1073741824 bytes (1.1 GB) copied, 5.58096 s, 192 MB/s +$ sudo dd if=/dev/sdd bs=1M of=/dev/null count=1k +1024+0 records in +1024+0 records out +1073741824 bytes (1.1 GB) copied, 5.43924 s, 197 MB/s +</codeblock> + <p> + On modern hardware, a throughput rate of less than 100 MB/s typically indicates + a performance issue with the storage device. Correct the hardware problem before + continuing with Impala tuning or benchmarking. + </p> + </conbody> + </concept> + + + <concept id="trouble_cookbook"> + + <title>Impala Troubleshooting Quick Reference</title> + + <conbody> + + <p> + The following table lists common problems and potential solutions. + </p> + + <table> + <tgroup cols="3"> + <colspec colname="1" colwidth="10*"/> + <colspec colname="2" colwidth="30*"/> + <colspec colname="3" colwidth="30*"/> + <thead> + <row> + <entry> + Symptom + </entry> + <entry> + Explanation + </entry> + <entry> + Recommendation + </entry> + </row> + </thead> + <tbody> + <row> + <entry> + Impala takes a long time to start. + </entry> + <entry> + Impala instances with large numbers of tables, partitions, or data files take longer to start + because the metadata for these objects is broadcast to all <cmdname>impalad</cmdname> nodes and + cached. + </entry> + <entry> + Adjust timeout and synchronicity settings. + </entry> + </row> + <row> + <entry> + <p> + Joins fail to complete. + </p> + </entry> + <entry> + <p> + There may be insufficient memory. During a join, data from the second, third, and so on sets to + be joined is loaded into memory. If Impala chooses an inefficient join order or join mechanism, + the query could exceed the total memory available. + </p> + </entry> + <entry> + <p> + Start by gathering statistics with the <codeph>COMPUTE STATS</codeph> statement for each table + involved in the join. Consider specifying the <codeph>[SHUFFLE]</codeph> hint so that data from + the joined tables is split up between nodes rather than broadcast to each node. If tuning at the + SQL level is not sufficient, add more memory to your system or join smaller data sets. + </p> + </entry> + </row> + <row> + <entry> + <p> + Queries return incorrect results. + </p> + </entry> + <entry> + <p> + Impala metadata may be outdated after changes are performed in Hive. + </p> + </entry> + <entry> + <p> + Where possible, use the appropriate Impala statement (<codeph>INSERT</codeph>, <codeph>LOAD + DATA</codeph>, <codeph>CREATE TABLE</codeph>, <codeph>ALTER TABLE</codeph>, <codeph>COMPUTE + STATS</codeph>, and so on) rather than switching back and forth between Impala and Hive. Impala + automatically broadcasts the results of DDL and DML operations to all Impala nodes in the + cluster, but does not automatically recognize when such changes are made through Hive. 
After + inserting data, adding a partition, or other operation in Hive, refresh the metadata for the + table as described in <xref href="impala_refresh.xml#refresh"/>. + </p> + </entry> + </row> + <row> + <entry> + <p> + Queries are slow to return results. + </p> + </entry> + <entry> + <p> + Some <codeph>impalad</codeph> instances may not have started. Using a browser, connect to the + host running the Impala state store. Connect using an address of the form + <codeph>http://<varname>hostname</varname>:<varname>port</varname>/metrics</codeph>. + </p> + + <p> + <note> + Replace <varname>hostname</varname> and <varname>port</varname> with the hostname and port of + your Impala state store host machine and web server port. The default port is 25010. + </note> + The number of <codeph>impalad</codeph> instances listed should match the expected number of + <codeph>impalad</codeph> instances installed in the cluster. There should also be one + <codeph>impalad</codeph> instance installed on each DataNode + </p> + </entry> + <entry> + <p> + Ensure Impala is installed on all DataNodes. Start any <codeph>impalad</codeph> instances that + are not running. + </p> + </entry> + </row> + <row> + <entry> + <p> + Queries are slow to return results. + </p> + </entry> + <entry> + <p> + Impala may not be configured to use native checksumming. Native checksumming uses + machine-specific instructions to compute checksums over HDFS data very quickly. Review Impala + logs. If you find instances of "<codeph>INFO util.NativeCodeLoader: Loaded the + native-hadoop</codeph>" messages, native checksumming is not enabled. + </p> + </entry> + <entry> + <p> + Ensure Impala is configured to use native checksumming as described in + <xref href="impala_config_performance.xml#config_performance"/>. + </p> + </entry> + </row> + <row> + <entry> + <p> + Queries are slow to return results. + </p> + </entry> + <entry> + <p> + Impala may not be configured to use data locality tracking. + </p> + </entry> + <entry> + <p> + Test Impala for data locality tracking and make configuration changes as necessary. Information + on this process can be found in <xref href="impala_config_performance.xml#config_performance"/>. + </p> + </entry> + </row> + <row> + <entry> + <p> + Attempts to complete Impala tasks such as executing INSERT-SELECT actions fail. The Impala logs + include notes that files could not be opened due to permission denied. + </p> + </entry> + <entry> + <p> + This can be the result of permissions issues. For example, you could use the Hive shell as the + hive user to create a table. After creating this table, you could attempt to complete some + action, such as an INSERT-SELECT on the table. Because the table was created using one user and + the INSERT-SELECT is attempted by another, this action may fail due to permissions issues. + </p> + </entry> + <entry> + <p> + In general, ensure the Impala user has sufficient permissions. In the preceding example, ensure + the Impala user has sufficient permissions to the table that the Hive user created. + </p> + </entry> + </row> + <row rev="IMP-1210"> + <entry> + <p> + Impala fails to start up, with the <cmdname>impalad</cmdname> logs referring to errors connecting + to the statestore service and attempts to re-register. + </p> + </entry> + <entry> + <p> + A large number of databases, tables, partitions, and so on can require metadata synchronization, + particularly on startup, that takes longer than the default timeout for the statestore service. 
+            </p>
+          </entry>
+          <entry>
+            <p>
+              Configure the statestore timeout value and possibly other settings related to the frequency of
+              statestore updates and metadata loading. See
+              <xref href="impala_timeouts.xml#statestore_timeout"/> and
+              <xref href="impala_scalability.xml#statestore_scalability"/>.
+            </p>
+          </entry>
+        </row>
+      </tbody>
+      </tgroup>
+      </table>
+
+      <p audience="Cloudera">
+        Some or all of these settings might also be useful.
+<codeblock>NUM_SCANNER_THREADS: 0
+ABORT_ON_DEFAULT_LIMIT_EXCEEDED: 0
+MAX_IO_BUFFERS: 0
+DEFAULT_ORDER_BY_LIMIT: -1
+BATCH_SIZE: 0
+NUM_NODES: 0
+DISABLE_CODEGEN: 0
+MAX_ERRORS: 0
+ABORT_ON_ERROR: 0
+MAX_SCAN_RANGE_LENGTH: 0
+ALLOW_UNSUPPORTED_FORMATS: 0
+SUPPORT_START_OVER: false
+DEBUG_ACTION:
+MEM_LIMIT: 0
+</codeblock>
+      </p>
+    </conbody>
+  </concept>
+
+  <concept audience="Cloudera" id="core_dumps">
+
+    <title>Enabling Core Dumps for Impala</title>
+
+    <conbody>
+
+      <p>
+        Fill in details, then unhide.
+      </p>
+
+      <p>
+        From Nong:
+      </p>
+
+      <p>
+        In a CM-managed cluster, search for "core" from the impala configuration page. You should see the "enable
+        core dump" config.
+      </p>
+
+      <p>
+        From <xref href="impala_config_options.xml#config_options"/>:
+      </p>
+
+<codeblock>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</codeblock>
+
+      <note conref="../shared/impala_common.xml#common/core_dump_considerations"/>
+
+      <p></p>
+    </conbody>
+  </concept>
+
+  <concept audience="Cloudera" id="io_throughput">
+    <title>Verifying I/O Throughput</title>
+    <conbody>
+      <p>
+        Optimal Impala query performance depends on being able to perform I/O across multiple storage devices
+        in parallel, with the data transferred at or close to the maximum throughput for each device.
+        If a hardware or configuration issue causes a reduction in I/O throughput, even if the problem only
+        affects a subset of storage devices, you might experience
+        slow query performance that cannot be improved by using regular SQL tuning techniques.
+      </p>
+      <p>
+        As a general guideline, expect each commodity storage device (for example, a standard rotational
+        hard drive) to be able to transfer approximately 100 MB per second. If you see persistent slow query
+        performance, examine the Impala logs to check whether the scan rate reported for each disk is
+        noticeably lower than that expected rate.
+      </p>
+
+<codeblock>
+<![CDATA[
+Useful test from beta at Visa.
+SME: Jayant@
+
+Symptoms:
+* Queries running slow
+* Scan rate of IO in Impala logs shows noticeably less than expected IO rate for each disk (a typical commodity disk should provide ~100 MB/s)
+
+Actions:
+* Validate disk read from OS to confirm no issue at hardware or OS level
+* Validate disk read at HDFS to see if issue at HDFS config
+
+Specifics:
+Testing Linux and hardware IO:
+# First run:
+sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0
+
+# Then run the reads in parallel, one per device:
+sudo dd if=/dev/sda bs=1M of=/dev/null count=1k &
+sudo dd if=/dev/sdb bs=1M of=/dev/null count=1k &
+sudo dd if=/dev/sdc bs=1M of=/dev/null count=1k &
+sudo dd if=/dev/sdd bs=1M of=/dev/null count=1k &
+wait
+
+Testing HDFS IO:
+# You can use TestDFSIO. It is documented here: http://answers.oreilly.com/topic/460-how-to-benchmark-a-hadoop-cluster/
+# You can also use sar, dd and iostat for monitoring the disk.
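+# For example, illustrative monitoring invocations (both require the sysstat package; the interval and count are arbitrary):
+#   iostat -dxm 5       # extended per-device statistics in MB/s, sampled every 5 seconds
+#   sar -d -p 5 12      # block device activity, 12 samples at 5-second intervals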
+ +# writes 10 files each of 1000 MB +hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000 + +# run the read benchmark +hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000 + +# clean up the data +hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -clean +]]> +</codeblock> + + </conbody> + </concept> + + <concept id="webui_snippet" audience="PDF"> + <title conref="impala_webui.xml#webui/webui_title"/> + <conbody> + <p conref="impala_webui.xml#webui/webui_intro"/> + <p> + For full details, see <xref href="impala_webui.xml#webui"/>. + </p> + </conbody> + </concept> + +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_truncate_table.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_truncate_table.xml b/docs/topics/impala_truncate_table.xml new file mode 100644 index 0000000..72f072f --- /dev/null +++ b/docs/topics/impala_truncate_table.xml @@ -0,0 +1,184 @@ +<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="2.3.0" id="truncate_table"> + + <title>TRUNCATE TABLE Statement (<keyword keyref="impala23"/> or higher only)</title> + <titlealts audience="PDF"><navtitle>TRUNCATE TABLE</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="DDL"/> + <data name="Category" value="S3"/> + <data name="Category" value="Tables"/> + <data name="Category" value="Disk Storage"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p rev="2.3.0"> + <indexterm audience="Cloudera">TRUNCATE TABLE statement</indexterm> + Removes the data from an Impala table while leaving the table itself. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>TRUNCATE TABLE <ph rev="2.5.0 IMPALA-2641">[IF EXISTS]</ph> [<varname>db_name</varname>.]<varname>table_name</varname></codeblock> + + <p conref="../shared/impala_common.xml#common/ddl_blurb"/> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + Often used to empty tables that are used during ETL cycles, after the data has been copied to another + table for the next stage of processing. This statement is a low-overhead alternative to dropping and + recreating the table, or using <codeph>INSERT OVERWRITE</codeph> to replace the data during the + next ETL cycle. + </p> + + <p> + This statement removes all the data and associated data files in the table. It can remove data files from internal tables, + external tables, partitioned tables, and tables mapped to HBase or the Amazon Simple Storage Service (S3). + The data removal applies to the entire table, including all partitions of a partitioned table. + </p> + + <p> + Any statistics produced by the <codeph>COMPUTE STATS</codeph> statement are reset when the data is removed. + </p> + + <p> + Make sure that you are in the correct database before truncating a table, either by issuing a + <codeph>USE</codeph> statement first or by using a fully qualified name + <codeph><varname>db_name</varname>.<varname>table_name</varname></codeph>. + </p> + + <p rev="2.5.0 IMPALA-2641"> + The optional <codeph>IF EXISTS</codeph> clause makes the statement succeed whether or not the table exists. 
+      If the table does exist, it is truncated; if it does not exist, the statement has no effect. This capability is
+      useful in standardized setup scripts that might be run both before and after some of the tables exist.
+      This clause is available in <keyword keyref="impala25_full"/> and higher.
+    </p>
+
+    <p>
+      Any HDFS data files removed by this statement go into the HDFS trashcan, from which you can recover them
+      within a defined time interval if this operation turns out to be a mistake.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/disk_space_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/s3_blurb"/>
+    <p rev="2.2.0">
+      Although Impala cannot write new data to a table stored in the Amazon
+      S3 filesystem, the <codeph>TRUNCATE TABLE</codeph> statement can remove data files from S3.
+      See <xref href="impala_s3.xml#s3"/> for details about working with S3 tables.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
+    <p rev="CDH-19187">
+      The user ID that the <cmdname>impalad</cmdname> daemon runs under,
+      typically the <codeph>impala</codeph> user, must have write
+      permission for all the files and directories that make up the table.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      The following example shows a table containing some data and with table and column statistics.
+      After the <codeph>TRUNCATE TABLE</codeph> statement, the data is removed and the statistics
+      are reset.
+    </p>
+
+<codeblock>CREATE TABLE truncate_demo (x INT);
+INSERT INTO truncate_demo VALUES (1), (2), (4), (8);
+SELECT COUNT(*) FROM truncate_demo;
++----------+
+| count(*) |
++----------+
+| 4        |
++----------+
+COMPUTE STATS truncate_demo;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+SHOW TABLE STATS truncate_demo;
++-------+--------+------+--------------+-------------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+--------+------+--------------+-------------------+--------+-------------------+
+| 4     | 1      | 8B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
++-------+--------+------+--------------+-------------------+--------+-------------------+
+SHOW COLUMN STATS truncate_demo;
++--------+------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+------+------------------+--------+----------+----------+
+| x      | INT  | 4                | -1     | 4        | 4        |
++--------+------+------------------+--------+----------+----------+
+
+-- After this statement, the data and the table/column stats will be gone.
+TRUNCATE TABLE truncate_demo;
+
+SELECT COUNT(*) FROM truncate_demo;
++----------+
+| count(*) |
++----------+
+| 0        |
++----------+
+SHOW TABLE STATS truncate_demo;
++-------+--------+------+--------------+-------------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+--------+------+--------------+-------------------+--------+-------------------+
+| -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
++-------+--------+------+--------------+-------------------+--------+-------------------+
+SHOW COLUMN STATS truncate_demo;
++--------+------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+------+------------------+--------+----------+----------+
+| x      | INT  | -1               | -1     | 4        | 4        |
++--------+------+------------------+--------+----------+----------+
+</codeblock>
+
+    <p>
+      The following example shows how the <codeph>IF EXISTS</codeph> clause allows the <codeph>TRUNCATE TABLE</codeph>
+      statement to be run without error whether or not the table exists:
+    </p>
+
+<codeblock rev="2.5.0 IMPALA-2641">CREATE TABLE staging_table1 (x INT, s STRING);
+Fetched 0 row(s) in 0.33s
+
+SHOW TABLES LIKE 'staging*';
++----------------+
+| name           |
++----------------+
+| staging_table1 |
++----------------+
+Fetched 1 row(s) in 0.25s
+
+-- Our ETL process involves removing all data from several staging tables,
+-- even though some might already be dropped or not yet created.
+
+TRUNCATE TABLE IF EXISTS staging_table1;
+Fetched 0 row(s) in 5.04s
+
+TRUNCATE TABLE IF EXISTS staging_table2;
+Fetched 0 row(s) in 0.25s
+
+TRUNCATE TABLE IF EXISTS staging_table3;
+Fetched 0 row(s) in 0.25s
+</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_tables.xml#tables"/>,
+      <xref href="impala_alter_table.xml#alter_table"/>, <xref href="impala_create_table.xml#create_table"/>,
+      <xref href="impala_partitioning.xml#partitioning"/>, <xref href="impala_tables.xml#internal_tables"/>,
+      <xref href="impala_tables.xml#external_tables"/>
+    </p>
+
+  </conbody>
+</concept>
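The usage notes in this topic mention emptying staging tables during ETL cycles after the data has been copied to the next stage of processing. A minimal sketch of that pattern, using hypothetical table names (staging_events and events_archive are placeholders, not tables from the examples above):

<codeblock>-- Copy the staged data onward, then empty the staging table so it is ready
-- for the next ETL cycle. IF EXISTS keeps the script safe to re-run.
INSERT INTO events_archive SELECT * FROM staging_events;
TRUNCATE TABLE IF EXISTS staging_events;
</codeblock>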
