[36/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

jbapple Wed, 12 Apr 2017 11:25:56 -0700

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_describe.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_describe.html b/docs/build/html/topics/impala_describe.html
new file mode 100644
index 0000000..963ef6e
--- /dev/null
+++ b/docs/build/html/topics/impala_describe.html
@@ -0,0 +1,802 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_langref_sql.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="describe"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>DESCRIBE Statement</title></head><body 
id="describe"><main role="main"><article role="article" 
aria-labelledby="describe__desc">
+
+  <h1 class="title topictitle1" id="describe__desc">DESCRIBE Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">DESCRIBE</code> statement displays metadata 
about a table, such as the column names and their
+      data types.
+      <span class="ph">In <span class="keyword">Impala 2.3</span> and higher, 
you can specify the name of a complex type column, which takes
+      the form of a dotted path. The path might include multiple components in 
the case of a nested type definition.</span>
+      <span class="ph">In <span class="keyword">Impala 2.5</span> and higher, 
the <code class="ph codeph">DESCRIBE DATABASE</code> form can display
+      information about a database.</span>
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>DESCRIBE [DATABASE] [FORMATTED|EXTENDED] <var 
class="keyword varname">object_name</var>
+
+object_name ::=
+    [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>[.<var class="keyword varname">complex_col_name</var> ...]
+  | <var class="keyword varname">db_name</var>
+</code></pre>
+
+    <p class="p">
+      You can use the abbreviation <code class="ph codeph">DESC</code> for the 
<code class="ph codeph">DESCRIBE</code> statement.
+    </p>
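+
+    <p class="p">
+      As a brief illustration, assuming a table named
+      <code class="ph codeph">my_table</code> (a placeholder name) exists in the
+      current database, the following two statements are interchangeable:
+    </p>
+
+<pre class="pre codeblock"><code>-- my_table is a placeholder; substitute one of your own tables.
+describe my_table;
+desc my_table;
+</code></pre>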
+
+    <p class="p">
+      The <code class="ph codeph">DESCRIBE FORMATTED</code> variation displays 
additional information, in a format familiar to
+      users of Apache Hive. The extra information includes low-level details 
such as whether the table is internal
+      or external, when it was created, the file format, the location of the 
data in HDFS, whether the object is a
+      table or a view, and (for views) the text of the query from the view 
definition.
+    </p>
+
+    <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+      The <code class="ph codeph">Compressed</code> field is not a reliable 
indicator of whether the table contains compressed
+      data. It typically shows <code class="ph codeph">No</code>, 
because the compression settings only apply during the
+      session that loads data and are not stored persistently with the table 
metadata.
+    </div>
+
+<p class="p">
+  <strong class="ph b">Describing databases:</strong>
+</p>
+
+<p class="p">
+  By default, the <code class="ph codeph">DESCRIBE</code> output for a 
database includes the location
+  and the comment, which can be set by the <code class="ph 
codeph">LOCATION</code> and <code class="ph codeph">COMMENT</code>
+  clauses on the <code class="ph codeph">CREATE DATABASE</code> statement.
+</p>
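+
+<p class="p">
+  As a sketch of where those values come from, the following statements create
+  a database with both clauses and then describe it. The database name, comment,
+  and path are placeholders chosen for illustration.
+</p>
+
+<pre class="pre codeblock"><code>-- Hypothetical name, comment, and HDFS path.
+create database if not exists sales_archive
+  comment 'Archived sales data'
+  location '/user/doc_demo/sales_archive.db';
+
+-- The location and comment set above appear in the output.
+describe database sales_archive;
+</code></pre>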
+
+<p class="p">
+  The additional information displayed by the <code class="ph 
codeph">FORMATTED</code> or <code class="ph codeph">EXTENDED</code>
+  keyword includes the HDFS user ID that is considered the owner of the 
database, and any
+  optional database properties. The properties could be specified by the <code 
class="ph codeph">WITH DBPROPERTIES</code>
+  clause if the database is created using a Hive <code class="ph 
codeph">CREATE DATABASE</code> statement.
+  Impala currently does not set those properties or do any special processing based on them.
+</p>
+
+<p class="p">
+The following examples show the variations in syntax and output for
+describing databases. This feature is available in <span 
class="keyword">Impala 2.5</span>
+and higher.
+</p>
+
+<pre class="pre codeblock"><code>
+describe database default;
++---------+----------------------+-----------------------+
+| name    | location             | comment               |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
++---------+----------------------+-----------------------+
+
+describe database formatted default;
++---------+----------------------+-----------------------+
+| name    | location             | comment               |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
+| Owner:  |                      |                       |
+|         | public               | ROLE                  |
++---------+----------------------+-----------------------+
+
+describe database extended default;
++---------+----------------------+-----------------------+
+| name    | location             | comment               |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
+| Owner:  |                      |                       |
+|         | public               | ROLE                  |
++---------+----------------------+-----------------------+
+</code></pre>
+
+<p class="p">
+  <strong class="ph b">Describing tables:</strong>
+</p>
+
+<p class="p">
+  If the <code class="ph codeph">DATABASE</code> keyword is omitted, the 
default
+  for the <code class="ph codeph">DESCRIBE</code> statement is to refer to a 
table.
+</p>
+
+<pre class="pre codeblock"><code>
+-- By default, the table is assumed to be in the current database.
+describe my_table;
++------+--------+---------+
+| name | type   | comment |
++------+--------+---------+
+| x    | int    |         |
+| s    | string |         |
++------+--------+---------+
+
+-- Use a fully qualified table name to specify a table in any database.
+describe my_database.my_table;
++------+--------+---------+
+| name | type   | comment |
++------+--------+---------+
+| x    | int    |         |
+| s    | string |         |
++------+--------+---------+
+
+-- The formatted or extended output includes additional useful information.
+-- The LOCATION field is especially useful to know for DDL statements and HDFS commands
+-- during ETL jobs. (The LOCATION includes a full hdfs:// URL, omitted here for readability.)
+describe formatted my_table;
++------------------------------+----------------------------------------------+----------------------+
+| name                         | type                                         | comment              |
++------------------------------+----------------------------------------------+----------------------+
+| # col_name                   | data_type                                    | comment              |
+|                              | NULL                                         | NULL                 |
+| x                            | int                                          | NULL                 |
+| s                            | string                                       | NULL                 |
+|                              | NULL                                         | NULL                 |
+| # Detailed Table Information | NULL                                         | NULL                 |
+| Database:                    | my_database                                  | NULL                 |
+| Owner:                       | jrussell                                     | NULL                 |
+| CreateTime:                  | Fri Mar 18 15:58:00 PDT 2016                 | NULL                 |
+| LastAccessTime:              | UNKNOWN                                      | NULL                 |
+| Protect Mode:                | None                                         | NULL                 |
+| Retention:                   | 0                                            | NULL                 |
+| Location:                    | /user/hive/warehouse/my_database.db/my_table | NULL                 |
+| Table Type:                  | MANAGED_TABLE                                | NULL                 |
+| Table Parameters:            | NULL                                         | NULL                 |
+|                              | transient_lastDdlTime                        | 1458341880           |
+|                              | NULL                                         | NULL                 |
+| # Storage Information        | NULL                                         | NULL                 |
+| SerDe Library:               | org. ... .LazySimpleSerDe                    | NULL                 |
+| InputFormat:                 | org.apache.hadoop.mapred.TextInputFormat     | NULL                 |
+| OutputFormat:                | org. ... .HiveIgnoreKeyTextOutputFormat      | NULL                 |
+| Compressed:                  | No                                           | NULL                 |
+| Num Buckets:                 | 0                                            | NULL                 |
+| Bucket Columns:              | []                                           | NULL                 |
+| Sort Columns:                | []                                           | NULL                 |
++------------------------------+----------------------------------------------+----------------------+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+      Because the column definitions for complex types can become long, 
particularly when such types are nested,
+      the <code class="ph codeph">DESCRIBE</code> statement uses special 
formatting for complex type columns to make the output readable.
+    </p>
+
+    <p class="p">
+      For the <code class="ph codeph">ARRAY</code>, <code class="ph 
codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> types available in
+      <span class="keyword">Impala 2.3</span> and higher, the <code class="ph 
codeph">DESCRIBE</code> output is formatted to avoid
+      excessively long lines for multiple fields within a <code class="ph 
codeph">STRUCT</code>, or a nested sequence of
+      complex types.
+    </p>
+
+    <p class="p">
+        You can pass a multi-part qualified name to <code class="ph 
codeph">DESCRIBE</code>
+        to specify an <code class="ph codeph">ARRAY</code>, <code class="ph 
codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+        column and visualize its structure as if it were a table.
+        For example, if table <code class="ph codeph">T1</code> contains an 
<code class="ph codeph">ARRAY</code> column
+        <code class="ph codeph">A1</code>, you could issue the statement <code 
class="ph codeph">DESCRIBE t1.a1</code>.
+        If table <code class="ph codeph">T1</code> contained a <code class="ph 
codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+        and a field <code class="ph codeph">F1</code> within the <code 
class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+        you could issue the statement <code class="ph codeph">DESCRIBE 
t1.s1.f1</code>.
+        An <code class="ph codeph">ARRAY</code> is shown as a two-column 
table, with
+        <code class="ph codeph">ITEM</code> and <code class="ph 
codeph">POS</code> columns.
+        A <code class="ph codeph">STRUCT</code> is shown as a table with each 
field
+        representing a column in the table.
+        A <code class="ph codeph">MAP</code> is shown as a two-column table, 
with
+        <code class="ph codeph">KEY</code> and <code class="ph 
codeph">VALUE</code> columns.
+      </p>
+
+    <p class="p">
+      For example, here is the <code class="ph codeph">DESCRIBE</code> output 
for a table containing a single top-level column
+      of each complex type:
+    </p>
+
+<pre class="pre codeblock"><code>create table t1 (x int, a array&lt;int&gt;, s 
struct&lt;f1: string, f2: bigint&gt;, m map&lt;string,int&gt;) stored as 
parquet;
+
+describe t1;
++------+-----------------+---------+
+| name | type            | comment |
++------+-----------------+---------+
+| x    | int             |         |
+| a    | array&lt;int&gt;      |         |
+| s    | struct&lt;         |         |
+|      |   f1:string,    |         |
+|      |   f2:bigint     |         |
+|      | &gt;               |         |
+| m    | map&lt;string,int&gt; |         |
++------+-----------------+---------+
+
+</code></pre>
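+
+    <p class="p">
+      Continuing with the <code class="ph codeph">t1</code> table defined above,
+      here is a minimal sketch of passing each complex column to
+      <code class="ph codeph">DESCRIBE</code> by its qualified name. The output
+      is omitted; the comments restate the layouts described earlier.
+    </p>
+
+<pre class="pre codeblock"><code>-- ARRAY column: shown as a table with ITEM and POS columns.
+describe t1.a;
+
+-- STRUCT column: each field (f1, f2) is shown as a column.
+describe t1.s;
+
+-- MAP column: shown as a table with KEY and VALUE columns.
+describe t1.m;
+</code></pre>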
+
+    <p class="p">
+      Here are examples showing how to <span class="q">"drill down"</span> 
into the layouts of complex types, including
+      using multi-part names to examine the definitions of nested types.
+      The <code class="ph codeph">&lt; &gt;</code> delimiters identify the 
columns with complex types;
+      these are the columns where you can descend another level to see the 
parts that make up
+      the complex type.
+      This technique helps you to understand the multi-part names you use as 
table references in queries
+      involving complex types, and the corresponding column names you refer to 
in the <code class="ph codeph">SELECT</code> list.
+      These tables are from the <span class="q">"nested TPC-H"</span> schema, 
shown in detail in
+      <a class="xref" 
href="impala_complex_types.html#complex_sample_schema">Sample Schema and Data 
for Experimenting with Impala Complex Types</a>.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">REGION</code> table contains an <code 
class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+      elements:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          The first <code class="ph codeph">DESCRIBE</code> specifies the 
table name, to display the definition
+          of each top-level column.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The second <code class="ph codeph">DESCRIBE</code> specifies the 
name of a complex
+          column, <code class="ph codeph">REGION.R_NATIONS</code>, showing 
that when you include the name of an <code class="ph codeph">ARRAY</code>
+          column in a <code class="ph codeph">FROM</code> clause, that table 
reference acts like a two-column table with
+          columns <code class="ph codeph">ITEM</code> and <code class="ph 
codeph">POS</code>.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The final <code class="ph codeph">DESCRIBE</code> specifies the 
fully qualified name of the <code class="ph codeph">ITEM</code> field,
+          to display the layout of its underlying <code class="ph 
codeph">STRUCT</code> type in table format, with the fields
+          mapped to column names.
+        </p>
+      </li>
+    </ul>
+
+<pre class="pre codeblock"><code>
+-- #1: The overall layout of the entire table.
+describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+-- #2: The ARRAY column within the table.
+describe region.r_nations;
++------+-------------------------+---------+
+| name | type                    | comment |
++------+-------------------------+---------+
+| item | struct&lt;                 |         |
+|      |   n_nationkey:smallint, |         |
+|      |   n_name:string,        |         |
+|      |   n_comment:string      |         |
+|      | &gt;                       |         |
+| pos  | bigint                  |         |
++------+-------------------------+---------+
+
+-- #3: The STRUCT that makes up each ARRAY element.
+--     The fields of the STRUCT act like columns of a table.
+describe region.r_nations.item;
++-------------+----------+---------+
+| name        | type     | comment |
++-------------+----------+---------+
+| n_nationkey | smallint |         |
+| n_name      | string   |         |
+| n_comment   | string   |         |
++-------------+----------+---------+
+
+</code></pre>
+
+    <p class="p">
+      The <code class="ph codeph">CUSTOMER</code> table contains an <code 
class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+      elements, where one field in the <code class="ph codeph">STRUCT</code> 
is another <code class="ph codeph">ARRAY</code> of
+      <code class="ph codeph">STRUCT</code> elements:
+    </p>
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          Again, the initial <code class="ph codeph">DESCRIBE</code> specifies 
only the table name.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The second <code class="ph codeph">DESCRIBE</code> specifies the 
qualified name of the complex
+          column, <code class="ph codeph">CUSTOMER.C_ORDERS</code>, showing 
how an <code class="ph codeph">ARRAY</code>
+          is represented as a two-column table with columns <code class="ph 
codeph">ITEM</code> and <code class="ph codeph">POS</code>.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The third <code class="ph codeph">DESCRIBE</code> specifies the 
qualified name of the <code class="ph codeph">ITEM</code>
+          of the <code class="ph codeph">ARRAY</code> column, to see the 
structure of the nested <code class="ph codeph">ARRAY</code>.
+          Again, it has two parts, <code class="ph codeph">ITEM</code> and 
<code class="ph codeph">POS</code>. Because the
+          <code class="ph codeph">ARRAY</code> contains a <code class="ph 
codeph">STRUCT</code>, the layout of the <code class="ph codeph">STRUCT</code>
+          is shown.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The fourth and fifth <code class="ph codeph">DESCRIBE</code> 
statements drill down into a <code class="ph codeph">STRUCT</code> field that
+          is itself a complex type, an <code class="ph codeph">ARRAY</code> of 
<code class="ph codeph">STRUCT</code>.
+          The <code class="ph codeph">ITEM</code> portion of the qualified 
name is only required when the <code class="ph codeph">ARRAY</code>
+          elements are anonymous. The fields of the <code class="ph 
codeph">STRUCT</code> give names to any other complex types
+          nested inside the <code class="ph codeph">STRUCT</code>. Therefore, 
the <code class="ph codeph">DESCRIBE</code> parameters
+          <code class="ph codeph">CUSTOMER.C_ORDERS.ITEM.O_LINEITEMS</code> 
and <code class="ph codeph">CUSTOMER.C_ORDERS.O_LINEITEMS</code>
+          are equivalent. (For brevity, leave out the <code class="ph 
codeph">ITEM</code> portion of
+          a qualified name when it is not required.)
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The final <code class="ph codeph">DESCRIBE</code> shows the layout 
of the deeply nested <code class="ph codeph">STRUCT</code> type.
+          Because there are no more complex types nested inside this <code 
class="ph codeph">STRUCT</code>, this is as far
+          as you can drill down into the layout for this table.
+        </p>
+      </li>
+    </ul>
+
+<pre class="pre codeblock"><code>-- #1: The overall layout of the entire table.
+describe customer;
++--------------+------------------------------------+
+| name         | type                               |
++--------------+------------------------------------+
+| c_custkey    | bigint                             |
+... more scalar columns ...
+| c_orders     | array&lt;struct&lt;                      |
+|              |   o_orderkey:bigint,               |
+|              |   o_orderstatus:string,            |
+|              |   o_totalprice:decimal(12,2),      |
+|              |   o_orderdate:string,              |
+|              |   o_orderpriority:string,          |
+|              |   o_clerk:string,                  |
+|              |   o_shippriority:int,              |
+|              |   o_comment:string,                |
+|              |   o_lineitems:array&lt;struct&lt;        |
+|              |     l_partkey:bigint,              |
+|              |     l_suppkey:bigint,              |
+|              |     l_linenumber:int,              |
+|              |     l_quantity:decimal(12,2),      |
+|              |     l_extendedprice:decimal(12,2), |
+|              |     l_discount:decimal(12,2),      |
+|              |     l_tax:decimal(12,2),           |
+|              |     l_returnflag:string,           |
+|              |     l_linestatus:string,           |
+|              |     l_shipdate:string,             |
+|              |     l_commitdate:string,           |
+|              |     l_receiptdate:string,          |
+|              |     l_shipinstruct:string,         |
+|              |     l_shipmode:string,             |
+|              |     l_comment:string               |
+|              |   &gt;&gt;                               |
+|              | &gt;&gt;                                 |
++--------------+------------------------------------+
+
+-- #2: The ARRAY column within the table.
+describe customer.c_orders;
++------+------------------------------------+
+| name | type                               |
++------+------------------------------------+
+| item | struct&lt;                            |
+|      |   o_orderkey:bigint,               |
+|      |   o_orderstatus:string,            |
+... more struct fields ...
+|      |   o_lineitems:array&lt;struct&lt;        |
+|      |     l_partkey:bigint,              |
+|      |     l_suppkey:bigint,              |
+... more nested struct fields ...
+|      |     l_comment:string               |
+|      |   &gt;&gt;                               |
+|      | &gt;                                  |
+| pos  | bigint                             |
++------+------------------------------------+
+
+-- #3: The STRUCT that makes up each ARRAY element.
+--     The fields of the STRUCT act like columns of a table.
+describe customer.c_orders.item;
++-----------------+----------------------------------+
+| name            | type                             |
++-----------------+----------------------------------+
+| o_orderkey      | bigint                           |
+| o_orderstatus   | string                           |
+| o_totalprice    | decimal(12,2)                    |
+| o_orderdate     | string                           |
+| o_orderpriority | string                           |
+| o_clerk         | string                           |
+| o_shippriority  | int                              |
+| o_comment       | string                           |
+| o_lineitems     | array&lt;struct&lt;                    |
+|                 |   l_partkey:bigint,              |
+|                 |   l_suppkey:bigint,              |
+... more struct fields ...
+|                 |   l_comment:string               |
+|                 | &gt;&gt;                               |
++-----------------+----------------------------------+
+
+-- #4: The ARRAY nested inside the STRUCT elements of the first ARRAY.
+describe customer.c_orders.item.o_lineitems;
++------+----------------------------------+
+| name | type                             |
++------+----------------------------------+
+| item | struct&lt;                          |
+|      |   l_partkey:bigint,              |
+|      |   l_suppkey:bigint,              |
+... more struct fields ...
+|      |   l_comment:string               |
+|      | &gt;                                |
+| pos  | bigint                           |
++------+----------------------------------+
+
+-- #5: Shorter form of the previous DESCRIBE. Omits the .ITEM portion of the name
+--     because O_LINEITEMS and other field names provide a way to refer to things
+--     inside the ARRAY element.
+describe customer.c_orders.o_lineitems;
++------+----------------------------------+
+| name | type                             |
++------+----------------------------------+
+| item | struct&lt;                          |
+|      |   l_partkey:bigint,              |
+|      |   l_suppkey:bigint,              |
+... more struct fields ...
+|      |   l_comment:string               |
+|      | &gt;                                |
+| pos  | bigint                           |
++------+----------------------------------+
+
+-- #6: The STRUCT representing ARRAY elements nested inside
+--     another ARRAY of STRUCTs. The lack of any complex types
+--     in this output means this is as far as DESCRIBE can
+--     descend into the table layout.
+describe customer.c_orders.o_lineitems.item;
++-----------------+---------------+
+| name            | type          |
++-----------------+---------------+
+| l_partkey       | bigint        |
+| l_suppkey       | bigint        |
+... more scalar columns ...
+| l_comment       | string        |
++-----------------+---------------+
+
+</code></pre>
+
+<p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+<p class="p">
+  After the <span class="keyword cmdname">impalad</span> daemons are 
restarted, the first query against a table can take longer
+  than subsequent queries, because the metadata for the table is loaded before 
the query is processed. This
+  one-time delay for each table can cause misleading results in benchmark 
tests or cause unnecessary concern.
+  To <span class="q">"warm up"</span> the Impala metadata cache, you can issue 
a <code class="ph codeph">DESCRIBE</code> statement in advance
+  for each table you intend to access later.
+</p>
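+
+<p class="p">
+  For instance, a warm-up script run before a benchmark might look like the
+  following sketch. The database and table names are placeholders.
+</p>
+
+<pre class="pre codeblock"><code>-- Placeholder names; list each table the benchmark queries will touch.
+describe sales_db.customers;
+describe sales_db.orders;
+describe sales_db.line_items;
+</code></pre>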
+
+<p class="p">
+  When you are dealing with data files stored in HDFS, sometimes it is 
important to know details such as the
+  path of the data files for an Impala table, and the hostname for the 
namenode. You can get this information
+  from the <code class="ph codeph">DESCRIBE FORMATTED</code> output. You 
specify HDFS URIs or path specifications with
+  statements such as <code class="ph codeph">LOAD DATA</code> and the <code 
class="ph codeph">LOCATION</code> clause of <code class="ph codeph">CREATE
+  TABLE</code> or <code class="ph codeph">ALTER TABLE</code>. You might also 
use HDFS URIs or paths with Linux commands
+  such as <span class="keyword cmdname">hadoop</span> and <span class="keyword 
cmdname">hdfs</span> to copy, rename, or otherwise manage data files in HDFS.
+</p>
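+
+<p class="p">
+  As an illustration, the sequence below looks up a table's location and then
+  uses HDFS paths in later statements. The table name and every path shown are
+  placeholders, not values taken from a real system.
+</p>
+
+<pre class="pre codeblock"><code>-- Find the Location: field in the output.
+describe formatted my_table;
+
+-- Move a staged file (hypothetical path) into the table.
+load data inpath '/user/doc_demo/staging/data.txt' into table my_table;
+
+-- Point the table at a different HDFS directory (hypothetical path).
+alter table my_table set location '/user/doc_demo/my_table_data';
+</code></pre>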
+
+<p class="p">
+        If you connect to different Impala nodes within an <span 
class="keyword cmdname">impala-shell</span> session for
+        load-balancing purposes, you can enable the <code class="ph 
codeph">SYNC_DDL</code> query option to make each DDL
+        statement wait before returning, until the new or changed metadata has 
been received by all the Impala
+        nodes. See <a class="xref" 
href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query 
Option</a> for details.
+      </p>
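+
+<p class="p">
+  A rough sketch of that workflow in <span class="keyword cmdname">impala-shell</span>,
+  assuming a placeholder table named <code class="ph codeph">new_table</code>:
+</p>
+
+<pre class="pre codeblock"><code>-- Enable the option for the current session.
+set sync_ddl=1;
+
+-- With SYNC_DDL enabled, this statement waits until all Impala nodes
+-- have received the new metadata before returning.
+create table new_table (x int);
+
+-- A DESCRIBE issued afterward through a different node can then see the table.
+describe new_table;
+</code></pre>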
+
+<p class="p">
+  Each table can also have associated table statistics and column statistics. 
To see these categories of
+  information, use the <code class="ph codeph">SHOW TABLE STATS <var 
class="keyword varname">table_name</var></code> and <code class="ph 
codeph">SHOW COLUMN
+  STATS <var class="keyword varname">table_name</var></code> statements.
+
+  See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for 
details.
+</p>
+
+<div class="note important note_important"><span class="note__title 
importanttitle">Important:</span> 
+        After adding or replacing data in a table used in performance-critical 
queries, issue a <code class="ph codeph">COMPUTE
+        STATS</code> statement to make sure all statistics are up-to-date. 
Consider updating statistics for a
+        table after any <code class="ph codeph">INSERT</code>, <code class="ph 
codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+        SELECT</code> statement in Impala, or after loading data through Hive 
and doing a <code class="ph codeph">REFRESH
+        <var class="keyword varname">table_name</var></code> in Impala. This 
technique is especially important for tables that
+        are very large, used in join queries, or both.
+      </div>
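+
+<p class="p">
+  One way to follow that advice is sketched below. The table names are
+  placeholders, and the output of the <code class="ph codeph">SHOW</code>
+  statements is omitted.
+</p>
+
+<pre class="pre codeblock"><code>-- Add data to a table used in performance-critical queries.
+insert into sales_db.orders select * from sales_db.orders_staging;
+
+-- Refresh the table and column statistics.
+compute stats sales_db.orders;
+
+-- Inspect the statistics that DESCRIBE does not display.
+show table stats sales_db.orders;
+show column stats sales_db.orders;
+</code></pre>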
+
+<p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<p class="p">
+  The following example shows the results of both a standard <code class="ph 
codeph">DESCRIBE</code> and <code class="ph codeph">DESCRIBE
+  FORMATTED</code> for different kinds of schema objects:
+</p>
+
+  <ul class="ul">
+    <li class="li">
+      <code class="ph codeph">DESCRIBE</code> for a table or a view returns 
the name, type, and comment for each of the
+      columns. For a view, if the column value is computed by an expression, 
the column name is automatically
+      generated as <code class="ph codeph">_c0</code>, <code class="ph 
codeph">_c1</code>, and so on depending on the ordinal number of the
+      column.
+    </li>
+
+    <li class="li">
+      A table created with no special format or storage clauses is designated 
as a <code class="ph codeph">MANAGED_TABLE</code>
+      (an <span class="q">"internal table"</span> in Impala terminology). Its 
data files are stored in an HDFS directory under the
+      default Hive data directory. By default, it uses Text data format.
+    </li>
+
+    <li class="li">
+      A view is designated as <code class="ph codeph">VIRTUAL_VIEW</code> in 
<code class="ph codeph">DESCRIBE FORMATTED</code> output. Some
+      of its properties are <code class="ph codeph">NULL</code> or blank 
because they are inherited from the base table. The
+      text of the query that defines the view is part of the <code class="ph 
codeph">DESCRIBE FORMATTED</code> output.
+    </li>
+
+    <li class="li">
+      A table with additional clauses in the <code class="ph codeph">CREATE 
TABLE</code> statement has differences in
+      <code class="ph codeph">DESCRIBE FORMATTED</code> output. The output for 
<code class="ph codeph">T2</code> includes the
+      <code class="ph codeph">EXTERNAL_TABLE</code> keyword because of the 
<code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax, and
+      different <code class="ph codeph">InputFormat</code> and <code class="ph 
codeph">OutputFormat</code> fields to reflect the Parquet file
+      format.
+    </li>
+  </ul>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table t1 (x int, y int, s string);
+Query: create table t1 (x int, y int, s string)
+[localhost:21000] &gt; describe t1;
+Query: describe t1
+Query finished, fetching results ...
++------+--------+---------+
+| name | type   | comment |
++------+--------+---------+
+| x    | int    |         |
+| y    | int    |         |
+| s    | string |         |
++------+--------+---------+
+Returned 3 row(s) in 0.13s
+[localhost:21000] &gt; describe formatted t1;
+Query: describe formatted t1
+Query finished, fetching results ...
++------------------------------+--------------------------------------------+------------+
+| name                         | type                                       | 
comment    |
++------------------------------+--------------------------------------------+------------+
+| # col_name                   | data_type                                  | 
comment    |
+|                              | NULL                                       | 
NULL       |
+| x                            | int                                        | 
None       |
+| y                            | int                                        | 
None       |
+| s                            | string                                     | 
None       |
+|                              | NULL                                       | 
NULL       |
+| # Detailed Table Information | NULL                                       | 
NULL       |
+| Database:                    | describe_formatted                         | 
NULL       |
+| Owner:                       | doc_demo                                   | 
NULL       |
+| CreateTime:                  | Mon Jul 22 17:03:16 EDT 2013               | 
NULL       |
+| LastAccessTime:              | UNKNOWN                                    | 
NULL       |
+| Protect Mode:                | None                                       | 
NULL       |
+| Retention:                   | 0                                          | 
NULL       |
+| Location:                    | hdfs://127.0.0.1:8020/user/hive/warehouse/ |  
          |
+|                              |   describe_formatted.db/t1                 | 
NULL       |
+| Table Type:                  | MANAGED_TABLE                              | 
NULL       |
+| Table Parameters:            | NULL                                       | 
NULL       |
+|                              | transient_lastDdlTime                      | 
1374526996 |
+|                              | NULL                                       | 
NULL       |
+| # Storage Information        | NULL                                       | 
NULL       |
+| SerDe Library:               | org.apache.hadoop.hive.serde2.lazy.        |  
          |
+|                              |   LazySimpleSerDe                          | 
NULL       |
+| InputFormat:                 | org.apache.hadoop.mapred.TextInputFormat   | 
NULL       |
+| OutputFormat:                | org.apache.hadoop.hive.ql.io.              |  
          |
+|                              |   HiveIgnoreKeyTextOutputFormat            | 
NULL       |
+| Compressed:                  | No                                         | 
NULL       |
+| Num Buckets:                 | 0                                          | 
NULL       |
+| Bucket Columns:              | []                                         | 
NULL       |
+| Sort Columns:                | []                                         | 
NULL       |
++------------------------------+--------------------------------------------+------------+
+Returned 26 row(s) in 0.03s
+[localhost:21000] &gt; create view v1 as select x, upper(s) from t1;
+Query: create view v1 as select x, upper(s) from t1
+[localhost:21000] &gt; describe v1;
+Query: describe v1
+Query finished, fetching results ...
++------+--------+---------+
+| name | type   | comment |
++------+--------+---------+
+| x    | int    |         |
+| _c1  | string |         |
++------+--------+---------+
+Returned 2 row(s) in 0.10s
+[localhost:21000] &gt; describe formatted v1;
+Query: describe formatted v1
+Query finished, fetching results ...
++------------------------------+------------------------------+----------------------+
+| name                         | type                         | comment        
      |
++------------------------------+------------------------------+----------------------+
+| # col_name                   | data_type                    | comment        
      |
+|                              | NULL                         | NULL           
      |
+| x                            | int                          | None           
      |
+| _c1                          | string                       | None           
      |
+|                              | NULL                         | NULL           
      |
+| # Detailed Table Information | NULL                         | NULL           
      |
+| Database:                    | describe_formatted           | NULL           
      |
+| Owner:                       | doc_demo                     | NULL           
      |
+| CreateTime:                  | Mon Jul 22 16:56:38 EDT 2013 | NULL           
      |
+| LastAccessTime:              | UNKNOWN                      | NULL           
      |
+| Protect Mode:                | None                         | NULL           
      |
+| Retention:                   | 0                            | NULL           
      |
+| Table Type:                  | VIRTUAL_VIEW                 | NULL           
      |
+| Table Parameters:            | NULL                         | NULL           
      |
+|                              | transient_lastDdlTime        | 1374526598     
      |
+|                              | NULL                         | NULL           
      |
+| # Storage Information        | NULL                         | NULL           
      |
+| SerDe Library:               | null                         | NULL           
      |
+| InputFormat:                 | null                         | NULL           
      |
+| OutputFormat:                | null                         | NULL           
      |
+| Compressed:                  | No                           | NULL           
      |
+| Num Buckets:                 | 0                            | NULL           
      |
+| Bucket Columns:              | []                           | NULL           
      |
+| Sort Columns:                | []                           | NULL           
      |
+|                              | NULL                         | NULL           
      |
+| # View Information           | NULL                         | NULL           
      |
+| View Original Text:          | SELECT x, upper(s) FROM t1   | NULL           
      |
+| View Expanded Text:          | SELECT x, upper(s) FROM t1   | NULL           
      |
++------------------------------+------------------------------+----------------------+
+Returned 28 row(s) in 0.03s
+[localhost:21000] &gt; create external table t2 (x int, y int, s string) 
stored as parquet location '/user/doc_demo/sample_data';
+[localhost:21000] &gt; describe formatted t2;
+Query: describe formatted t2
+Query finished, fetching results ...
++------------------------------+----------------------------------------------------+------------+
+| name                         | type                                          
     | comment    |
++------------------------------+----------------------------------------------------+------------+
+| # col_name                   | data_type                                     
     | comment    |
+|                              | NULL                                          
     | NULL       |
+| x                            | int                                           
     | None       |
+| y                            | int                                           
     | None       |
+| s                            | string                                        
     | None       |
+|                              | NULL                                          
     | NULL       |
+| # Detailed Table Information | NULL                                          
     | NULL       |
+| Database:                    | describe_formatted                            
     | NULL       |
+| Owner:                       | doc_demo                                      
     | NULL       |
+| CreateTime:                  | Mon Jul 22 17:01:47 EDT 2013                  
     | NULL       |
+| LastAccessTime:              | UNKNOWN                                       
     | NULL       |
+| Protect Mode:                | None                                          
     | NULL       |
+| Retention:                   | 0                                             
     | NULL       |
+| Location:                    | 
hdfs://127.0.0.1:8020/user/doc_demo/sample_data    | NULL       |
+| Table Type:                  | EXTERNAL_TABLE                                
     | NULL       |
+| Table Parameters:            | NULL                                          
     | NULL       |
+|                              | EXTERNAL                                      
     | TRUE       |
+|                              | transient_lastDdlTime                         
     | 1374526907 |
+|                              | NULL                                          
     | NULL       |
+| # Storage Information        | NULL                                          
     | NULL       |
+| SerDe Library:               | 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL       |
+| InputFormat:                 | 
org.apache.impala.hive.serde.ParquetInputFormat    | NULL       |
+| OutputFormat:                | 
org.apache.impala.hive.serde.ParquetOutputFormat   | NULL       |
+| Compressed:                  | No                                            
     | NULL       |
+| Num Buckets:                 | 0                                             
     | NULL       |
+| Bucket Columns:              | []                                            
     | NULL       |
+| Sort Columns:                | []                                            
     | NULL       |
++------------------------------+----------------------------------------------------+------------+
+Returned 27 row(s) in 0.17s</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon 
runs under,
+      typically the <code class="ph codeph">impala</code> user, must have read 
and execute
+      permissions for all directories that are part of the table.
+      (A table could span multiple different HDFS directories if it is 
partitioned.
+      The directories could be widely scattered because a partition can reside
+      in an arbitrary HDFS directory based on its <code class="ph 
codeph">LOCATION</code> attribute.)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+
+    <p class="p">
+      The information displayed for Kudu tables includes the additional 
attributes
+      that are only applicable for Kudu tables:
+    </p>
+    <ul class="ul">
+      <li class="li">
+        Whether or not the column is part of the primary key. Every Kudu table
+        has a <code class="ph codeph">true</code> value here for at least one 
column. There
+        could be multiple <code class="ph codeph">true</code> values, for 
tables with
+        composite primary keys.
+      </li>
+      <li class="li">
+        Whether or not the column is nullable. Specified by the <code 
class="ph codeph">NULL</code>
+        or <code class="ph codeph">NOT NULL</code> attributes on the <code 
class="ph codeph">CREATE TABLE</code> statement.
+        Columns that are part of the primary key are automatically 
non-nullable.
+      </li>
+      <li class="li">
+        The default value, if any, for the column. Specified by the <code 
class="ph codeph">DEFAULT</code>
+        attribute on the <code class="ph codeph">CREATE TABLE</code> 
statement. If the default value is
+        <code class="ph codeph">NULL</code>, that is not indicated in this 
column. It is implied by
+        <code class="ph codeph">nullable</code> being true and no other 
default value specified.
+      </li>
+      <li class="li">
+        The encoding used for values in the column. Specified by the <code 
class="ph codeph">ENCODING</code>
+        attribute on the <code class="ph codeph">CREATE TABLE</code> statement.
+      </li>
+      <li class="li">
+        The compression used for values in the column. Specified by the <code 
class="ph codeph">COMPRESSION</code>
+        attribute on the <code class="ph codeph">CREATE TABLE</code> statement.
+      </li>
+      <li class="li">
+        The block size (in bytes) used for the underlying Kudu storage layer 
for the column.
+        Specified by the <code class="ph codeph">BLOCK_SIZE</code> attribute 
on the <code class="ph codeph">CREATE TABLE</code>
+        statement.
+      </li>
+    </ul>
+
+    <p class="p">
+      The following example shows <code class="ph codeph">DESCRIBE</code> 
output for a simple Kudu table, with 
+      a single-column primary key and all column attributes left with their 
default values:
+    </p>
+
+<pre class="pre codeblock"><code>
+describe million_rows;
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| name | type   | comment | primary_key | nullable | default_value | encoding  
    | compression         | block_size |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| id   | string |         | true        | false    |               | 
AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| s    | string |         | false       | false    |               | 
AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+</code></pre>
+
+    <p class="p">
+      The following example shows <code class="ph codeph">DESCRIBE</code> 
output for a Kudu table with a
+      two-column primary key, and Kudu-specific attributes applied to some 
columns:
+    </p>
+
+<pre class="pre codeblock"><code>
+create table kudu_describe_example
+(
+  c1 int, c2 int,
+  c3 string, c4 string not null, c5 string default 'n/a', c6 string default '',
+  c7 bigint not null, c8 bigint null default null, c9 bigint default -1 
encoding bit_shuffle,
+  primary key(c1,c2)
+)
+partition by hash (c1, c2) partitions 10 stored as kudu;
+
+describe kudu_describe_example;
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| name | type   | comment | primary_key | nullable | default_value | encoding  
    | compression         | block_size |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| c1   | int    |         | true        | false    |               | 
AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c2   | int    |         | true        | false    |               | 
AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c3   | string |         | false       | true     |               | 
AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c4   | string |         | false       | false    |               | 
AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c5   | string |         | false       | true     | n/a           | 
AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c6   | string |         | false       | true     |               | 
AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c7   | bigint |         | false       | false    |               | 
AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c8   | bigint |         | false       | true     |               | 
AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c9   | bigint |         | false       | true     | -1            | 
BIT_SHUFFLE   | DEFAULT_COMPRESSION | 0          |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_tables.html#tables">Overview of Impala 
Tables</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE 
TABLE Statement</a>,
+      <a class="xref" href="impala_show.html#show_tables">SHOW TABLES 
Statement</a>, <a class="xref" href="impala_show.html#show_create_table">SHOW 
CREATE TABLE Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_langref_sql.html">Impala SQL 
Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_development.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_development.html 
b/docs/build/html/topics/impala_development.html
new file mode 100644
index 0000000..f8e0ae5
--- /dev/null
+++ b/docs/build/html/topics/impala_development.html
@@ -0,0 +1,197 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_concepts.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="intro_dev"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Developing Impala 
Applications</title></head><body id="intro_dev"><main role="main"><article 
role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Developing Impala 
Applications</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The core development language with Impala is SQL. You can also use Java 
or other languages to interact with
+      Impala through the standard JDBC and ODBC interfaces used by many 
business intelligence tools. For
+      specialized kinds of analysis, you can supplement the SQL built-in 
functions by writing
+      <a class="xref" href="impala_udf.html#udfs">user-defined functions 
(UDFs)</a> in C++ or Java.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_concepts.html">Impala Concepts and 
Architecture</a></div></div></nav><article class="topic concept nested1" 
aria-labelledby="ariaid-title2" id="intro_dev__intro_sql">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Overview of the Impala 
SQL Dialect</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Impala SQL dialect is highly compatible with the SQL syntax used 
in the Apache Hive component (HiveQL). As
+        such, it is approachable for users who are already familiar with running 
SQL queries on the Hadoop
+        infrastructure. Currently, Impala SQL supports a subset of HiveQL 
statements, data types, and built-in
+        functions. Impala also includes additional built-in functions for 
common industry features, to simplify
+        porting SQL from non-Hadoop systems.
+      </p>
+
+      <p class="p">
+        For users coming to Impala from traditional database or data 
warehousing backgrounds, the following aspects of the SQL dialect
+        might seem familiar:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            The <a class="xref" href="impala_select.html#select">SELECT 
statement</a> includes familiar clauses such as <code class="ph 
codeph">WHERE</code>,
+            <code class="ph codeph">GROUP BY</code>, <code class="ph 
codeph">ORDER BY</code>, and <code class="ph codeph">WITH</code>.
+            You will find familiar notions such as
+            <a class="xref" href="impala_joins.html#joins">joins</a>, <a 
class="xref" href="impala_functions.html#builtins">built-in
+            functions</a> for processing strings, numbers, and dates,
+            <a class="xref" 
href="impala_aggregate_functions.html#aggregate_functions">aggregate 
functions</a>,
+            <a class="xref" 
href="impala_subqueries.html#subqueries">subqueries</a>, and
+            <a class="xref" 
href="impala_operators.html#comparison_operators">comparison operators</a>
+            such as <code class="ph codeph">IN()</code> and <code class="ph 
codeph">BETWEEN</code>.
+            The <code class="ph codeph">SELECT</code> statement is the place 
where SQL standards compliance is most important.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+          From the data warehousing world, you will recognize the notion of
+          <a class="xref" 
href="impala_partitioning.html#partitioning">partitioned tables</a>.
+          One or more columns serve as partition keys, and the data is 
physically arranged so that
+          queries that refer to the partition key columns in the <code 
class="ph codeph">WHERE</code> clause
+          can skip partitions that do not match the filter conditions. For 
example, if you have 10
+          years' worth of data and use a clause such as <code class="ph 
codeph">WHERE year = 2015</code>,
+          <code class="ph codeph">WHERE year &gt; 2010</code>, or <code 
class="ph codeph">WHERE year IN (2014, 2015)</code>,
+          Impala skips all the data for non-matching years, greatly reducing 
the amount of I/O
+          for the query.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+          In Impala 1.2 and higher, <a class="xref" 
href="impala_udf.html#udfs">UDFs</a> let you perform custom comparisons
+          and transformation logic during <code class="ph 
codeph">SELECT</code> and <code class="ph codeph">INSERT...SELECT</code> 
statements.
+          </p>
+        </li>
+      </ul>
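+
+      <p class="p">
+        As a minimal sketch of the partition pruning described above, the following statements use a
+        hypothetical table named <code class="ph codeph">sales_by_year</code>. The table and column names
+        are assumptions chosen for illustration, not objects that ship with Impala:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical table; YEAR is the partition key column.
+create table sales_by_year (id bigint, amount double)
+  partitioned by (year int);
+
+-- The filter names the partition key, so only the data for year=2015 is read.
+select count(*) from sales_by_year where year = 2015;
+
+-- Only the data for the 2014 and 2015 partitions is read.
+select sum(amount) from sales_by_year where year in (2014, 2015);
+</code></pre>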
+
+      <p class="p">
+        For users coming to Impala from traditional database or data 
warehousing backgrounds, the following aspects of the SQL dialect
+        might require some learning and practice for you to become proficient 
in the Hadoop environment:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+          Impala SQL is focused on queries and includes relatively little DML. 
There is no <code class="ph codeph">UPDATE</code>
+          or <code class="ph codeph">DELETE</code> statement. Stale data is 
typically discarded (by <code class="ph codeph">DROP TABLE</code>
+          or <code class="ph codeph">ALTER TABLE ... DROP PARTITION</code> 
statements) or replaced (by <code class="ph codeph">INSERT
+          OVERWRITE</code> statements).
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+          All data creation is done by <code class="ph codeph">INSERT</code> 
statements, which typically insert data in bulk by
+          querying from other tables. There are two variations, <code 
class="ph codeph">INSERT INTO</code> which appends to the
+          existing data, and <code class="ph codeph">INSERT OVERWRITE</code> 
which replaces the entire contents of a table or
+          partition (similar to <code class="ph codeph">TRUNCATE TABLE</code> 
followed by a new <code class="ph codeph">INSERT</code>).
+          Although there is an <code class="ph codeph">INSERT ... 
VALUES</code> syntax to create a small number of values in
+          a single statement, it is far more efficient to use the <code 
class="ph codeph">INSERT ... SELECT</code> syntax to copy
+          and transform large amounts of data from one table to another in a 
single operation.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+          You often construct Impala table definitions and data files in some 
other environment, and then attach
+          Impala so that it can run real-time queries. The same data files and 
table metadata are shared with other
+          components of the Hadoop ecosystem. In particular, Impala can access 
tables created by Hive or data
+          inserted by Hive, and Hive can access tables and data produced by 
Impala. Many other Hadoop components
+          can write files in formats such as Parquet and Avro, that can then 
be queried by Impala.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+          Because Hadoop and Impala are focused on data warehouse-style 
operations on large data sets, Impala SQL
+          includes some idioms that you might find in the import utilities for 
traditional database systems. For
+          example, you can create a table that reads comma-separated or 
tab-separated text files, specifying the
+          separator in the <code class="ph codeph">CREATE TABLE</code> 
statement. You can create <strong class="ph b">external tables</strong> that 
read
+          existing data files but do not move or transform them.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+          Because Impala reads large quantities of data that might not be 
perfectly tidy and predictable, it does
+          not require length constraints on string data types. For example, 
you can define a database column as
+          <code class="ph codeph">STRING</code> with unlimited length, rather 
than <code class="ph codeph">CHAR(1)</code> or
+          <code class="ph codeph">VARCHAR(64)</code>. <span 
class="ph">(Although in Impala 2.0 and later, you can also use
+          length-constrained <code class="ph codeph">CHAR</code> and <code 
class="ph codeph">VARCHAR</code> types.)</span>
+          </p>
+        </li>
+
+      </ul>
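+
+      <p class="p">
+        The idioms in the preceding list can be sketched as follows. All table names, column names, and the
+        HDFS path are hypothetical placeholders; substitute values that exist in your own environment:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Text table that reads comma-separated data files (hypothetical names).
+create table raw_events (id bigint, event_time string, payload string)
+  row format delimited fields terminated by ',';
+
+-- External table over existing data files; the files are not moved or transformed.
+-- The LOCATION path is an assumption for illustration.
+create external table staged_events (id bigint, event_time string, payload string)
+  row format delimited fields terminated by ','
+  location '/user/doc_demo/staged_events';
+
+-- INSERT INTO appends to the existing data.
+insert into raw_events select * from staged_events;
+
+-- INSERT OVERWRITE replaces the entire contents of the table.
+insert overwrite raw_events select * from staged_events;
+</code></pre>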
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong> <a class="xref" 
href="impala_langref.html#langref">Impala SQL Language Reference</a>, especially
+        <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL 
Statements</a> and <a class="xref" href="impala_functions.html#builtins">Impala 
Built-In Functions</a>
+      </p>
+    </div>
+  </article>
+
+
+
+  
+
+  
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" 
id="intro_dev__intro_apis">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Overview of Impala 
Programming Interfaces</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can connect and submit requests to the Impala daemons through:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The <code class="ph codeph"><a class="xref" 
href="impala_impala_shell.html#impala_shell">impala-shell</a></code> interactive
+          command interpreter.
+        </li>
+
+        <li class="li">
+          The <a class="xref" href="http://gethue.com/" 
target="_blank">Hue</a> web-based user interface.
+        </li>
+
+        <li class="li">
+          <a class="xref" href="impala_jdbc.html#impala_jdbc">JDBC</a>.
+        </li>
+
+        <li class="li">
+          <a class="xref" href="impala_odbc.html#impala_odbc">ODBC</a>.
+        </li>
+      </ul>
+
+      <p class="p">
+        With these options, you can use Impala in heterogeneous environments, 
with JDBC or ODBC applications
+        running on non-Linux platforms. You can also use Impala in combination 
with various Business Intelligence
+        tools that use the JDBC and ODBC interfaces.
+      </p>
+
+      <p class="p">
+        Each <code class="ph codeph">impalad</code> daemon process, running on 
separate nodes in a cluster, listens on
+        <a class="xref" href="impala_ports.html#ports">several ports</a> for 
incoming requests. Requests from
+        <code class="ph codeph">impala-shell</code> and Hue are routed to the 
<code class="ph codeph">impalad</code> daemons through the same
+        port. The <code class="ph codeph">impalad</code> daemons listen on 
separate ports for JDBC and ODBC requests.
+      </p>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disable_codegen.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_disable_codegen.html 
b/docs/build/html/topics/impala_disable_codegen.html
new file mode 100644
index 0000000..f8766b7
--- /dev/null
+++ b/docs/build/html/topics/impala_disable_codegen.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_query_options.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="disable_codegen"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>DISABLE_CODEGEN Query Option</title></head><body 
id="disable_codegen"><main role="main"><article role="article" 
aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DISABLE_CODEGEN Query 
Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      This is a debug option, intended for diagnosing and working around 
issues that cause crashes. If a query
+      fails with an <span class="q">"illegal instruction"</span> or other 
hardware-specific message, try setting
+      <code class="ph codeph">DISABLE_CODEGEN=true</code> and running the 
query again. If the query succeeds only when the
+      <code class="ph codeph">DISABLE_CODEGEN</code> option is turned on, 
submit the problem to <span class="keyword">the appropriate support 
channel</span> and include that
+      detail in the problem report. Do not otherwise run with this setting 
turned on, because it results in lower
+      overall performance.
+    </p>
+
+    <p class="p">
+      Because the code generation phase adds a small amount of overhead for 
each query, you might turn on the
+      <code class="ph codeph">DISABLE_CODEGEN</code> option to achieve maximum 
throughput when running many short-lived queries
+      against small tables.
+    </p>
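+
+    <p class="p">
+      As a minimal sketch, a troubleshooting session in <code class="ph codeph">impala-shell</code> might
+      look like the following. The table name <code class="ph codeph">t1</code> is a hypothetical stand-in
+      for whatever query is failing:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Turn off code generation for the current session only.
+set disable_codegen=true;
+
+-- Rerun the failing query (hypothetical table name).
+select count(*) from t1;
+
+-- Restore the default setting afterward.
+set disable_codegen=false;
+</code></pre>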
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 
and 0, or <code class="ph codeph">true</code> and <code class="ph 
codeph">false</code>;
+        any other value is interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph 
codeph">false</code> (shown as 0 in output of <code class="ph 
codeph">SET</code> statement)
+      </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_query_options.html">Query Options for the SET 
Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disable_row_runtime_filtering.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_disable_row_runtime_filtering.html 
b/docs/build/html/topics/impala_disable_row_runtime_filtering.html
new file mode 100644
index 0000000..11ccb80
--- /dev/null
+++ b/docs/build/html/topics/impala_disable_row_runtime_filtering.html
@@ -0,0 +1,72 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_query_options.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="disable_row_runtime_filtering"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>DISABLE_ROW_RUNTIME_FILTERING Query Option 
(Impala 2.5 or higher only)</title></head><body 
id="disable_row_runtime_filtering"><main role="main"><article role="article" 
aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" 
id="ariaid-title1">DISABLE_ROW_RUNTIME_FILTERING Query Option (<span 
class="keyword">Impala 2.5</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">DISABLE_ROW_RUNTIME_FILTERING</code> query 
option
+      reduces the scope of the runtime filtering feature. Queries still 
dynamically prune
+      partitions, but do not apply the filtering logic to individual rows 
within partitions.
+    </p>
+
+    <p class="p">
+      This option applies only to queries against Parquet tables. For other 
file formats, Impala
+      only prunes at the level of partitions, not individual rows.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 
and 0, or <code class="ph codeph">true</code> and <code class="ph 
codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph 
codeph">false</code>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 
2.5.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      Impala automatically evaluates whether the per-row filters are
+      effective at reducing the amount of intermediate data. Therefore,
+      this option is typically only needed for the rare case where Impala
+      cannot accurately determine how effective the per-row filtering is
+      for a query.
+    </p>
+
+    <p class="p">
+        Because the runtime filtering feature applies mainly to 
resource-intensive
+        and long-running queries, only adjust this query option when tuning 
long-running queries
+        involving some combination of large partitioned tables and joins 
involving large tables.
+      </p>
+
+    <p class="p">
+      Because this setting only improves query performance in very specific
+      circumstances, depending on the query characteristics and data 
distribution,
+      only use it when you determine through benchmarking that it improves
+      performance of specific expensive queries.
+      Consider setting this query option immediately before the expensive 
query and
+      unsetting it immediately afterward.
+    </p>
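+
+    <p class="p">
+      The pattern of setting the option immediately before one expensive 
+      query and restoring the default afterward might look like the 
+      following sketch. The table and column names are hypothetical, and 
+      whether the setting helps depends on your own benchmarking.
+    </p>
+
+<pre class="pre codeblock"><code>-- Keep dynamic partition pruning, but skip per-row filtering
+-- for this one query only.
+SET DISABLE_ROW_RUNTIME_FILTERING=true;
+
+SELECT COUNT(*)
+  FROM sales_fact s JOIN store_dim d ON s.store_id = d.store_id
+  WHERE d.region = 'west';
+
+-- Return to the default so that other queries are unaffected.
+SET DISABLE_ROW_RUNTIME_FILTERING=false;
+</code></pre>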
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering 
for Impala Queries (Impala 2.5 or higher only)</a>,
+      <a class="xref" 
href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE 
Query Option (Impala 2.5 or higher only)</a>
+      
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_query_options.html">Query Options for the SET 
Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disable_streaming_preaggregations.html
----------------------------------------------------------------------
diff --git 
a/docs/build/html/topics/impala_disable_streaming_preaggregations.html 
b/docs/build/html/topics/impala_disable_streaming_preaggregations.html
new file mode 100644
index 0000000..98ea640
--- /dev/null
+++ b/docs/build/html/topics/impala_disable_streaming_preaggregations.html
@@ -0,0 +1,50 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_query_options.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="disable_streaming_preaggregations"><link rel="stylesheet" 
type="text/css" 
href="../commonltr.css"><title>DISABLE_STREAMING_PREAGGREGATIONS Query Option 
(Impala 2.5 or higher only)</title></head><body 
id="disable_streaming_preaggregations"><main role="main"><article 
role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" 
id="ariaid-title1">DISABLE_STREAMING_PREAGGREGATIONS Query Option (<span 
class="keyword">Impala 2.5</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Turns off the <span class="q">"streaming preaggregation"</span> 
optimization that is available in <span class="keyword">Impala 2.5</span>
+      and higher. This optimization reduces unnecessary work performed by 
queries that perform aggregation
+      operations on columns with few or no duplicate values, for example <code 
class="ph codeph">DISTINCT <var class="keyword varname">id_column</var></code>
+      or <code class="ph codeph">GROUP BY <var class="keyword 
varname">unique_column</var></code>. If the optimization causes regressions in
+      existing queries that use aggregation functions, you can turn it off as 
needed by setting this query option.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 
and 0, or <code class="ph codeph">true</code> and <code class="ph 
codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph 
codeph">false</code> (shown as 0 in output of <code class="ph 
codeph">SET</code> statement)
+      </p>
+
+    <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+        In <span class="keyword">Impala 2.5.0</span>, only the value 1 enables 
the option, and the value
+        <code class="ph codeph">true</code> is not recognized. This limitation 
is
+        tracked by the issue
+        <a class="xref" 
href="https://issues.apache.org/jira/browse/IMPALA-3334" 
target="_blank">IMPALA-3334</a>,
+        which shows the releases where the problem is fixed.
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+    <p class="p">
+      Typically, queries that would require enabling this option involve very 
large numbers of
+      aggregated values, such as a billion or more distinct keys being 
processed on each
+      worker node.
+    </p>
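+
+    <p class="p">
+      A hedged example of turning the optimization off for a single 
+      aggregation query follows. The table and column names are 
+      hypothetical. The values 1 and 0 are used instead of 
+      <code class="ph codeph">true</code> and 
+      <code class="ph codeph">false</code> so that the statements also 
+      work in <span class="keyword">Impala 2.5.0</span>, as explained in 
+      the note above.
+    </p>
+
+<pre class="pre codeblock"><code>-- Disable streaming preaggregation for the next query.
+SET DISABLE_STREAMING_PREAGGREGATIONS=1;
+
+-- An aggregation on a column with few or no duplicate values.
+SELECT unique_id, COUNT(*) FROM events GROUP BY unique_id;
+
+-- Re-enable the optimization (the default).
+SET DISABLE_STREAMING_PREAGGREGATIONS=0;
+</code></pre>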
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 
2.5.0</span>
+      </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_query_options.html">Query Options for the SET 
Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disable_unsafe_spills.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_disable_unsafe_spills.html 
b/docs/build/html/topics/impala_disable_unsafe_spills.html
new file mode 100644
index 0000000..01bc8fd
--- /dev/null
+++ b/docs/build/html/topics/impala_disable_unsafe_spills.html
@@ -0,0 +1,50 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_query_options.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="disable_unsafe_spills"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 
or higher only)</title></head><body id="disable_unsafe_spills"><main 
role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DISABLE_UNSAFE_SPILLS Query 
Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Enable this option if you prefer to have queries fail when they exceed 
the Impala memory limit, rather than
+      write temporary data to disk.
+    </p>
+
+    <p class="p">
+      Queries that <span class="q">"spill"</span> to disk typically complete 
successfully, when in earlier Impala releases they would have failed.
+      However, queries with exorbitant memory requirements due to missing 
statistics or inefficient join clauses could
+      become so slow as a result that you would rather have them cancelled 
automatically and reduce the memory
+      usage through standard Impala tuning techniques.
+    </p>
+
+    <p class="p">
+      This option prevents only <span class="q">"unsafe"</span> spill 
operations, meaning that one or more tables are missing
+      statistics or the query does not include a hint to set the most 
efficient mechanism for a join or
+      <code class="ph codeph">INSERT ... SELECT</code> into a partitioned 
table. These are the tables most likely to result in
+      suboptimal execution plans that could cause unnecessary spilling. 
Therefore, leaving this option enabled is a
+      good way to find tables on which to run the <code class="ph 
codeph">COMPUTE STATS</code> statement.
+    </p>
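+
+    <p class="p">
+      One way to use this option to find tables that need statistics is 
+      sketched below. The table names are hypothetical. The sketch 
+      assumes that a failed query prompts you to check the tables it 
+      references.
+    </p>
+
+<pre class="pre codeblock"><code>-- Make queries fail rather than perform "unsafe" spills.
+SET DISABLE_UNSAFE_SPILLS=true;
+
+SELECT c.name, SUM(o.total)
+  FROM orders o JOIN customers c ON o.customer_id = c.id
+  GROUP BY c.name;
+
+-- If the query fails, check the tables for missing statistics
+-- and gather them before retrying.
+SHOW TABLE STATS orders;
+COMPUTE STATS orders;
+COMPUTE STATS customers;
+</code></pre>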
+
+    <p class="p">
+      See <a class="xref" href="impala_scalability.html#spill_to_disk">SQL 
Operations that Spill to Disk</a> for information about the <span 
class="q">"spill to disk"</span>
+      feature for queries processing large result sets with joins, <code 
class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP
+      BY</code>, <code class="ph codeph">DISTINCT</code>, aggregation 
functions, or analytic functions.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 
and 0, or <code class="ph codeph">true</code> and <code class="ph 
codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph 
codeph">false</code> (shown as 0 in output of <code class="ph 
codeph">SET</code> statement)
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 
2.0.0</span>
+      </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_query_options.html">Query Options for the SET 
Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disk_space.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_disk_space.html 
b/docs/build/html/topics/impala_disk_space.html
new file mode 100644
index 0000000..0b102e5
--- /dev/null
+++ b/docs/build/html/topics/impala_disk_space.html
@@ -0,0 +1,133 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_admin.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="disk_space"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Managing Disk Space for Impala 
Data</title></head><body id="disk_space"><main role="main"><article 
role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Managing Disk Space for 
Impala Data</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Although Impala typically works with many large files in an HDFS storage 
system with plenty of capacity,
+      there are times when you might perform some file cleanup to reclaim 
space, or advise developers on techniques
+      to minimize space consumption and file duplication.
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          Use compact binary file formats where practical. Numeric and 
time-based data in particular can be stored
+          in more compact form in binary data files. Depending on the file 
format, various compression and encoding
+          features can reduce file size even further. You can specify the 
<code class="ph codeph">STORED AS</code> clause as part
+          of the <code class="ph codeph">CREATE TABLE</code> statement, or 
<code class="ph codeph">ALTER TABLE</code> with the <code class="ph codeph">SET
+          FILEFORMAT</code> clause for an existing table or partition within a 
partitioned table. See
+          <a class="xref" href="impala_file_formats.html#file_formats">How 
Impala Works with Hadoop File Formats</a> for details about file formats, 
especially
+          <a class="xref" href="impala_parquet.html#parquet">Using the Parquet 
File Format with Impala Tables</a>. See <a class="xref" 
href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and
+          <a class="xref" href="impala_alter_table.html#alter_table">ALTER 
TABLE Statement</a> for syntax details.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          You manage underlying data files differently depending on whether 
the corresponding Impala table is
+          defined as an <a class="xref" 
href="impala_tables.html#internal_tables">internal</a> or
+          <a class="xref" 
href="impala_tables.html#external_tables">external</a> table:
+        </p>
+        <ul class="ul">
+          <li class="li">
+            Use the <code class="ph codeph">DESCRIBE FORMATTED</code> 
statement to check if a particular table is internal
+            (managed by Impala) or external, and to see the physical location 
of the data files in HDFS. See
+            <a class="xref" href="impala_describe.html#describe">DESCRIBE 
Statement</a> for details.
+          </li>
+
+          <li class="li">
+            For Impala-managed (<span class="q">"internal"</span>) tables, use 
<code class="ph codeph">DROP TABLE</code> statements to remove
+            data files. See <a class="xref" 
href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> for details.
+          </li>
+
+          <li class="li">
+            For tables not managed by Impala (<span 
class="q">"external"</span> tables), use appropriate HDFS-related commands such
+            as <code class="ph codeph">hadoop fs</code>, <code class="ph 
codeph">hdfs dfs</code>, or <code class="ph codeph">distcp</code>, to create, 
move,
+            copy, or delete files within HDFS directories that are accessible 
by the <code class="ph codeph">impala</code> user.
+            Issue a <code class="ph codeph">REFRESH <var class="keyword 
varname">table_name</var></code> statement after adding or removing any
+            files from the data directory of an external table. See <a 
class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> for
+            details.
+          </li>
+
+          <li class="li">
+            Use external tables to reference HDFS data files in their original 
location. With this technique, you
+            avoid copying the files, and you can map more than one Impala 
table to the same set of data files. When
+            you drop the Impala table, the data files are left undisturbed. See
+            <a class="xref" href="impala_tables.html#external_tables">External 
Tables</a> for details.
+          </li>
+
+          <li class="li">
+            Use the <code class="ph codeph">LOAD DATA</code> statement to move 
HDFS files into the data directory for an Impala
+            table from inside Impala, without the need to specify the HDFS 
path of the destination directory. This
+            technique works for both internal and external tables. See
+            <a class="xref" href="impala_load_data.html#load_data">LOAD DATA 
Statement</a> for details.
+          </li>
+        </ul>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          Make sure that the HDFS trashcan is configured correctly. When you 
remove files from HDFS, the space
+          might not be reclaimed for use by other files until sometime later, 
when the trashcan is emptied. See
+          <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE 
Statement</a> for details. See
+          <a class="xref" href="impala_prereqs.html#prereqs_account">User 
Account Requirements</a> for permissions needed for the HDFS trashcan to operate
+          correctly.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          Drop all tables in a database before dropping the database itself. 
See
+          <a class="xref" href="impala_drop_database.html#drop_database">DROP 
DATABASE Statement</a> for details.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          Clean up temporary files after failed <code class="ph 
codeph">INSERT</code> statements. If an <code class="ph codeph">INSERT</code>
+          statement encounters an error, and you see a directory named <span 
class="ph filepath">.impala_insert_staging</span>
+          or <span class="ph filepath">_impala_insert_staging</span> left 
behind in the data directory for the table, it might
+          contain temporary data files taking up space in HDFS. You might be 
able to salvage these data files, for
+          example if they are complete but could not be moved into place due 
to a permission error. Or, you might
+          delete those files through commands such as <code class="ph 
codeph">hadoop fs</code> or <code class="ph codeph">hdfs dfs</code>, to
+          reclaim space before re-trying the <code class="ph 
codeph">INSERT</code>. Issue <code class="ph codeph">DESCRIBE FORMATTED
+          <var class="keyword varname">table_name</var></code> to see the HDFS 
path where you can check for temporary files.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+        By default, intermediate files used during large sort, join, 
aggregation, or analytic function operations
+        are stored in the directory <span class="ph 
filepath">/tmp/impala-scratch</span>. These files are removed when the
+        operation finishes. (Multiple concurrent queries can perform 
operations that use the <span class="q">"spill to disk"</span>
+        technique, without any name conflicts for these temporary files.) You 
can specify a different location by
+        starting the <span class="keyword cmdname">impalad</span> daemon with 
the
+        <code class="ph codeph">--scratch_dirs="<var class="keyword 
varname">path_to_directory</var>"</code> configuration option.
+        You can specify a single directory, or a comma-separated list of 
directories. The scratch directories must
+        be on the local filesystem, not in HDFS. You might specify different 
directory paths for different hosts,
+        depending on the capacity and speed
+        of the available storage devices. In <span class="keyword">Impala 
2.3</span> or higher,
+        Impala successfully starts (with a warning written to the log) if it 
cannot create or read and write files
+        in one of the scratch directories. If there is less than 1 GB free on 
the filesystem where that directory resides,
+        Impala still runs, but writes a warning message to its log.  If Impala 
encounters an error reading or writing
+        files in a scratch directory during a query, Impala logs the error and 
the query fails.
+      </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          If you use the Amazon Simple Storage Service (S3) as a place to 
offload
+          data to reduce the volume of local storage, Impala 2.2.0 and higher
+          can query the data directly from S3.
+          See <a class="xref" href="impala_s3.html#s3">Using Impala with the 
Amazon S3 Filesystem</a> for details.
+        </p>
+      </li>
+    </ul>
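+
+    <p class="p">
+      Several of the techniques in the list above can be combined as in 
+      the following sketch. All table names, column definitions, and the 
+      HDFS path are hypothetical and shown only for illustration.
+    </p>
+
+<pre class="pre codeblock"><code>-- Map an external table onto existing HDFS files without copying them.
+CREATE EXTERNAL TABLE sales_ext (id BIGINT, amount DOUBLE)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+  LOCATION '/user/etl/sales';
+
+-- Check whether a table is internal or external, and see the
+-- physical location of its data files.
+DESCRIBE FORMATTED sales_ext;
+
+-- After adding or removing files with HDFS commands, make Impala
+-- recognize the change.
+REFRESH sales_ext;
+
+-- Store a copy of the data in a compact binary format.
+CREATE TABLE sales_parquet STORED AS PARQUET
+  AS SELECT * FROM sales_ext;
+
+-- Dropping the external table leaves its data files undisturbed;
+-- dropping an internal table removes its data files.
+DROP TABLE sales_ext;
+</code></pre>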
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_admin.html">Impala 
Administration</a></div></div></nav></article></main></body></html>
\ No newline at end of file
