http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_insert.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_insert.html b/docs/build/html/topics/impala_insert.html new file mode 100644 index 0000000..557ab70 --- /dev/null +++ b/docs/build/html/topics/impala_insert.html @@ -0,0 +1,798 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="insert"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>INSERT Statement</title></head><body id="insert"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">INSERT Statement</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Impala supports inserting into tables and partitions that you create with the Impala <code class="ph codeph">CREATE + TABLE</code> statement, or pre-defined tables and partitions created through Hive. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>[<var class="keyword varname">with_clause</var>] +INSERT { INTO | OVERWRITE } [TABLE] <var class="keyword varname">table_name</var> + [(<var class="keyword varname">column_list</var>)] + [ PARTITION (<var class="keyword varname">partition_clause</var>)] +{ + [<var class="keyword varname">hint_clause</var>] <var class="keyword varname">select_statement</var> + | VALUES (<var class="keyword varname">value</var> [, <var class="keyword varname">value</var> ...]) [, (<var class="keyword varname">value</var> [, <var class="keyword varname">value</var> ...]) ...] +} + +partition_clause ::= <var class="keyword varname">col_name</var> [= <var class="keyword varname">constant</var>] [, <var class="keyword varname">col_name</var> [= <var class="keyword varname">constant</var>] ...] + +hint_clause ::= [SHUFFLE] | [NOSHUFFLE] (Note: the square brackets are part of the syntax.) +</code></pre> + + <p class="p"> + <strong class="ph b">Appending or replacing (INTO and OVERWRITE clauses):</strong> + </p> + + <p class="p"> + The <code class="ph codeph">INSERT INTO</code> syntax appends data to a table. The existing data files are left as-is, and + the inserted data is put into one or more new data files. + </p> + + <p class="p"> + The <code class="ph codeph">INSERT OVERWRITE</code> syntax replaces the data in a table. + + + Currently, the overwritten data files are deleted immediately; they do not go through the HDFS trash + mechanism. 
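+ </p>
+
+ <p class="p">
+ As a minimal sketch of the two forms (the table names here are illustrative, not part of the
+ examples later in this topic):
+ </p>
+
+<pre class="pre codeblock"><code>-- Appends new data files alongside any existing ones.
+INSERT INTO t1 SELECT * FROM staging_table;
+
+-- Replaces the entire contents of t1; the prior data files are deleted immediately.
+INSERT OVERWRITE t1 SELECT * FROM staging_table;
+</code></pre>
+
+ <p class="p">
+ Because the overwritten files bypass the HDFS trash mechanism, double-check the target table
+ before running an <code class="ph codeph">INSERT OVERWRITE</code> statement.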
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">INSERT</code> statement currently does not support writing data files
+ containing complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>).
+ To prepare Parquet data for such tables, you generate the data files outside Impala and then
+ use <code class="ph codeph">LOAD DATA</code> or <code class="ph codeph">CREATE EXTERNAL TABLE</code> to associate those
+ data files with the table. Currently, such tables must use the Parquet file format.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about working with complex types.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ Currently, the <code class="ph codeph">INSERT OVERWRITE</code> syntax cannot be used with Kudu tables.
+ </p>
+
+ <p class="p">
+ Kudu tables require a unique primary key for each row. If an <code class="ph codeph">INSERT</code>
+ statement attempts to insert a row with the same values for the primary key columns
+ as an existing row, that row is discarded and the insert operation continues.
+ When rows are discarded due to duplicate primary keys, the statement finishes
+ with a warning, not an error. (This is a change from early releases of Kudu
+ where the default was to return an error in such cases, and the syntax
+ <code class="ph codeph">INSERT IGNORE</code> was required to make the statement succeed.
+ The <code class="ph codeph">IGNORE</code> clause is no longer part of the <code class="ph codeph">INSERT</code>
+ syntax.)
+ </p>
+
+ <p class="p">
+ For situations where you prefer to replace rows with duplicate primary key values,
+ rather than discarding the new data, you can use the <code class="ph codeph">UPSERT</code>
+ statement instead of <code class="ph codeph">INSERT</code>. <code class="ph codeph">UPSERT</code> inserts
+ rows that are entirely new, and for rows that match an existing primary key in the
+ table, the non-primary-key columns are updated to reflect the values in the
+ <span class="q">"upserted"</span> data.
+ </p>
+
+ <p class="p">
+ If you really want to store new rows, not replace existing ones, but cannot do so
+ because of the primary key uniqueness constraint, consider recreating the table
+ with additional columns included in the primary key.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_kudu.html#impala_kudu">Using Impala to Query Kudu Tables</a> for more details about using Impala with Kudu.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Impala currently supports:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Copying data from another table using a <code class="ph codeph">SELECT</code> query. In Impala 1.2.1 and higher, you can
+ combine <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">INSERT</code> operations into a single step with the
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax, which bypasses the actual <code class="ph codeph">INSERT</code> keyword.
+ </li>
+
+ <li class="li">
+ An optional <a class="xref" href="impala_with.html#with"><code class="ph codeph">WITH</code> clause</a> before the
+ <code class="ph codeph">INSERT</code> keyword, to define a subquery referenced in the <code class="ph codeph">SELECT</code> portion.
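+ As a minimal sketch (the table and column names here are hypothetical):
+<pre class="pre codeblock"><code>WITH recent AS (SELECT id, amount FROM sales WHERE year = 2017)
+INSERT INTO sales_summary SELECT id, SUM(amount) FROM recent GROUP BY id;
+</code></pre>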
+ </li>
+
+ <li class="li">
+ Creating one or more new rows using constant expressions through the <code class="ph codeph">VALUES</code> clause. (The
+ <code class="ph codeph">VALUES</code> clause was added in Impala 1.0.1.)
+ </li>
+
+ <li class="li">
+ <p class="p">
+ By default, the first column of each newly inserted row goes into the first column of the table, the
+ second column into the second column, and so on.
+ </p>
+ <p class="p">
+ You can also specify the columns to be inserted, an arbitrarily ordered subset of the columns in the
+ destination table, by specifying a column list immediately after the name of the destination table. This
+ feature lets you adjust the inserted columns to match the layout of a <code class="ph codeph">SELECT</code> statement,
+ rather than the other way around. (This feature was added in Impala 1.1.)
+ </p>
+ <p class="p">
+ The number of columns mentioned in the column list (known as the <span class="q">"column permutation"</span>) must match
+ the number of columns in the <code class="ph codeph">SELECT</code> list or the <code class="ph codeph">VALUES</code> tuples. The
+ order of columns in the column permutation can be different than in the underlying table, and the columns
+ of each input row are reordered to match. If the number of columns in the column permutation is less than
+ in the destination table, all unmentioned columns are set to <code class="ph codeph">NULL</code>.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For a partitioned table, the optional <code class="ph codeph">PARTITION</code> clause identifies which partition or
+ partitions the new values go into. If a partition key column is given a constant value such as
+ <code class="ph codeph">PARTITION (year=2012)</code> or <code class="ph codeph">PARTITION (year=2012, month=2)</code>, all the
+ inserted rows use those same values for those partition key columns and you omit any corresponding
+ columns in the source table from the <code class="ph codeph">SELECT</code> list. This form is known as <span class="q">"static
+ partitioning"</span>.
+ </p>
+ <p class="p">
+ If a partition key column is mentioned but not assigned a value, such as in <code class="ph codeph">PARTITION (year,
+ region)</code> (both columns unassigned) or <code class="ph codeph">PARTITION(year, region='CA')</code>
+ (<code class="ph codeph">year</code> column unassigned), the unassigned columns are filled in with the final columns of
+ the <code class="ph codeph">SELECT</code> list. In this case, the number of columns in the <code class="ph codeph">SELECT</code> list
+ must equal the number of columns in the column permutation plus the number of partition key columns not
+ assigned a constant value. This form is known as <span class="q">"dynamic partitioning"</span>.
+ </p>
+ <p class="p">
+ See <a class="xref" href="impala_partitioning.html#partition_static_dynamic">Static and Dynamic Partitioning Clauses</a> for examples and performance
+ characteristics of static and dynamic partitioned inserts.
+ </p>
+ </li>
+
+ <li class="li">
+ An optional hint clause immediately before the <code class="ph codeph">SELECT</code> keyword, to fine-tune the behavior
+ when doing an <code class="ph codeph">INSERT ... SELECT</code> operation into partitioned Parquet tables. The hint
+ keywords are <code class="ph codeph">[SHUFFLE]</code> and <code class="ph codeph">[NOSHUFFLE]</code>, including the square brackets.
+ Inserting into partitioned Parquet tables can be a resource-intensive operation because it potentially
+ involves many files being written to HDFS simultaneously, and separate
+ <span class="ph">large</span> memory buffers being allocated to buffer the data for each
+ partition. For usage details, see <a class="xref" href="impala_parquet.html#parquet_etl">Loading Data into Parquet Tables</a>.
+ </li>
+ </ul>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">INSERT</code> statements that create new partitions or add data files result in changes to Hive
+ metadata. Because Impala uses Hive metadata, such changes may necessitate a metadata refresh. For more
+ information, see the <a class="xref" href="impala_refresh.html#refresh">REFRESH</a> statement.
+ </li>
+
+ <li class="li">
+ Currently, Impala can only insert data into tables that use the text and Parquet formats. For other file
+ formats, insert the data using Hive and use Impala to query it.
+ </li>
+
+ <li class="li">
+ As an alternative to the <code class="ph codeph">INSERT</code> statement, if you have existing data files elsewhere in
+ HDFS, the <code class="ph codeph">LOAD DATA</code> statement can move those files into a table. This statement works
+ with tables of any file format.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DML (but still affected by
+ <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL</a> query option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ When you insert the results of an expression, particularly of a built-in function call, into a small numeric
+ column such as <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">TINYINT</code>, or
+ <code class="ph codeph">FLOAT</code>, you might need to use a <code class="ph codeph">CAST()</code> expression to coerce values into the
+ appropriate type. Impala does not automatically convert from a larger type to a smaller one. For example, to
+ insert cosine values into a <code class="ph codeph">FLOAT</code> column, write <code class="ph codeph">CAST(COS(angle) AS FLOAT)</code>
+ in the <code class="ph codeph">INSERT</code> statement to make the conversion explicit.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">File format considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because Impala can read certain file formats that it cannot write,
+ the <code class="ph codeph">INSERT</code> statement does not work for all kinds of
+ Impala tables. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>
+ for details about what file formats are supported by the
+ <code class="ph codeph">INSERT</code> statement.
+ </p>
+
+ <p class="p">
+ Any <code class="ph codeph">INSERT</code> statement for a Parquet table requires enough free space in the HDFS filesystem
+ to write one block. Because Parquet data files use a block size of 1 GB by default, an
+ <code class="ph codeph">INSERT</code> might fail (even for a very small amount of data) if your HDFS is running low on
+ space.
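+ </p>
+
+ <p class="p">
+ If the default block size is the obstacle, one possible workaround (shown here as a sketch, using the
+ tables created in the examples below) is to lower the <code class="ph codeph">PARQUET_FILE_SIZE</code> query option
+ before the insert:
+ </p>
+
+<pre class="pre codeblock"><code>SET PARQUET_FILE_SIZE=256m;
+INSERT INTO parquet_table SELECT * FROM text_table;
+</code></pre>
+
+ <p class="p">
+ Smaller blocks reduce the free-space requirement for the insert, at some potential cost in scan
+ efficiency for later queries.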
+ </p> + + <p class="p"> + If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for + load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL + statement wait before returning, until the new or changed metadata has been received by all the Impala + nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details. + </p> + + <div class="note important note_important"><span class="note__title importanttitle">Important:</span> + After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE + STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a + table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS + SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH + <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that + are very large, used in join queries, or both. + </div> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + The following example sets up new tables with the same definition as the <code class="ph codeph">TAB1</code> table from the + <a class="xref" href="impala_tutorial.html#tutorial">Tutorial</a> section, using different file + formats, and demonstrates inserting data into the tables created with the <code class="ph codeph">STORED AS TEXTFILE</code> + and <code class="ph codeph">STORED AS PARQUET</code> clauses: + </p> + +<pre class="pre codeblock"><code>CREATE DATABASE IF NOT EXISTS file_formats; +USE file_formats; + +DROP TABLE IF EXISTS text_table; +CREATE TABLE text_table +( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) +STORED AS TEXTFILE; + +DROP TABLE IF EXISTS parquet_table; +CREATE TABLE parquet_table +( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) +STORED AS PARQUET;</code></pre> + + <p class="p"> + With the <code class="ph codeph">INSERT INTO TABLE</code> syntax, each new set of inserted rows is appended to any existing + data in the table. This is how you would record small amounts of data that arrive continuously, or ingest new + batches of data alongside the existing data. For example, after running 2 <code class="ph codeph">INSERT INTO TABLE</code> + statements with 5 rows each, the table contains 10 rows total: + </p> + +<pre class="pre codeblock"><code>[localhost:21000] > insert into table text_table select * from default.tab1; +Inserted 5 rows in 0.41s + +[localhost:21000] > insert into table text_table select * from default.tab1; +Inserted 5 rows in 0.46s + +[localhost:21000] > select count(*) from text_table; ++----------+ +| count(*) | ++----------+ +| 10 | ++----------+ +Returned 1 row(s) in 0.26s</code></pre> + + <p class="p"> + With the <code class="ph codeph">INSERT OVERWRITE TABLE</code> syntax, each new set of inserted rows replaces any existing + data in the table. This is how you load data to query in a data warehousing scenario where you analyze just + the data for a particular day, quarter, and so on, discarding the previous data each time. You might keep the + entire set of data in one raw table, and transfer and transform certain rows into a more compact and + efficient form to perform intensive analysis on that subset. 
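+ </p>
+
+ <p class="p">
+ A sketch of that raw-to-compact pattern (the table and column names are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>-- Re-derive the compact analysis table from the raw data, discarding the previous contents.
+INSERT OVERWRITE compact_table
+  SELECT customer_id, SUM(amount) FROM raw_table WHERE quarter = 'Q1' GROUP BY customer_id;
+</code></pre>
+
+ <p class="p">
+ The raw table remains the only copy of the full history; each overwrite keeps just the subset
+ currently being analyzed.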
+ </p> + + <p class="p"> + For example, here we insert 5 rows into a table using the <code class="ph codeph">INSERT INTO</code> clause, then replace + the data by inserting 3 rows with the <code class="ph codeph">INSERT OVERWRITE</code> clause. Afterward, the table only + contains the 3 rows from the final <code class="ph codeph">INSERT</code> statement. + </p> + +<pre class="pre codeblock"><code>[localhost:21000] > insert into table parquet_table select * from default.tab1; +Inserted 5 rows in 0.35s + +[localhost:21000] > insert overwrite table parquet_table select * from default.tab1 limit 3; +Inserted 3 rows in 0.43s +[localhost:21000] > select count(*) from parquet_table; ++----------+ +| count(*) | ++----------+ +| 3 | ++----------+ +Returned 1 row(s) in 0.43s</code></pre> + + <p class="p"> + The <code class="ph codeph"><a class="xref" href="impala_insert.html#values">VALUES</a></code> clause lets you insert one or more + rows by specifying constant values for all the columns. The number, types, and order of the expressions must + match the table definition. + </p> + + <div class="note note note_note" id="insert__insert_values_warning"><span class="note__title notetitle">Note:</span> + The <code class="ph codeph">INSERT ... VALUES</code> technique is not suitable for loading large quantities of data into + HDFS-based tables, because the insert operations cannot be parallelized, and each one produces a separate + data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL + syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do not + run scripts with thousands of <code class="ph codeph">INSERT ... VALUES</code> statements that insert a single row each + time. If you do run <code class="ph codeph">INSERT ... VALUES</code> operations to load data into a staging table as one + stage in an ETL pipeline, include multiple row values if possible within each <code class="ph codeph">VALUES</code> clause, + and use a separate database to make cleanup easier if the operation does produce many tiny files. + </div> + + <p class="p"> + The following example shows how to insert one row or multiple rows, with expressions of different types, + using literal values, expressions, and function return values: + </p> + +<pre class="pre codeblock"><code>create table val_test_1 (c1 int, c2 float, c3 string, c4 boolean, c5 timestamp); +insert into val_test_1 values (100, 99.9/10, 'abc', true, now()); +create table val_test_2 (id int, token string); +insert overwrite val_test_2 values (1, 'a'), (2, 'b'), (-1,'xyzzy');</code></pre> + + <p class="p"> + These examples show the type of <span class="q">"not implemented"</span> error that you see when attempting to insert data into + a table with a file format that Impala currently does not write to: + </p> + +<pre class="pre codeblock"><code>DROP TABLE IF EXISTS sequence_table; +CREATE TABLE sequence_table +( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) +STORED AS SEQUENCEFILE; + +DROP TABLE IF EXISTS rc_table; +CREATE TABLE rc_table +( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) +STORED AS RCFILE; + +[localhost:21000] > insert into table rc_table select * from default.tab1; +Remote error +Backend 0:RC_FILE not implemented. + +[localhost:21000] > insert into table sequence_table select * from default.tab1; +Remote error +Backend 0:SEQUENCE_FILE not implemented. 
</code></pre> + + <p class="p"> + Inserting data into partitioned tables requires slightly different syntax that divides the partitioning + columns from the others: + </p> + +<pre class="pre codeblock"><code>create table t1 (i int) <strong class="ph b">partitioned by (x int, y string)</strong>; +-- Select an INT column from another table. +-- All inserted rows will have the same x and y values, as specified in the INSERT statement. +-- This technique of specifying all the partition key values is known as static partitioning. +insert into t1 <strong class="ph b">partition(x=10, y='a')</strong> select c1 from some_other_table; +-- Select two INT columns from another table. +-- All inserted rows will have the same y value, as specified in the INSERT statement. +-- Values from c2 go into t1.x. +-- Any partitioning columns whose value is not specified are filled in +-- from the columns specified last in the SELECT list. +-- This technique of omitting some partition key values is known as dynamic partitioning. +insert into t1 <strong class="ph b">partition(x, y='b')</strong> select c1, c2 from some_other_table; +-- Select an INT and a STRING column from another table. +-- All inserted rows will have the same x value, as specified in the INSERT statement. +-- Values from c3 go into t1.y. +insert into t1 <strong class="ph b">partition(x=20, y)</strong> select c1, c3 from some_other_table;</code></pre> + + <p class="p"> + The following examples show how you can copy the data in all the columns from one table to another, copy the + data from only some columns, or specify the columns in the select list in a different order than they + actually appear in the table: + </p> + +<pre class="pre codeblock"><code>-- Start with 2 identical tables. +create table t1 (c1 int, c2 int); +create table t2 like t1; + +-- If there is no () part after the destination table name, +-- all columns must be specified, either as * or by name. +insert into t2 select * from t1; +insert into t2 select c1, c2 from t1; + +-- With the () notation following the destination table name, +-- you can omit columns (all values for that column are NULL +-- in the destination table), and/or reorder the values +-- selected from the source table. This is the "column permutation" feature. +insert into t2 (c1) select c1 from t1; +insert into t2 (c2, c1) select c1, c2 from t1; + +-- The column names can be entirely different in the source and destination tables. +-- You can copy any columns, not just the corresponding ones, from the source table. +-- But the number and type of selected columns must match the columns mentioned in the () part. +alter table t2 replace columns (x int, y int); +insert into t2 (y) select c1 from t1; + +-- For partitioned tables, all the partitioning columns must be mentioned in the () column list +-- or a PARTITION clause; these columns cannot be defaulted to NULL. +create table pt1 (x int, y int) partitioned by (z int); +-- The values from c1 are copied into the column x in the new table, +-- all in the same partition based on a constant value for z. +-- The values of y in the new table are all NULL. +insert into pt1 (x) partition (z=5) select c1 from t1; +-- Again we omit the values for column y so they are all NULL. +-- The inserted x values can go into different partitions, based on +-- the different values inserted into the partitioning column z. 
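+-- Dynamic partitioning without a PARTITION clause: here the partition column z
+-- is named in the () column list and takes its values from the SELECT list.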
+insert into pt1 (x,z) select x, z from t2;
+</code></pre>

 <p class="p">
 <code class="ph codeph">SELECT *</code> for a partitioned table requires that all partition key columns in the source table
 be declared as the last columns in the <code class="ph codeph">CREATE TABLE</code> statement. You still include a
 <code class="ph codeph">PARTITION</code> clause listing all the partition key columns. These partition columns are
 automatically mapped to the last columns from the <code class="ph codeph">SELECT *</code> list.
 </p>

<pre class="pre codeblock"><code>create table source (x int, y int, year int, month int, day int);
create table destination (x int, y int) partitioned by (year int, month int, day int);
...load some data into the unpartitioned source table...
-- Insert a single partition of data.
-- The SELECT * means you cannot specify partition (year=2014, month, day).
insert overwrite destination partition (year, month, day) select * from source where year=2014;
-- Insert the data for all year/month/day combinations.
insert overwrite destination partition (year, month, day) select * from source;

-- If one of the partition columns is omitted from the source table,
-- then you can specify a specific value for that column in the PARTITION clause.
-- Here the source table holds only data from 2014, and so does not include a year column.
create table source_2014 (x int, y int, month int, day int);
...load some data into the unpartitioned source_2014 table...
insert overwrite destination partition (year=2014, month, day) select * from source_2014;
</code></pre>

 <p class="p">
 <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> clause in an
 <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> clause is ignored and the
 results are not necessarily sorted. An <code class="ph codeph">INSERT ... SELECT</code> operation potentially creates
 many different data files, prepared on different data nodes, and therefore the notion of the data being
 stored in sorted order is impractical.
 </p>

 <p class="p">
 <strong class="ph b">Concurrency considerations:</strong> Each <code class="ph codeph">INSERT</code> operation creates new data files with unique
 names, so you can run multiple <code class="ph codeph">INSERT INTO</code> statements simultaneously without filename
 conflicts.

 While data is being inserted into an Impala table, the data is staged temporarily in a subdirectory inside
 the data directory; during this period, you cannot issue queries against that table in Hive. If an
 <code class="ph codeph">INSERT</code> operation fails, the temporary data file and the subdirectory could be left behind in
 the data directory. If so, remove the relevant subdirectory and any data files it contains manually, by
 issuing an <code class="ph codeph">hdfs dfs -rm -r</code> command, specifying the full path of the work subdirectory, whose
 name ends in <code class="ph codeph">_dir</code>.
+ </p> + </div> + + <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="insert__values"> + + <h2 class="title topictitle2" id="ariaid-title2">VALUES Clause</h2> + + <div class="body conbody"> + + <p class="p"> + The <code class="ph codeph">VALUES</code> clause is a general-purpose way to specify the columns of one or more rows, + typically within an <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> statement. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + The <code class="ph codeph">INSERT ... VALUES</code> technique is not suitable for loading large quantities of data into + HDFS-based tables, because the insert operations cannot be parallelized, and each one produces a separate + data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL + syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do + not run scripts with thousands of <code class="ph codeph">INSERT ... VALUES</code> statements that insert a single row + each time. If you do run <code class="ph codeph">INSERT ... VALUES</code> operations to load data into a staging table as + one stage in an ETL pipeline, include multiple row values if possible within each <code class="ph codeph">VALUES</code> + clause, and use a separate database to make cleanup easier if the operation does produce many tiny files. + </div> + + <p class="p"> + The following examples illustrate: + </p> + + <ul class="ul"> + <li class="li"> + How to insert a single row using a <code class="ph codeph">VALUES</code> clause. + </li> + + <li class="li"> + How to insert multiple rows using a <code class="ph codeph">VALUES</code> clause. + </li> + + <li class="li"> + How the row or rows from a <code class="ph codeph">VALUES</code> clause can be appended to a table through + <code class="ph codeph">INSERT INTO</code>, or replace the contents of the table through <code class="ph codeph">INSERT + OVERWRITE</code>. + </li> + + <li class="li"> + How the entries in a <code class="ph codeph">VALUES</code> clause can be literals, function results, or any other kind + of expression. See <a class="xref" href="impala_literals.html#literals">Literals</a> for the notation to use for literal + values, especially <a class="xref" href="impala_literals.html#string_literals">String Literals</a> for quoting and escaping + conventions for strings. See <a class="xref" href="impala_operators.html#operators">SQL Operators</a> and + <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a> for other things you can include in expressions with the + <code class="ph codeph">VALUES</code> clause. + </li> + </ul> + +<pre class="pre codeblock"><code>[localhost:21000] > describe val_example; +Query: describe val_example +Query finished, fetching results ... 
++-------+---------+---------+ +| name | type | comment | ++-------+---------+---------+ +| id | int | | +| col_1 | boolean | | +| col_2 | double | | ++-------+---------+---------+ + +[localhost:21000] > insert into val_example values (1,true,100.0); +Inserted 1 rows in 0.30s +[localhost:21000] > select * from val_example; ++----+-------+-------+ +| id | col_1 | col_2 | ++----+-------+-------+ +| 1 | true | 100 | ++----+-------+-------+ + +[localhost:21000] > insert overwrite val_example values (10,false,pow(2,5)), (50,true,10/3); +Inserted 2 rows in 0.16s +[localhost:21000] > select * from val_example; ++----+-------+-------------------+ +| id | col_1 | col_2 | ++----+-------+-------------------+ +| 10 | false | 32 | +| 50 | true | 3.333333333333333 | ++----+-------+-------------------+</code></pre> + + <p class="p"> + When used in an <code class="ph codeph">INSERT</code> statement, the Impala <code class="ph codeph">VALUES</code> clause can specify + some or all of the columns in the destination table, and the columns can be specified in a different order + than they actually appear in the table. To specify a different set or order of columns than in the table, + use the syntax: + </p> + +<pre class="pre codeblock"><code>INSERT INTO <var class="keyword varname">destination</var> + (<var class="keyword varname">col_x</var>, <var class="keyword varname">col_y</var>, <var class="keyword varname">col_z</var>) + VALUES + (<var class="keyword varname">val_x</var>, <var class="keyword varname">val_y</var>, <var class="keyword varname">val_z</var>); +</code></pre> + + <p class="p"> + Any columns in the table that are not listed in the <code class="ph codeph">INSERT</code> statement are set to + <code class="ph codeph">NULL</code>. + </p> + + + + <p class="p"> + To use a <code class="ph codeph">VALUES</code> clause like a table in other statements, wrap it in parentheses and use + <code class="ph codeph">AS</code> clauses to specify aliases for the entire object and any columns you need to refer to: + </p> + +<pre class="pre codeblock"><code>[localhost:21000] > select * from (values(4,5,6),(7,8,9)) as t; ++---+---+---+ +| 4 | 5 | 6 | ++---+---+---+ +| 4 | 5 | 6 | +| 7 | 8 | 9 | ++---+---+---+ +[localhost:21000] > select * from (values(1 as c1, true as c2, 'abc' as c3),(100,false,'xyz')) as t; ++-----+-------+-----+ +| c1 | c2 | c3 | ++-----+-------+-----+ +| 1 | true | abc | +| 100 | false | xyz | ++-----+-------+-----+</code></pre> + + <p class="p"> + For example, you might use a tiny table constructed like this from constant literals or function return + values as part of a longer statement involving joins or <code class="ph codeph">UNION ALL</code>. + </p> + + <p class="p"> + <strong class="ph b">HDFS considerations:</strong> + </p> + + <p class="p"> + Impala physically writes all inserted files under the ownership of its default user, typically + <code class="ph codeph">impala</code>. Therefore, this user must have HDFS write permission in the corresponding table + directory. + </p> + + <p class="p"> + The permission requirement is independent of the authorization performed by the Sentry framework. (If the + connected user is not authorized to insert into a table, Sentry blocks that operation immediately, + regardless of the privileges available to the <code class="ph codeph">impala</code> user.) Files created by Impala are + not owned by and do not inherit permissions from the connected user. 
+ </p>

 <p class="p">
 The number of data files produced by an <code class="ph codeph">INSERT</code> statement depends on the size of the
 cluster, the number of data blocks that are processed, the partition key columns in a partitioned table,
 and the mechanism Impala uses for dividing the work in parallel. Do not assume that an
 <code class="ph codeph">INSERT</code> statement will produce some particular number of output files. In case of
 performance issues with data written by Impala, check that the output files do not suffer from issues such
 as many tiny files or many tiny partitions. (In the Hadoop context, even files or partitions of a few tens
 of megabytes are considered <span class="q">"tiny"</span>.)
 </p>

 <p class="p">
 The <code class="ph codeph">INSERT</code> statement has always left behind a hidden work directory inside the data
 directory of the table. Formerly, this hidden work directory was named
 <span class="ph filepath">.impala_insert_staging</span>. In Impala 2.0.1 and later, this directory name is changed to
 <span class="ph filepath">_impala_insert_staging</span>. (While HDFS tools are expected to treat names beginning
 with either an underscore or a dot as hidden, in practice names beginning with an underscore are more widely
 supported.) If you have any scripts, cleanup jobs, and so on that rely on the name of this work directory,
 adjust them to use the new name.
 </p>

 <p class="p">
 <strong class="ph b">HBase considerations:</strong>
 </p>

 <p class="p">
 You can use the <code class="ph codeph">INSERT</code> statement with HBase tables as follows:
 </p>

 <ul class="ul">
 <li class="li">
 <p class="p">
 You can insert a single row or a small set of rows into an HBase table with the <code class="ph codeph">INSERT ...
 VALUES</code> syntax. This is a good use case for HBase tables with Impala, because HBase tables are
 not subject to the same kind of fragmentation from many small insert operations as HDFS tables are.
 </p>
 </li>

 <li class="li">
 <p class="p">
 You can insert any number of rows at once into an HBase table using the <code class="ph codeph">INSERT ...
 SELECT</code> syntax.
 </p>
 </li>

 <li class="li">
 <p class="p">
 If more than one inserted row has the same value for the HBase key column, only the last inserted row
 with that value is visible to Impala queries. You can take advantage of this fact with <code class="ph codeph">INSERT
 ... VALUES</code> statements to effectively update rows one at a time, by inserting new rows with the
 same key values as existing rows. Be aware that after an <code class="ph codeph">INSERT ... SELECT</code> operation
 copying from an HDFS table, the HBase table might contain fewer rows than were inserted, if the key
 column in the source table contained duplicate values.
 </p>
 </li>

 <li class="li">
 <p class="p">
 You cannot <code class="ph codeph">INSERT OVERWRITE</code> into an HBase table. New rows are always appended.
 </p>
 </li>

 <li class="li">
 <p class="p">
 When you create an Impala or Hive table that maps to an HBase table, the column order you specify with
 the <code class="ph codeph">INSERT</code> statement might be different than the order you declare with the
 <code class="ph codeph">CREATE TABLE</code> statement. Behind the scenes, HBase arranges the columns based on how
 they are divided into column families.
This might cause a mismatch during insert operations, especially + if you use the syntax <code class="ph codeph">INSERT INTO <var class="keyword varname">hbase_table</var> SELECT * FROM + <var class="keyword varname">hdfs_table</var></code>. Before inserting data, verify the column order by issuing a + <code class="ph codeph">DESCRIBE</code> statement for the table, and adjust the order of the select list in the + <code class="ph codeph">INSERT</code> statement. + </p> + </li> + </ul> + + <p class="p"> + See <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a> for more details about using Impala with HBase. + </p> + + <p class="p"> + <strong class="ph b">Amazon S3 considerations:</strong> + </p> + <p class="p"> + In <span class="keyword">Impala 2.6</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, + and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the + Amazon Simple Storage Service (S3). + The syntax of the DML statements is the same as for any other tables, because the S3 location for tables and + partitions is specified by an <code class="ph codeph">s3a://</code> prefix in the + <code class="ph codeph">LOCATION</code> attribute of + <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements. + If you bring data into S3 using the normal S3 transfer mechanisms instead of Impala DML statements, + issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the S3 data. + </p> + <p class="p"> + Because of differences between S3 and traditional filesystems, DML operations + for S3 tables can take longer than for tables on HDFS. For example, both the + <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code> + and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory + to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>, + the files are moved from a temporary staging directory to the final destination directory.) + Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala + actually copies the data files from one location to another and then removes the original files. + In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way + to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff + that a problem during statement execution could leave data in an inconsistent state. + It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements. + See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details. 
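+ </p>
+
+ <p class="p">
+ A hedged sketch of DML against an S3-backed table (the bucket name, path, and table names are
+ hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE sales_s3 (id INT, amount DOUBLE)
+  LOCATION 's3a://example-bucket/sales/';
+
+-- Optionally trade crash safety for speed on S3 INSERT statements.
+SET S3_SKIP_INSERT_STAGING=true;
+
+INSERT INTO sales_s3 SELECT id, amount FROM sales_staging;
+</code></pre>
+
+ <p class="p">
+ If other tools later add files under that S3 location, run a <code class="ph codeph">REFRESH</code> statement
+ for the table before querying it through Impala.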
+ </p>
 <p class="p">See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.</p>

 <p class="p">
 <strong class="ph b">Security considerations:</strong>
 </p>
 <p class="p">
 If these statements in your environment contain sensitive literal values such as credit card numbers or tax
 identifiers, Impala can redact this sensitive information when displaying the statements in log files and
 other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
 </p>

 <p class="p">
 <strong class="ph b">Cancellation:</strong> Can be cancelled. To cancel this statement, use Ctrl-C from the
 <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the
 <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of
 in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI
 (port 25000).
 </p>

 <p class="p">
 <strong class="ph b">HDFS permissions:</strong>
 </p>
 <p class="p">
 The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
 typically the <code class="ph codeph">impala</code> user, must have read
 permission for the files in the source directory of an <code class="ph codeph">INSERT ... SELECT</code>
 operation, and write permission for all affected directories in the destination table.
 (An <code class="ph codeph">INSERT</code> operation could write files to multiple different HDFS directories
 if the destination table is partitioned.)
 This user must also have write permission to create a temporary work directory
 in the top-level HDFS directory of the destination table.
 An <code class="ph codeph">INSERT OVERWRITE</code> operation does not require write permission on
 the original data files in the table, only on the table directories themselves.
 </p>

 <p class="p">
 <strong class="ph b">Restrictions:</strong>
 </p>

 <p class="p">
 For <code class="ph codeph">INSERT</code> operations into <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> columns, you
 must cast all <code class="ph codeph">STRING</code> literals or expressions returning <code class="ph codeph">STRING</code> to a
 <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> type with the appropriate length.
 </p>

 <p class="p">
 <strong class="ph b">Related startup options:</strong>
 </p>

 <p class="p">
 By default, if an <code class="ph codeph">INSERT</code> statement creates any new subdirectories underneath a partitioned
 table, those subdirectories are assigned default HDFS permissions for the <code class="ph codeph">impala</code> user. To
 make each subdirectory have the same permissions as its parent directory in HDFS, specify the
 <code class="ph codeph">--insert_inherit_permissions</code> startup option for the <span class="keyword cmdname">impalad</span> daemon.
 </p>
 </div>
 </article>


</article></main></body></html>
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_install.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_install.html b/docs/build/html/topics/impala_install.html
new file mode 100644
index 0000000..561d3b4
--- /dev/null
+++ b/docs/build/html/topics/impala_install.html
@@ -0,0 +1,126 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="install"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Installing Impala</title></head><body id="install"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Installing Impala</span></h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala is an open-source analytic database for Apache Hadoop
+ that returns rapid responses to queries.
+ </p>
+
+ <p class="p">
+ Follow these steps to set up Impala on a cluster by building from source:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Download the latest release. See
+ <a class="xref" href="http://impala.apache.org/downloads.html" target="_blank">the Impala downloads page</a>
+ for the link to the latest release.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Check the <span class="ph filepath">README.md</span> file for a pointer
+ to the build instructions.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Verify the MD5 and SHA1 checksums of the download, and verify the GPG signature
+ using the code signing keys of the release managers.
+ </p>
+ </li>
+ <li class="li">
+ <div class="p">
+ Developers interested in working on Impala can clone the Impala source repository:
+<pre class="pre codeblock"><code>
+git clone https://git-wip-us.apache.org/repos/asf/incubator-impala.git
+</code></pre>
+ </div>
+ </li>
+ </ul>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="install__install_details">
+
+ <h2 class="title topictitle2" id="ariaid-title2">What is Included in an Impala Installation</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala is made up of a set of components that can be installed on multiple nodes throughout your cluster.
+ The key installation step for performance is to install the <span class="keyword cmdname">impalad</span> daemon (which does
+ most of the query processing work) on <em class="ph i">all</em> DataNodes in the cluster.
+ </p>
+
+ <p class="p">
+ The Impala package installs these binaries:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <span class="keyword cmdname">impalad</span> - The Impala daemon. Plans and executes queries against HDFS, HBase, <span class="ph">and Amazon S3 data</span>.
+ <a class="xref" href="impala_processes.html#processes">Run one impalad process</a> on each node in the cluster
+ that has a DataNode.
+ </p> + </li> + + <li class="li"> + <p class="p"> + <span class="keyword cmdname">statestored</span> - Name service that tracks location and status of all + <code class="ph codeph">impalad</code> instances in the cluster. <a class="xref" href="impala_processes.html#processes">Run one + instance of this daemon</a> on a node in your cluster. Most production deployments run this daemon + on the namenode. + </p> + </li> + + <li class="li"> + <p class="p"> + <span class="keyword cmdname">catalogd</span> - Metadata coordination service that broadcasts changes from Impala DDL and + DML statements to all affected Impala nodes, so that new tables, newly loaded data, and so on are + immediately visible to queries submitted through any Impala node. + + (Prior to Impala 1.2, you had to run the <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE + METADATA</code> statement on each node to synchronize changed metadata. Now those statements are only + required if you perform the DDL or DML through an external mechanism such as Hive <span class="ph">or by uploading + data to the Amazon S3 filesystem</span>.) + <a class="xref" href="impala_processes.html#processes">Run one instance of this daemon</a> on a node in your cluster, + preferably on the same host as the <code class="ph codeph">statestored</code> daemon. + </p> + </li> + + <li class="li"> + <p class="p"> + <span class="keyword cmdname">impala-shell</span> - <a class="xref" href="impala_impala_shell.html#impala_shell">Command-line + interface</a> for issuing queries to the Impala daemon. You install this on one or more hosts + anywhere on your network, not necessarily DataNodes or even within the same cluster as Impala. It can + connect remotely to any instance of the Impala daemon. + </p> + </li> + </ul> + + <p class="p"> + Before doing the installation, ensure that you have all necessary prerequisites. See + <a class="xref" href="impala_prereqs.html#prereqs">Impala Requirements</a> for details. 
+ </p>
+ </div>
+ </article>
+
+</article></main></body></html>
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_int.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_int.html b/docs/build/html/topics/impala_int.html
new file mode 100644
index 0000000..2fcd403
--- /dev/null
+++ b/docs/build/html/topics/impala_int.html
@@ -0,0 +1,119 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="int"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>INT Data Type</title></head><body id="int"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">INT Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A 4-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> INT</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> -2147483648 .. 2147483647. There is no <code class="ph codeph">UNSIGNED</code> subtype.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong> Impala automatically converts to a larger integer type (<code class="ph codeph">BIGINT</code>) or a
+ floating-point type (<code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>). Use
+ <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>,
+ <code class="ph codeph">STRING</code>, or <code class="ph codeph">TIMESTAMP</code>.
+ <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to
+ <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+ date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+ If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+ the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The data type <code class="ph codeph">INTEGER</code> is an alias for <code class="ph codeph">INT</code>.
+ </p>
+
+ <p class="p">
+ For a convenient and automated way to check the bounds of the <code class="ph codeph">INT</code> type, call the functions
+ <code class="ph codeph">MIN_INT()</code> and <code class="ph codeph">MAX_INT()</code>.
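+ </p>
+
+ <p class="p">
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT MIN_INT(), MAX_INT();  -- Returns -2147483648 and 2147483647.
+</code></pre>
+
+ <p class="p">
+ Comparing a computed value against these bounds is a quick way to check whether it fits in an
+ <code class="ph codeph">INT</code> column.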
+ </p>
+
+ <p class="p">
+ If an integer value is too large to be represented as an <code class="ph codeph">INT</code>, use a <code class="ph codeph">BIGINT</code>
+ instead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code>
+ value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x INT);
+SELECT CAST(1000 AS INT);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> Prefer to use this type for a partition key column. Impala can process the numeric
+ type more efficiently than a <code class="ph codeph">STRING</code> representation of the value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as a 4-byte value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Available in all versions of Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+ <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a>,
+ <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_intro.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_intro.html b/docs/build/html/topics/impala_intro.html
new file mode 100644
index 0000000..cdf05b7
--- /dev/null
+++ b/docs/build/html/topics/impala_intro.html
@@ -0,0 +1,198 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro"><link
rel="stylesheet" type="text/css" href="../commonltr.css"><title>Introducing Apache Impala (incubating)</title></head><body id="intro"><main role="main"><article role="article" aria-labelledby="intro__impala"> + + <h1 class="title topictitle1" id="intro__impala"><span class="ph">Introducing Apache Impala (incubating)</span></h1> + + + <div class="body conbody" id="intro__intro_body"> + + <p class="p"> + Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS, + HBase, <span class="ph">or the Amazon Simple Storage Service (S3)</span>. + In addition to using the same unified storage platform, + Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface + (Impala query UI in Hue) as Apache Hive. This + provides a familiar and unified platform for real-time or batch-oriented queries. + </p> + + <p class="p"> + Impala is an addition to tools available for querying big data. Impala does not replace the batch + processing frameworks built on MapReduce such as Hive. Hive and other frameworks built on MapReduce are + best suited for long running batch jobs, such as those involving batch processing of Extract, Transform, + and Load (ETL) type jobs. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + Impala was accepted into the Apache incubator on December 2, 2015. + In places where the documentation formerly referred to <span class="q">"Cloudera Impala"</span>, + now the official name is <span class="q">"Apache Impala (incubating)"</span>. + </div> + + </div> + + <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro__benefits"> + + <h2 class="title topictitle2" id="ariaid-title2">Impala Benefits</h2> + + <div class="body conbody"> + + <div class="p"> + Impala provides: + + <ul class="ul"> + <li class="li"> + Familiar SQL interface that data scientists and analysts already know. + </li> + + <li class="li"> + Ability to query high volumes of data (<span class="q">"big data"</span>) in Apache Hadoop. + </li> + + <li class="li"> + Distributed queries in a cluster environment, for convenient scaling and to make use of cost-effective + commodity hardware. + </li> + + <li class="li"> + Ability to share data files between different components with no copy or export/import step; for example, + to write with Pig, transform with Hive and query with Impala. Impala can read from and write to Hive + tables, enabling simple data interchange using Impala for analytics on Hive-produced data. + </li> + + <li class="li"> + Single system for big data processing and analytics, so customers can avoid costly modeling and ETL just + for analytics. + </li> + </ul> + </div> + + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro__impala_hadoop"> + + <h2 class="title topictitle2" id="ariaid-title3">How Impala Works with <span class="keyword">Apache Hadoop</span></h2> + + + <div class="body conbody"> + + + + <div class="p"> + The Impala solution is composed of the following components: + <ul class="ul"> + <li class="li"> + Clients - Entities including Hue, ODBC clients, JDBC clients, and the Impala Shell can all interact + with Impala. These interfaces are typically used to issue queries or complete administrative tasks such + as connecting to Impala. + </li> + + <li class="li"> + Hive Metastore - Stores information about the data available to Impala. 
For example, the metastore lets
+ Impala know what databases are available and what the structure of those databases is. As you create,
+ drop, and alter schema objects, load data into tables, and so on through Impala SQL statements, the
+ relevant metadata changes are automatically broadcast to all Impala nodes by the dedicated catalog
+ service introduced in Impala 1.2.
+ </li>
+
+ <li class="li">
+ Impala - This process, which runs on DataNodes, coordinates and executes queries. Each
+ instance of Impala can receive, plan, and coordinate queries from Impala clients. Queries are
+ distributed among Impala nodes, and these nodes then act as workers, executing parallel query
+ fragments.
+ </li>
+
+ <li class="li">
+ HBase and HDFS - Storage for data to be queried.
+ </li>
+ </ul>
+ </div>
+
+ <div class="p">
+ Queries executed using Impala are handled as follows:
+ <ol class="ol">
+ <li class="li">
+ User applications send SQL queries to Impala through ODBC or JDBC, which provide standardized querying
+ interfaces. The user application may connect to any <code class="ph codeph">impalad</code> in the cluster. This
+ <code class="ph codeph">impalad</code> becomes the coordinator for the query.
+ </li>
+
+ <li class="li">
+ Impala parses the query and analyzes it to determine what tasks need to be performed by
+ <code class="ph codeph">impalad</code> instances across the cluster. Execution is planned for optimal efficiency.
+ </li>
+
+ <li class="li">
+ Services such as HDFS and HBase are accessed by local <code class="ph codeph">impalad</code> instances to provide
+ data.
+ </li>
+
+ <li class="li">
+ Each <code class="ph codeph">impalad</code> returns data to the coordinating <code class="ph codeph">impalad</code>, which sends
+ these results to the client.
+ </li>
+ </ol>
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="intro__features">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Primary Impala Features</h2>
+
+ <div class="body conbody">
+
+ <div class="p">
+ Impala provides support for:
+ <ul class="ul">
+ <li class="li">
+ Most common SQL-92 features of Hive Query Language (HiveQL) including
+ <a class="xref" href="../shared/../topics/impala_select.html#select">SELECT</a>,
+ <a class="xref" href="../shared/../topics/impala_joins.html#joins">joins</a>, and
+ <a class="xref" href="../shared/../topics/impala_aggregate_functions.html#aggregate_functions">aggregate functions</a>.
+ </li>
+
+ <li class="li">
+ HDFS, HBase, <span class="ph">and Amazon Simple Storage Service (S3)</span> storage, including:
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_file_formats.html#file_formats">HDFS file formats</a>: delimited text files, Parquet,
+ Avro, SequenceFile, and RCFile.
+ </li>
+
+ <li class="li">
+ Compression codecs: Snappy, GZIP, Deflate, BZIP.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ Common data access interfaces including:
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_jdbc.html#impala_jdbc">JDBC driver</a>.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_odbc.html#impala_odbc">ODBC driver</a>.
+ </li>
+
+ <li class="li">
+ Hue Beeswax and the Impala Query UI.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_impala_shell.html#impala_shell">impala-shell command-line interface</a>.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_security.html#security">Kerberos authentication</a>.
+ </li>
+ </ul>
+ </div>
+ </div>
+ </article>
+</article></main></body></html>
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_invalidate_metadata.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_invalidate_metadata.html b/docs/build/html/topics/impala_invalidate_metadata.html
new file mode 100644
index 0000000..c06b9c9
--- /dev/null
+++ b/docs/build/html/topics/impala_invalidate_metadata.html
@@ -0,0 +1,294 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="invalidate_metadata"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>INVALIDATE METADATA Statement</title></head><body id="invalidate_metadata"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">INVALIDATE METADATA Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Marks the metadata for one or all tables as stale. This statement is required after a table is created
+ through the Hive shell, before the table is available for Impala queries. The next time the current Impala
+ node performs a query against a table whose metadata is invalidated, Impala reloads the associated metadata
+ before the query proceeds. This is a relatively expensive operation compared to the incremental metadata
+ update done by the <code class="ph codeph">REFRESH</code> statement, so in the common scenario of adding new data
+ files to an existing table, prefer <code class="ph codeph">REFRESH</code> rather than <code class="ph codeph">INVALIDATE METADATA</code>.
+ If you are not familiar with the way Impala uses metadata and how it shares the same metastore database as Hive, see
+ <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a> for background information.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>INVALIDATE METADATA [[<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>]</code></pre>
+
+ <p class="p">
+ By default, the cached metadata for all tables is flushed. If you specify a table name, only the metadata for
+ that one table is flushed. Even for a single table, <code class="ph codeph">INVALIDATE METADATA</code> is more expensive
+ than <code class="ph codeph">REFRESH</code>, so prefer <code class="ph codeph">REFRESH</code> in the common case where you add new data
+ files for an existing table.
+ </p>
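+
+ <p class="p">
+ As a minimal sketch of the two forms (the database and table names here are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>-- Flush cached metadata for every table:
+INVALIDATE METADATA;
+
+-- Flush cached metadata for one table only:
+INVALIDATE METADATA sales_db.daily_totals;
+</code></pre>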
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong>
+ </p>
+
+ <p class="p">
+ To accurately respond to queries, Impala must have current metadata about those databases and tables that
+ clients query directly. Therefore, if some other entity modifies information used by Impala in the metastore
+ that Impala and Hive share, the information cached by Impala must be updated. However, this does not mean
+ that all metadata updates require an Impala update.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In Impala 1.2.4 and higher, you can specify a table name with <code class="ph codeph">INVALIDATE METADATA</code> after
+ the table is created in Hive, allowing you to make individual tables visible to Impala without doing a full
+ reload of the catalog metadata. Impala 1.2.4 also includes other changes to make the metadata broadcast
+ mechanism faster and more responsive, especially during Impala startup. See
+ <a class="xref" href="../shared/../topics/impala_new_features.html#new_features_124">New Features in Impala 1.2.4</a> for details.
+ </p>
+ <p class="p">
+ In Impala 1.2 and higher, a dedicated daemon (<span class="keyword cmdname">catalogd</span>) broadcasts DDL changes made
+ through Impala to all Impala nodes. Formerly, after you created a database or table while connected to one
+ Impala node, you needed to issue an <code class="ph codeph">INVALIDATE METADATA</code> statement on another Impala node
+ before accessing the new database or table from the other node. Now, newly created or altered objects are
+ picked up automatically by all Impala nodes. You must still use the <code class="ph codeph">INVALIDATE METADATA</code>
+ technique after creating or altering objects through Hive. See
+ <a class="xref" href="impala_components.html#intro_catalogd">The Impala Catalog Service</a> for more information on the catalog service.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">INVALIDATE METADATA</code> statement is new in Impala 1.1 and higher, and takes over some of
+ the use cases of the Impala 1.0 <code class="ph codeph">REFRESH</code> statement. Because <code class="ph codeph">REFRESH</code> now
+ requires a table name parameter, to flush the metadata for all tables at once, use the <code class="ph codeph">INVALIDATE
+ METADATA</code> statement.
+ </p>
+ <p class="p">
+ Because <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> only works for tables that the current
+ Impala node is already aware of, when you create a new table in the Hive shell, you must enter
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">new_table</var></code> before you can see the new table in
+ <span class="keyword cmdname">impala-shell</span>. Once the table is known to Impala, you can issue <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> after you add data files for that table, as the sketch after this note illustrates.
+ </p>
+ </div>
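+
+ <p class="p">
+ The following sketch shows that sequence, assuming a hypothetical table named
+ <code class="ph codeph">new_table</code> that was just created in the Hive shell:
+ </p>
+
+<pre class="pre codeblock"><code>-- Make the Hive-created table visible to Impala:
+INVALIDATE METADATA new_table;
+
+-- Later, after adding data files for the now-known table through Hive:
+REFRESH new_table;
+</code></pre>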
+
+ <p class="p">
+ <code class="ph codeph">INVALIDATE METADATA</code> and <code class="ph codeph">REFRESH</code> are counterparts: <code class="ph codeph">INVALIDATE
+ METADATA</code> waits to reload the metadata when needed for a subsequent query, but reloads all the
+ metadata for the table, which can be an expensive operation, especially for large tables with many
+ partitions. <code class="ph codeph">REFRESH</code> reloads the metadata immediately, but only loads the block location
+ data for newly added data files, making it a less expensive operation overall. If data was altered in some
+ more extensive way, such as being reorganized by the HDFS balancer, use <code class="ph codeph">INVALIDATE
+ METADATA</code> to avoid a performance penalty from reduced local reads. If you used Impala version 1.0,
+ the <code class="ph codeph">INVALIDATE METADATA</code> statement works just like the Impala 1.0 <code class="ph codeph">REFRESH</code>
+ statement did. The Impala 1.1 <code class="ph codeph">REFRESH</code> is optimized for the common use case of adding
+ new data files to an existing table; for that reason, the table name argument is now required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ A metadata update for an <code class="ph codeph">impalad</code> instance <strong class="ph b">is</strong> required if:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ A metadata change occurs.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">and</strong> the change is made from another <code class="ph codeph">impalad</code> instance in your cluster, or through
+ Hive.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">and</strong> the change is made to a metastore database to which clients such as the Impala shell or ODBC directly
+ connect.
+ </li>
+ </ul>
+
+ <p class="p">
+ A metadata update for an Impala node is <strong class="ph b">not</strong> required when you issue queries from the same Impala node
+ where you ran <code class="ph codeph">ALTER TABLE</code>, <code class="ph codeph">INSERT</code>, or another table-modifying statement.
+ </p>
+
+ <p class="p">
+ Database and table metadata is typically modified by:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Hive - via <code class="ph codeph">ALTER</code>, <code class="ph codeph">CREATE</code>, <code class="ph codeph">DROP</code>, or
+ <code class="ph codeph">INSERT</code> operations.
+ </li>
+
+ <li class="li">
+ Impalad - via <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">ALTER TABLE</code>, and <code class="ph codeph">INSERT</code>
+ operations.
+ </li>
+ </ul>
+
+ <p class="p">
+ <code class="ph codeph">INVALIDATE METADATA</code> causes the metadata for that table to be marked as stale, and reloaded
+ the next time the table is referenced. For a huge table, that process could take a noticeable amount of time;
+ thus you might prefer to use <code class="ph codeph">REFRESH</code> where practical, to avoid an unpredictable delay later,
+ for example if the next reference to the table is during a benchmark test.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how you might use the <code class="ph codeph">INVALIDATE METADATA</code> statement after
+ creating new tables (such as SequenceFile or HBase tables) through the Hive shell. Before the
+ <code class="ph codeph">INVALIDATE METADATA</code> statement was issued, Impala would give a <span class="q">"table not found"</span> error
+ if you tried to refer to those table names. The <code class="ph codeph">DESCRIBE</code> statements cause the latest
+ metadata to be immediately loaded for the tables, avoiding a delay the next time those tables are queried.
+ </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] > invalidate metadata;
+[impalad-host:21000] > describe t1;
+...
+[impalad-host:21000] > describe t2;
+...
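+-- The DESCRIBE statements above trigger the metadata reload immediately,
+-- so later queries against t1 and t2 do not pay the reload delay.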
+</code></pre>
+
+ <p class="p">
+ For more examples of using <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> with a
+ combination of Impala and Hive operations, see <a class="xref" href="impala_tutorial.html#tutorial_impala_hive">Switching Back and Forth Between Impala and Hive</a>.
+ </p>
+
+ <p class="p">
+ If you need to ensure that the metadata is up-to-date when you start an <span class="keyword cmdname">impala-shell</span>
+ session, run <span class="keyword cmdname">impala-shell</span> with the <code class="ph codeph">-r</code> or
+ <code class="ph codeph">--refresh_after_connect</code> command-line option. Because this operation adds a delay to the next
+ query against each table, which can be expensive for large tables with many partitions, try to avoid using
+ this option for day-to-day operations in a production environment.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have execute
+ permissions for all the relevant directories holding table data.
+ (A table could have data spread across multiple directories,
+ or in unexpected paths, if it uses partitioning or
+ specifies a <code class="ph codeph">LOCATION</code> attribute for
+ individual partitions or the entire table.)
+ Issues with permissions might not cause an immediate error for this statement,
+ but subsequent statements such as <code class="ph codeph">SELECT</code>
+ or <code class="ph codeph">SHOW TABLE STATS</code> could fail.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS considerations:</strong>
+ </p>
+
+ <p class="p">
+ By default, the <code class="ph codeph">INVALIDATE METADATA</code> command checks HDFS permissions of the underlying data
+ files and directories, caching this information so that a statement can be cancelled immediately if, for
+ example, the <code class="ph codeph">impala</code> user does not have permission to write to the data directory for the
+ table. (This checking does not apply if you have set the <span class="keyword cmdname">catalogd</span> configuration option
+ <code class="ph codeph">--load_catalog_in_background=false</code>.) Impala reports any lack of write permissions as an
+ <code class="ph codeph">INFO</code> message in the log file, in case that represents an oversight. If you change HDFS
+ permissions to make data readable or writeable by the Impala user, issue another <code class="ph codeph">INVALIDATE
+ METADATA</code> to make Impala aware of the change.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ This example illustrates creating a new database and new table in Hive, then doing an <code class="ph codeph">INVALIDATE
+ METADATA</code> statement in Impala using the fully qualified table name, after which both the new table
+ and the new database are visible to Impala. The ability to specify <code class="ph codeph">INVALIDATE METADATA
+ <var class="keyword varname">table_name</var></code> for a table created in Hive is a new capability in Impala 1.2.4. In
+ earlier releases, that statement would have returned an error indicating an unknown table, requiring you to
+ do <code class="ph codeph">INVALIDATE METADATA</code> with no table name, a more expensive operation that reloaded metadata
+ for all tables and databases.
+ </p>
+
+<pre class="pre codeblock"><code>$ hive
+hive> create database new_db_from_hive;
+OK
+Time taken: 4.118 seconds
+hive> create table new_db_from_hive.new_table_from_hive (x int);
+OK
+Time taken: 0.618 seconds
+hive> quit;
+$ impala-shell
+[localhost:21000] > show databases like 'new*';
+[localhost:21000] > refresh new_db_from_hive.new_table_from_hive;
+ERROR: AnalysisException: Database does not exist: new_db_from_hive
+[localhost:21000] > invalidate metadata new_db_from_hive.new_table_from_hive;
+[localhost:21000] > show databases like 'new*';
++------------------+
+| name             |
++------------------+
+| new_db_from_hive |
++------------------+
+[localhost:21000] > show tables in new_db_from_hive;
++---------------------+
+| name                |
++---------------------+
+| new_table_from_hive |
++---------------------+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements also cache metadata
+ for tables where the data resides in the Amazon Simple Storage Service (S3).
+ In particular, issue a <code class="ph codeph">REFRESH</code> for a table after adding or removing files
+ in the associated S3 data directory.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about working with S3 tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Much of the metadata for Kudu tables is handled by the underlying
+ storage layer. Kudu tables have less reliance on the metastore
+ database, and require less metadata caching on the Impala side.
+ For example, information about partitions in Kudu tables is managed
+ by Kudu, and Impala does not cache any block locality metadata
+ for Kudu tables.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code>
+ statements are needed less frequently for Kudu tables than for
+ HDFS-backed tables. Neither statement is needed when data is
+ added to, removed from, or updated in a Kudu table, even if the changes
+ are made directly to Kudu through a client program using the Kudu API.
+ Run <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+ for a Kudu table only after making a change to the Kudu table schema,
+ such as adding or dropping a column, by a mechanism other than
+ Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a>,
+ <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file