http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_insert.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_insert.xml b/docs/topics/impala_insert.xml new file mode 100644 index 0000000..02ad8c6 --- /dev/null +++ b/docs/topics/impala_insert.xml @@ -0,0 +1,689 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="insert"> + + <title>INSERT Statement</title> + <titlealts audience="PDF"><navtitle>INSERT</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="ETL"/> + <data name="Category" value="Ingest"/> + <data name="Category" value="DML"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Tables"/> + <data name="Category" value="S3"/> + <!-- <data name="Category" value="Kudu"/> --> + <!-- This is such an important statement, think if there are more applicable categories. --> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">INSERT statement</indexterm> + Impala supports inserting into tables and partitions that you create with the Impala <codeph>CREATE + TABLE</codeph> statement, or pre-defined tables and partitions created through Hive. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>[<varname>with_clause</varname>] +INSERT { INTO | OVERWRITE } [TABLE] <varname>table_name</varname> + [(<varname>column_list</varname>)] + [ PARTITION (<varname>partition_clause</varname>)] +{ + [<varname>hint_clause</varname>] <varname>select_statement</varname> + | VALUES (<varname>value</varname> [, <varname>value</varname> ...]) [, (<varname>value</varname> [, <varname>value</varname> ...]) ...] +} + +partition_clause ::= <varname>col_name</varname> [= <varname>constant</varname>] [, <varname>col_name</varname> [= <varname>constant</varname>] ...] + +hint_clause ::= [SHUFFLE] | [NOSHUFFLE] (Note: the square brackets are part of the syntax.) +</codeblock> + + <p> + <b>Appending or replacing (INTO and OVERWRITE clauses):</b> + </p> + + <p> + The <codeph>INSERT INTO</codeph> syntax appends data to a table. The existing data files are left as-is, and + the inserted data is put into one or more new data files. + </p> + + <p> + The <codeph>INSERT OVERWRITE</codeph> syntax replaces the data in a table. +<!-- What happens with INSERT OVERWRITE if the target is a single partition or multiple partitions? --> +<!-- If that gets too detailed, cover later under "Partitioning Considerations". --> + Currently, the overwritten data files are deleted immediately; they do not go through the HDFS trash + mechanism. + </p> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p rev="2.3.0"> + The <codeph>INSERT</codeph> statement currently does not support writing data files + containing complex types (<codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>). + To prepare Parquet data for such tables, you generate the data files outside Impala and then + use <codeph>LOAD DATA</codeph> or <codeph>CREATE EXTERNAL TABLE</codeph> to associate those + data files with the table. Currently, such tables must use the Parquet file format. + See <xref href="impala_complex_types.xml#complex_types"/> for details about working with complex types. 
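+      For example, a possible workflow (the table, column, and path names here are purely
+      illustrative) is to create the table in Impala and then attach Parquet files prepared by an
+      external tool:
+<codeblock>-- Hypothetical table and HDFS paths, shown only to illustrate the workflow.
+CREATE TABLE contacts (id BIGINT, phones ARRAY&lt;STRING&gt;) STORED AS PARQUET;
+-- Move already-prepared Parquet files under the table directory...
+LOAD DATA INPATH '/user/etl/contacts_staging' INTO TABLE contacts;
+-- ...or point an external table at a directory that already holds the prepared files.
+CREATE EXTERNAL TABLE contacts_ext LIKE contacts LOCATION '/user/etl/contacts_parquet';</codeblock>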
+ </p> + + <p rev="kudu"> + <b>Ignoring duplicate partition keys for Kudu tables (IGNORE clause)</b> + </p> + + <p rev="kudu"> + Normally, an <codeph>INSERT</codeph> operation into a Kudu table fails if + it would result in duplicate partition key columns for any rows. + Specify <codeph>INSERT IGNORE <varname>rest_of_statement</varname></codeph> to + make the <codeph>INSERT</codeph> continue in this case. The rows that would + have duplicate partition key columns are not inserted. + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + Impala currently supports: + </p> + + <ul> + <li> + Copy data from another table using <codeph>SELECT</codeph> query. In Impala 1.2.1 and higher, you can + combine <codeph>CREATE TABLE</codeph> and <codeph>INSERT</codeph> operations into a single step with the + <codeph>CREATE TABLE AS SELECT</codeph> syntax, which bypasses the actual <codeph>INSERT</codeph> keyword. + </li> + + <li> + An optional <xref href="impala_with.xml#with"><codeph>WITH</codeph> clause</xref> before the + <codeph>INSERT</codeph> keyword, to define a subquery referenced in the <codeph>SELECT</codeph> portion. + </li> + + <li> + Create one or more new rows using constant expressions through <codeph>VALUES</codeph> clause. (The + <codeph>VALUES</codeph> clause was added in Impala 1.0.1.) + </li> + + <li rev="1.1"> + <p> + By default, the first column of each newly inserted row goes into the first column of the table, the + second column into the second column, and so on. + </p> + <p> + You can also specify the columns to be inserted, an arbitrarily ordered subset of the columns in the + destination table, by specifying a column list immediately after the name of the destination table. This + feature lets you adjust the inserted columns to match the layout of a <codeph>SELECT</codeph> statement, + rather than the other way around. (This feature was added in Impala 1.1.) + </p> + <p> + The number of columns mentioned in the column list (known as the <q>column permutation</q>) must match + the number of columns in the <codeph>SELECT</codeph> list or the <codeph>VALUES</codeph> tuples. The + order of columns in the column permutation can be different than in the underlying table, and the columns + of each input row are reordered to match. If the number of columns in the column permutation is less than + in the destination table, all unmentioned columns are set to <codeph>NULL</codeph>. + </p> + </li> + + <li> + <p> + For a partitioned table, the optional <codeph>PARTITION</codeph> clause identifies which partition or + partitions the new values go into. If a partition key column is given a constant value such as + <codeph>PARTITION (year=2012)</codeph> or <codeph>PARTITION (year=2012, month=2)</codeph>, all the + inserted rows use those same values for those partition key columns and you omit any corresponding + columns in the source table from the <codeph>SELECT</codeph> list. This form is known as <q>static + partitioning</q>. + </p> + <p> + If a partition key column is mentioned but not assigned a value, such as in <codeph>PARTITION (year, + region)</codeph> (both columns unassigned) or <codeph>PARTITION(year, region='CA')</codeph> + (<codeph>year</codeph> column unassigned), the unassigned columns are filled in with the final columns of + the <codeph>SELECT</codeph> list. 
In this case, the number of columns in the <codeph>SELECT</codeph> list + must equal the number of columns in the column permutation plus the number of partition key columns not + assigned a constant value. This form is known as <q>dynamic partitioning</q>. + </p> + <p> + See <xref href="impala_partitioning.xml#partition_static_dynamic"/> for examples and performance + characteristics of static and dynamic partitioned inserts. + </p> + </li> + + <li rev="1.2.2"> + An optional hint clause immediately before the <codeph>SELECT</codeph> keyword, to fine-tune the behavior + when doing an <codeph>INSERT ... SELECT</codeph> operation into partitioned Parquet tables. The hint + keywords are <codeph>[SHUFFLE]</codeph> and <codeph>[NOSHUFFLE]</codeph>, including the square brackets. + Inserting into partitioned Parquet tables can be a resource-intensive operation because it potentially + involves many files being written to HDFS simultaneously, and separate + <ph rev="parquet_block_size">large</ph> memory buffers being allocated to buffer the data for each + partition. For usage details, see <xref href="impala_parquet.xml#parquet_etl"/>. + </li> + </ul> + + <note> + <ul> + <li> + Insert commands that partition or add files result in changes to Hive metadata. Because Impala uses Hive + metadata, such changes may necessitate a metadata refresh. For more information, see the + <xref href="impala_refresh.xml#refresh">REFRESH</xref> function. + </li> + + <li> + Currently, Impala can only insert data into tables that use the text and Parquet formats. For other file + formats, insert the data using Hive and use Impala to query it. + </li> + + <li> + As an alternative to the <codeph>INSERT</codeph> statement, if you have existing data files elsewhere in + HDFS, the <codeph>LOAD DATA</codeph> statement can move those files into a table. This statement works + with tables of any file format. + </li> + </ul> + </note> + + <p conref="../shared/impala_common.xml#common/dml_blurb"/> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + When you insert the results of an expression, particularly of a built-in function call, into a small numeric + column such as <codeph>INT</codeph>, <codeph>SMALLINT</codeph>, <codeph>TINYINT</codeph>, or + <codeph>FLOAT</codeph>, you might need to use a <codeph>CAST()</codeph> expression to coerce values into the + appropriate type. Impala does not automatically convert from a larger type to a smaller one. For example, to + insert cosine values into a <codeph>FLOAT</codeph> column, write <codeph>CAST(COS(angle) AS FLOAT)</codeph> + in the <codeph>INSERT</codeph> statement to make the conversion explicit. + </p> + + <p conref="../shared/impala_common.xml#common/file_format_blurb"/> + + <p rev="DOCS-1523"> + Because Impala can read certain file formats that it cannot write, + the <codeph>INSERT</codeph> statement does not work for all kinds of + Impala tables. See <xref href="impala_file_formats.xml#file_formats"/> + for details about what file formats are supported by the + <codeph>INSERT</codeph> statement. 
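+      As a small illustration of the explicit casts described earlier under the usage notes
+      (the table name and values are hypothetical):
+<codeblock>-- Hypothetical table; COS() returns DOUBLE, and Impala does not automatically narrow it to FLOAT.
+CREATE TABLE angles (angle INT, cosine FLOAT);
+INSERT INTO angles VALUES (45, CAST(COS(45) AS FLOAT));</codeblock>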
+ </p> + + <p conref="../shared/impala_common.xml#common/insert_parquet_blocksize"/> + + <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/> + + <note conref="../shared/impala_common.xml#common/compute_stats_next"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following example sets up new tables with the same definition as the <codeph>TAB1</codeph> table from the + <xref href="impala_tutorial.xml#tutorial">Tutorial</xref> section, using different file + formats, and demonstrates inserting data into the tables created with the <codeph>STORED AS TEXTFILE</codeph> + and <codeph>STORED AS PARQUET</codeph> clauses: + </p> + +<codeblock>CREATE DATABASE IF NOT EXISTS file_formats; +USE file_formats; + +DROP TABLE IF EXISTS text_table; +CREATE TABLE text_table +( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) +STORED AS TEXTFILE; + +DROP TABLE IF EXISTS parquet_table; +CREATE TABLE parquet_table +( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) +STORED AS PARQUET;</codeblock> + + <p> + With the <codeph>INSERT INTO TABLE</codeph> syntax, each new set of inserted rows is appended to any existing + data in the table. This is how you would record small amounts of data that arrive continuously, or ingest new + batches of data alongside the existing data. For example, after running 2 <codeph>INSERT INTO TABLE</codeph> + statements with 5 rows each, the table contains 10 rows total: + </p> + +<codeblock>[localhost:21000] > insert into table text_table select * from default.tab1; +Inserted 5 rows in 0.41s + +[localhost:21000] > insert into table text_table select * from default.tab1; +Inserted 5 rows in 0.46s + +[localhost:21000] > select count(*) from text_table; ++----------+ +| count(*) | ++----------+ +| 10 | ++----------+ +Returned 1 row(s) in 0.26s</codeblock> + + <p> + With the <codeph>INSERT OVERWRITE TABLE</codeph> syntax, each new set of inserted rows replaces any existing + data in the table. This is how you load data to query in a data warehousing scenario where you analyze just + the data for a particular day, quarter, and so on, discarding the previous data each time. You might keep the + entire set of data in one raw table, and transfer and transform certain rows into a more compact and + efficient form to perform intensive analysis on that subset. + </p> + + <p> + For example, here we insert 5 rows into a table using the <codeph>INSERT INTO</codeph> clause, then replace + the data by inserting 3 rows with the <codeph>INSERT OVERWRITE</codeph> clause. Afterward, the table only + contains the 3 rows from the final <codeph>INSERT</codeph> statement. + </p> + +<codeblock>[localhost:21000] > insert into table parquet_table select * from default.tab1; +Inserted 5 rows in 0.35s + +[localhost:21000] > insert overwrite table parquet_table select * from default.tab1 limit 3; +Inserted 3 rows in 0.43s +[localhost:21000] > select count(*) from parquet_table; ++----------+ +| count(*) | ++----------+ +| 3 | ++----------+ +Returned 1 row(s) in 0.43s</codeblock> + + <p> + The <codeph><xref href="impala_insert.xml#values">VALUES</xref></codeph> clause lets you insert one or more + rows by specifying constant values for all the columns. The number, types, and order of the expressions must + match the table definition. + </p> + + <note id="insert_values_warning"> + The <codeph>INSERT ... 
VALUES</codeph> technique is not suitable for loading large quantities of data into + HDFS-based tables, because the insert operations cannot be parallelized, and each one produces a separate + data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL + syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do not + run scripts with thousands of <codeph>INSERT ... VALUES</codeph> statements that insert a single row each + time. If you do run <codeph>INSERT ... VALUES</codeph> operations to load data into a staging table as one + stage in an ETL pipeline, include multiple row values if possible within each <codeph>VALUES</codeph> clause, + and use a separate database to make cleanup easier if the operation does produce many tiny files. + </note> + + <p> + The following example shows how to insert one row or multiple rows, with expressions of different types, + using literal values, expressions, and function return values: + </p> + +<codeblock>create table val_test_1 (c1 int, c2 float, c3 string, c4 boolean, c5 timestamp); +insert into val_test_1 values (100, 99.9/10, 'abc', true, now()); +create table val_test_2 (id int, token string); +insert overwrite val_test_2 values (1, 'a'), (2, 'b'), (-1,'xyzzy');</codeblock> + + <p> + These examples show the type of <q>not implemented</q> error that you see when attempting to insert data into + a table with a file format that Impala currently does not write to: + </p> + +<codeblock>DROP TABLE IF EXISTS sequence_table; +CREATE TABLE sequence_table +( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) +STORED AS SEQUENCEFILE; + +DROP TABLE IF EXISTS rc_table; +CREATE TABLE rc_table +( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) +STORED AS RCFILE; + +[localhost:21000] > insert into table rc_table select * from default.tab1; +Remote error +Backend 0:RC_FILE not implemented. + +[localhost:21000] > insert into table sequence_table select * from default.tab1; +Remote error +Backend 0:SEQUENCE_FILE not implemented. </codeblock> + + <p> + Inserting data into partitioned tables requires slightly different syntax that divides the partitioning + columns from the others: + </p> + +<codeblock>create table t1 (i int) <b>partitioned by (x int, y string)</b>; +-- Select an INT column from another table. +-- All inserted rows will have the same x and y values, as specified in the INSERT statement. +-- This technique of specifying all the partition key values is known as static partitioning. +insert into t1 <b>partition(x=10, y='a')</b> select c1 from some_other_table; +-- Select two INT columns from another table. +-- All inserted rows will have the same y value, as specified in the INSERT statement. +-- Values from c2 go into t1.x. +-- Any partitioning columns whose value is not specified are filled in +-- from the columns specified last in the SELECT list. +-- This technique of omitting some partition key values is known as dynamic partitioning. +insert into t1 <b>partition(x, y='b')</b> select c1, c2 from some_other_table; +-- Select an INT and a STRING column from another table. +-- All inserted rows will have the same x value, as specified in the INSERT statement. +-- Values from c3 go into t1.y. 
+insert into t1 <b>partition(x=20, y)</b> select c1, c3 from some_other_table;</codeblock>
+
+ <p rev="1.1">
+ The following examples show how you can copy the data in all the columns from one table to another, copy the
+ data from only some columns, or specify the columns in the select list in a different order than they
+ actually appear in the table:
+ </p>
+
+<codeblock>-- Start with 2 identical tables.
+create table t1 (c1 int, c2 int);
+create table t2 like t1;
+
+-- If there is no () part after the destination table name,
+-- all columns must be specified, either as * or by name.
+insert into t2 select * from t1;
+insert into t2 select c1, c2 from t1;
+
+-- With the () notation following the destination table name,
+-- you can omit columns (all values for that column are NULL
+-- in the destination table), and/or reorder the values
+-- selected from the source table. This is the "column permutation" feature.
+insert into t2 (c1) select c1 from t1;
+insert into t2 (c2, c1) select c1, c2 from t1;
+
+-- The column names can be entirely different in the source and destination tables.
+-- You can copy any columns, not just the corresponding ones, from the source table.
+-- But the number and type of selected columns must match the columns mentioned in the () part.
+alter table t2 replace columns (x int, y int);
+insert into t2 (y) select c1 from t1;
+
+-- For partitioned tables, all the partitioning columns must be mentioned in the () column list
+-- or a PARTITION clause; these columns cannot be defaulted to NULL.
+create table pt1 (x int, y int) partitioned by (z int);
+-- The values from c1 are copied into the column x in the new table,
+-- all in the same partition based on a constant value for z.
+-- The values of y in the new table are all NULL.
+insert into pt1 (x) partition (z=5) select c1 from t1;
+-- Again we omit the values for column y so they are all NULL.
+-- The inserted x values can go into different partitions, based on
+-- the different values inserted into the partitioning column z.
+insert into pt1 (x,z) select x, y from t2;
+</codeblock>
+
+ <p>
+ <codeph>SELECT *</codeph> for a partitioned table requires that all partition key columns in the source table
+ be declared as the last columns in the <codeph>CREATE TABLE</codeph> statement. You still include a
+ <codeph>PARTITION</codeph> clause listing all the partition key columns. These partition columns are
+ automatically mapped to the last columns from the <codeph>SELECT *</codeph> list.
+ </p>
+
+<codeblock>create table source (x int, y int, year int, month int, day int);
+create table destination (x int, y int) partitioned by (year int, month int, day int);
+...load some data into the unpartitioned source table...
+-- Insert a single partition of data.
+-- The SELECT * means you cannot specify partition (year=2014, month, day).
+insert overwrite destination partition (year, month, day) select * from source where year=2014;
+-- Insert the data for all year/month/day combinations.
+insert overwrite destination partition (year, month, day) select * from source;
+
+-- If one of the partition columns is omitted from the source table,
+-- then you can specify a specific value for that column in the PARTITION clause.
+-- Here the source table holds only data from 2014, and so does not include a year column.
+create table source_2014 (x int, y int, month int, day int);
+...load some data into the unpartitioned source_2014 table...
+insert overwrite destination partition (year=2014, month, day) select * from source_2014; +</codeblock> + + <p conref="../shared/impala_common.xml#common/insert_sort_blurb"/> + + <p> + <b>Concurrency considerations:</b> Each <codeph>INSERT</codeph> operation creates new data files with unique + names, so you can run multiple <codeph>INSERT INTO</codeph> statements simultaneously without filename + conflicts. +<!-- +If data is inserted into a table by a statement issued to a different +<cmdname>impalad</cmdname> node, +issue a <codeph>REFRESH <varname>table_name</varname></codeph> +statement to make the node you are connected to aware of this new data. +--> + While data is being inserted into an Impala table, the data is staged temporarily in a subdirectory inside + the data directory; during this period, you cannot issue queries against that table in Hive. If an + <codeph>INSERT</codeph> operation fails, the temporary data file and the subdirectory could be left behind in + the data directory. If so, remove the relevant subdirectory and any data files it contains manually, by + issuing an <codeph>hdfs dfs -rm -r</codeph> command, specifying the full path of the work subdirectory, whose + name ends in <codeph>_dir</codeph>. + </p> + </conbody> + + <concept id="values"> + + <title>VALUES Clause</title> + + <conbody> + + <p> + The <codeph>VALUES</codeph> clause is a general-purpose way to specify the columns of one or more rows, + typically within an <codeph><xref href="impala_insert.xml#insert">INSERT</xref></codeph> statement. + </p> + + <note conref="../shared/impala_common.xml#common/insert_values_warning"> + <p/> + </note> + + <p> + The following examples illustrate: + </p> + + <ul> + <li> + How to insert a single row using a <codeph>VALUES</codeph> clause. + </li> + + <li> + How to insert multiple rows using a <codeph>VALUES</codeph> clause. + </li> + + <li> + How the row or rows from a <codeph>VALUES</codeph> clause can be appended to a table through + <codeph>INSERT INTO</codeph>, or replace the contents of the table through <codeph>INSERT + OVERWRITE</codeph>. + </li> + + <li> + How the entries in a <codeph>VALUES</codeph> clause can be literals, function results, or any other kind + of expression. See <xref href="impala_literals.xml#literals"/> for the notation to use for literal + values, especially <xref href="impala_literals.xml#string_literals"/> for quoting and escaping + conventions for strings. See <xref href="impala_operators.xml#operators"/> and + <xref href="impala_functions.xml#builtins"/> for other things you can include in expressions with the + <codeph>VALUES</codeph> clause. + </li> + </ul> + +<codeblock>[localhost:21000] > describe val_example; +Query: describe val_example +Query finished, fetching results ... 
++-------+---------+---------+ +| name | type | comment | ++-------+---------+---------+ +| id | int | | +| col_1 | boolean | | +| col_2 | double | | ++-------+---------+---------+ + +[localhost:21000] > insert into val_example values (1,true,100.0); +Inserted 1 rows in 0.30s +[localhost:21000] > select * from val_example; ++----+-------+-------+ +| id | col_1 | col_2 | ++----+-------+-------+ +| 1 | true | 100 | ++----+-------+-------+ + +[localhost:21000] > insert overwrite val_example values (10,false,pow(2,5)), (50,true,10/3); +Inserted 2 rows in 0.16s +[localhost:21000] > select * from val_example; ++----+-------+-------------------+ +| id | col_1 | col_2 | ++----+-------+-------------------+ +| 10 | false | 32 | +| 50 | true | 3.333333333333333 | ++----+-------+-------------------+</codeblock> + + <p> + When used in an <codeph>INSERT</codeph> statement, the Impala <codeph>VALUES</codeph> clause can specify + some or all of the columns in the destination table, and the columns can be specified in a different order + than they actually appear in the table. To specify a different set or order of columns than in the table, + use the syntax: + </p> + +<codeblock>INSERT INTO <varname>destination</varname> + (<varname>col_x</varname>, <varname>col_y</varname>, <varname>col_z</varname>) + VALUES + (<varname>val_x</varname>, <varname>val_y</varname>, <varname>val_z</varname>); +</codeblock> + + <p> + Any columns in the table that are not listed in the <codeph>INSERT</codeph> statement are set to + <codeph>NULL</codeph>. + </p> + +<!-- + <p> + does not support specifying a subset of the + columns in the table or specifying the columns in a different order. Use a + <codeph>VALUES</codeph> clause with all the column values in the same order as + the table definition, using <codeph>NULL</codeph> values for any columns you + want to omit from the <codeph>INSERT</codeph> operation. + </p> +--> + + <p> + To use a <codeph>VALUES</codeph> clause like a table in other statements, wrap it in parentheses and use + <codeph>AS</codeph> clauses to specify aliases for the entire object and any columns you need to refer to: + </p> + +<codeblock>[localhost:21000] > select * from (values(4,5,6),(7,8,9)) as t; ++---+---+---+ +| 4 | 5 | 6 | ++---+---+---+ +| 4 | 5 | 6 | +| 7 | 8 | 9 | ++---+---+---+ +[localhost:21000] > select * from (values(1 as c1, true as c2, 'abc' as c3),(100,false,'xyz')) as t; ++-----+-------+-----+ +| c1 | c2 | c3 | ++-----+-------+-----+ +| 1 | true | abc | +| 100 | false | xyz | ++-----+-------+-----+</codeblock> + + <p> + For example, you might use a tiny table constructed like this from constant literals or function return + values as part of a longer statement involving joins or <codeph>UNION ALL</codeph>. + </p> + + <p conref="../shared/impala_common.xml#common/hdfs_blurb"/> + + <p> + Impala physically writes all inserted files under the ownership of its default user, typically + <codeph>impala</codeph>. Therefore, this user must have HDFS write permission in the corresponding table + directory. + </p> + + <p> + The permission requirement is independent of the authorization performed by the Sentry framework. (If the + connected user is not authorized to insert into a table, Sentry blocks that operation immediately, + regardless of the privileges available to the <codeph>impala</codeph> user.) Files created by Impala are + not owned by and do not inherit permissions from the connected user. 
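+        For example, the following commands (the warehouse path shown is only an illustration;
+        substitute the actual location of your table directory) check and, if necessary, open up
+        the HDFS permissions for the <codeph>impala</codeph> user:
+<codeblock># Example paths only; use the actual HDFS location of your table.
+hdfs dfs -ls /user/hive/warehouse/file_formats.db
+hdfs dfs -chown -R impala /user/hive/warehouse/file_formats.db/text_table
+hdfs dfs -chmod -R 775 /user/hive/warehouse/file_formats.db/text_table</codeblock>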
+ </p> + + <p> + The number of data files produced by an <codeph>INSERT</codeph> statement depends on the size of the + cluster, the number of data blocks that are processed, the partition key columns in a partitioned table, + and the mechanism Impala uses for dividing the work in parallel. Do not assume that an + <codeph>INSERT</codeph> statement will produce some particular number of output files. In case of + performance issues with data written by Impala, check that the output files do not suffer from issues such + as many tiny files or many tiny partitions. (In the Hadoop context, even files or partitions of a few tens + of megabytes are considered <q>tiny</q>.) + </p> + + <p conref="../shared/impala_common.xml#common/insert_hidden_work_directory"/> + + <p conref="../shared/impala_common.xml#common/hbase_blurb"/> + + <p> + You can use the <codeph>INSERT</codeph> statement with HBase tables as follows: + </p> + + <ul> + <li> + <p> + You can insert a single row or a small set of rows into an HBase table with the <codeph>INSERT ... + VALUES</codeph> syntax. This is a good use case for HBase tables with Impala, because HBase tables are + not subject to the same kind of fragmentation from many small insert operations as HDFS tables are. + </p> + </li> + + <li> + <p> + You can insert any number of rows at once into an HBase table using the <codeph>INSERT ... + SELECT</codeph> syntax. + </p> + </li> + + <li> + <p> + If more than one inserted row has the same value for the HBase key column, only the last inserted row + with that value is visible to Impala queries. You can take advantage of this fact with <codeph>INSERT + ... VALUES</codeph> statements to effectively update rows one at a time, by inserting new rows with the + same key values as existing rows. Be aware that after an <codeph>INSERT ... SELECT</codeph> operation + copying from an HDFS table, the HBase table might contain fewer rows than were inserted, if the key + column in the source table contained duplicate values. + </p> + </li> + + <li> + <p> + You cannot <codeph>INSERT OVERWRITE</codeph> into an HBase table. New rows are always appended. + </p> + </li> + + <li> + <p> + When you create an Impala or Hive table that maps to an HBase table, the column order you specify with + the <codeph>INSERT</codeph> statement might be different than the order you declare with the + <codeph>CREATE TABLE</codeph> statement. Behind the scenes, HBase arranges the columns based on how + they are divided into column families. This might cause a mismatch during insert operations, especially + if you use the syntax <codeph>INSERT INTO <varname>hbase_table</varname> SELECT * FROM + <varname>hdfs_table</varname></codeph>. Before inserting data, verify the column order by issuing a + <codeph>DESCRIBE</codeph> statement for the table, and adjust the order of the select list in the + <codeph>INSERT</codeph> statement. + </p> + </li> + </ul> + + <p> + See <xref href="impala_hbase.xml#impala_hbase"/> for more details about using Impala with HBase. 
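+        The row-at-a-time update technique described above might look like the following sketch
+        (the table and column names are hypothetical, with the <codeph>id</codeph> column mapped
+        to the HBase row key):
+<codeblock>-- Hypothetical Impala table mapped to an HBase table; id is the row key column.
+INSERT INTO hbase_users VALUES ('alice', 'Alice', 'alice@example.com');
+-- Inserting the same key value again effectively replaces the earlier row.
+INSERT INTO hbase_users VALUES ('alice', 'Alice', 'alice@example.org');
+-- Only the most recently inserted row for that key is visible to queries.
+SELECT * FROM hbase_users WHERE id = 'alice';</codeblock>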
+ </p> + + <p conref="../shared/impala_common.xml#common/s3_blurb"/> + <p conref="../shared/impala_common.xml#common/s3_dml"/> + <p conref="../shared/impala_common.xml#common/s3_dml_performance"/> + <p>See <xref href="../topics/impala_s3.xml#s3"/> for details about reading and writing S3 data with Impala.</p> + + <p conref="../shared/impala_common.xml#common/security_blurb"/> + <p conref="../shared/impala_common.xml#common/redaction_yes"/> + + <p conref="../shared/impala_common.xml#common/cancel_blurb_yes"/> + + <p conref="../shared/impala_common.xml#common/permissions_blurb"/> + <p rev="CDH-19187"> + The user ID that the <cmdname>impalad</cmdname> daemon runs under, + typically the <codeph>impala</codeph> user, must have read + permission for the files in the source directory of an <codeph>INSERT ... SELECT</codeph> + operation, and write permission for all affected directories in the destination table. + (An <codeph>INSERT</codeph> operation could write files to multiple different HDFS directories + if the destination table is partitioned.) + This user must also have write permission to create a temporary work directory + in the top-level HDFS directory of the destination table. + An <codeph>INSERT OVERWRITE</codeph> operation does not require write permission on + the original data files in the table, only on the table directories themselves. + </p> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <p conref="../shared/impala_common.xml#common/char_varchar_cast_from_string"/> + + <p conref="../shared/impala_common.xml#common/related_options"/> + + <p rev="1.3.1" conref="../shared/impala_common.xml#common/insert_inherit_permissions"/> + </conbody> + </concept> + +<!-- Values clause --> +</concept> +<!-- INSERT statement -->
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_install.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_install.xml b/docs/topics/impala_install.xml new file mode 100644 index 0000000..ab68314 --- /dev/null +++ b/docs/topics/impala_install.xml @@ -0,0 +1,106 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="install"> + + <title><ph audience="standalone">Installing Impala</ph><ph audience="integrated">Impala Installation</ph></title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Installing"/> + <data name="Category" value="Administrators"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">installation</indexterm> + <indexterm audience="Cloudera">pseudo-distributed cluster</indexterm> + <indexterm audience="Cloudera">cluster</indexterm> + <indexterm audience="Cloudera">DataNodes</indexterm> + <indexterm audience="Cloudera">NameNode</indexterm> + <indexterm audience="Cloudera">Cloudera Manager</indexterm> + <indexterm audience="Cloudera">impalad</indexterm> + <indexterm audience="Cloudera">impala-shell</indexterm> + <indexterm audience="Cloudera">statestored</indexterm> + Impala is an open-source add-on to the Cloudera Enterprise Core that returns rapid responses to + queries. + </p> + + <note> + <p> + Under CDH 5, Impala is included as part of the CDH installation and no separate steps are needed. + <ph audience="standalone">Therefore, the instruction steps in this section apply to CDH 4 only.</ph> + </p> + </note> + + <p outputclass="toc inpage"/> + </conbody> + + <concept id="install_details"> + + <title>What is Included in an Impala Installation</title> + + <conbody> + + <p> + Impala is made up of a set of components that can be installed on multiple nodes throughout your cluster. + The key installation step for performance is to install the <cmdname>impalad</cmdname> daemon (which does + most of the query processing work) on <i>all</i> DataNodes in the cluster. + </p> + + <p> + The Impala package installs these binaries: + </p> + + <ul> + <li> + <p> + <cmdname>impalad</cmdname> - The Impala daemon. Plans and executes queries against HDFS, HBase, <ph rev="2.2.0">and Amazon S3 data</ph>. + <xref href="impala_processes.xml#processes">Run one impalad process</xref> on each node in the cluster + that has a DataNode. + </p> + </li> + + <li> + <p> + <cmdname>statestored</cmdname> - Name service that tracks location and status of all + <codeph>impalad</codeph> instances in the cluster. <xref href="impala_processes.xml#processes">Run one + instance of this daemon</xref> on a node in your cluster. Most production deployments run this daemon + on the namenode. + </p> + </li> + + <li rev="1.2"> + <p> + <cmdname>catalogd</cmdname> - Metadata coordination service that broadcasts changes from Impala DDL and + DML statements to all affected Impala nodes, so that new tables, newly loaded data, and so on are + immediately visible to queries submitted through any Impala node. +<!-- Consider removing this when 1.2 gets far in the past. --> + (Prior to Impala 1.2, you had to run the <codeph>REFRESH</codeph> or <codeph>INVALIDATE + METADATA</codeph> statement on each node to synchronize changed metadata. 
Now those statements are only + required if you perform the DDL or DML through an external mechanism such as Hive <ph rev="2.2.0">or by uploading + data to the Amazon S3 filesystem</ph>.) + <xref href="impala_processes.xml#processes">Run one instance of this daemon</xref> on a node in your cluster, + preferably on the same host as the <codeph>statestored</codeph> daemon. + </p> + </li> + + <li> + <p> + <cmdname>impala-shell</cmdname> - <xref href="impala_impala_shell.xml#impala_shell">Command-line + interface</xref> for issuing queries to the Impala daemon. You install this on one or more hosts + anywhere on your network, not necessarily DataNodes or even within the same cluster as Impala. It can + connect remotely to any instance of the Impala daemon. + </p> + </li> + </ul> + + <p> + Before doing the installation, ensure that you have all necessary prerequisites. See + <xref href="impala_prereqs.xml#prereqs"/> for details. + </p> + </conbody> + </concept> + +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_int.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_int.xml b/docs/topics/impala_int.xml new file mode 100644 index 0000000..aeead5b --- /dev/null +++ b/docs/topics/impala_int.xml @@ -0,0 +1,95 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="int"> + + <title>INT Data Type</title> + <titlealts audience="PDF"><navtitle>INT</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Schemas"/> + </metadata> + </prolog> + + <conbody> + + <p> + A 4-byte integer data type used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> statements. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + + <p> + In the column definition of a <codeph>CREATE TABLE</codeph> statement: + </p> + +<codeblock><varname>column_name</varname> INT</codeblock> + + <p> + <b>Range:</b> -2147483648 .. 2147483647. There is no <codeph>UNSIGNED</codeph> subtype. + </p> + + <p> + <b>Conversions:</b> Impala automatically converts to a larger integer type (<codeph>BIGINT</codeph>) or a + floating-point type (<codeph>FLOAT</codeph> or <codeph>DOUBLE</codeph>) automatically. Use + <codeph>CAST()</codeph> to convert to <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, + <codeph>STRING</codeph>, or <codeph>TIMESTAMP</codeph>. + <ph conref="../shared/impala_common.xml#common/cast_int_to_timestamp"/> + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + The data type <codeph>INTEGER</codeph> is an alias for <codeph>INT</codeph>. + </p> + + <p> + For a convenient and automated way to check the bounds of the <codeph>INT</codeph> type, call the functions + <codeph>MIN_INT()</codeph> and <codeph>MAX_INT()</codeph>. + </p> + + <p> + If an integer value is too large to be represented as a <codeph>INT</codeph>, use a <codeph>BIGINT</codeph> + instead. 
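+      For example, a quick way to confirm those bounds from <cmdname>impala-shell</cmdname>:
+<codeblock>SELECT MIN_INT(), MAX_INT();
+-- Returns -2147483648 and 2147483647.</codeblock>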
+ </p> + + <p conref="../shared/impala_common.xml#common/null_bad_numeric_cast"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>CREATE TABLE t1 (x INT); +SELECT CAST(1000 AS INT); +</codeblock> + + <p conref="../shared/impala_common.xml#common/partitioning_good"/> + + <p conref="../shared/impala_common.xml#common/hbase_ok"/> + + <p conref="../shared/impala_common.xml#common/parquet_blurb"/> + + <p conref="../shared/impala_common.xml#common/text_bulky"/> + +<!-- <p conref="../shared/impala_common.xml#common/compatibility_blurb"/> --> + + <p conref="../shared/impala_common.xml#common/internals_4_bytes"/> + + <p conref="../shared/impala_common.xml#common/added_forever"/> + + <p conref="../shared/impala_common.xml#common/column_stats_constant"/> + +<!-- <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> --> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_literals.xml#numeric_literals"/>, <xref href="impala_tinyint.xml#tinyint"/>, + <xref href="impala_smallint.xml#smallint"/>, <xref href="impala_int.xml#int"/>, + <xref href="impala_bigint.xml#bigint"/>, <xref href="impala_decimal.xml#decimal"/>, + <xref href="impala_math_functions.xml#math_functions"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_intro.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_intro.xml b/docs/topics/impala_intro.xml new file mode 100644 index 0000000..c599bc5 --- /dev/null +++ b/docs/topics/impala_intro.xml @@ -0,0 +1,81 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="intro"> + + <title id="impala"><ph audience="standalone">Introducing Apache Impala (incubating)</ph><ph audience="integrated">Apache Impala (incubating) Overview</ph></title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Getting Started"/> + <data name="Category" value="Concepts"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody id="intro_body"> + + <p> + Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS, + HBase, <ph rev="2.2.0">or the Amazon Simple Storage Service (S3)</ph>. + In addition to using the same unified storage platform, + Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface + (Impala query UI in Hue) as Apache Hive. This + provides a familiar and unified platform for real-time or batch-oriented queries. + </p> + + <p> + Impala is an addition to tools available for querying big data. Impala does not replace the batch + processing frameworks built on MapReduce such as Hive. Hive and other frameworks built on MapReduce are + best suited for long running batch jobs, such as those involving batch processing of Extract, Transform, + and Load (ETL) type jobs. + </p> + + <note> + Impala was accepted into the Apache incubator on December 2, 2015. + In places where the documentation formerly referred to <q>Cloudera Impala</q>, + now the official name is <q>Apache Impala (incubating)</q>. 
+ </note> + + </conbody> + + <concept id="benefits"> + + <title>Impala Benefits</title> + + <conbody> + + <p conref="../shared/impala_common.xml#common/impala_benefits"/> + + </conbody> + </concept> + + <concept id="impala_cdh"> + + <title>How Impala Works with CDH</title> + <prolog> + <metadata> + <data name="Category" value="Concepts"/> + </metadata> + </prolog> + + <conbody> + + <p conref="../shared/impala_common.xml#common/impala_overview_diagram"/> + + <p conref="../shared/impala_common.xml#common/component_list"/> + + <p conref="../shared/impala_common.xml#common/query_overview"/> + </conbody> + </concept> + + <concept id="features"> + + <title>Primary Impala Features</title> + + <conbody> + + <p conref="../shared/impala_common.xml#common/feature_list"/> + </conbody> + </concept> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_invalidate_metadata.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_invalidate_metadata.xml b/docs/topics/impala_invalidate_metadata.xml new file mode 100644 index 0000000..c41a996 --- /dev/null +++ b/docs/topics/impala_invalidate_metadata.xml @@ -0,0 +1,233 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="1.1" id="invalidate_metadata"> + + <title>INVALIDATE METADATA Statement</title> + <titlealts audience="PDF"><navtitle>INVALIDATE METADATA</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="DDL"/> + <data name="Category" value="ETL"/> + <data name="Category" value="Ingest"/> + <data name="Category" value="Metastore"/> + <data name="Category" value="Schemas"/> + <data name="Category" value="Tables"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">INVALIDATE METADATA statement</indexterm> + Marks the metadata for one or all tables as stale. Required after a table is created through the Hive shell, + before the table is available for Impala queries. The next time the current Impala node performs a query + against a table whose metadata is invalidated, Impala reloads the associated metadata before the query + proceeds. This is a relatively expensive operation compared to the incremental metadata update done by the + <codeph>REFRESH</codeph> statement, so in the common scenario of adding new data files to an existing table, + prefer <codeph>REFRESH</codeph> rather than <codeph>INVALIDATE METADATA</codeph>. If you are not familiar + with the way Impala uses metadata and how it shares the same metastore database as Hive, see + <xref href="impala_hadoop.xml#intro_metastore"/> for background information. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>INVALIDATE METADATA [[<varname>db_name</varname>.]<varname>table_name</varname>]</codeblock> + + <p> + By default, the cached metadata for all tables is flushed. If you specify a table name, only the metadata for + that one table is flushed. Even for a single table, <codeph>INVALIDATE METADATA</codeph> is more expensive + than <codeph>REFRESH</codeph>, so prefer <codeph>REFRESH</codeph> in the common case where you add new data + files for an existing table. 
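+      For example (the database and table names here are only placeholders):
+<codeblock>-- Flush cached metadata for all tables; relatively expensive.
+INVALIDATE METADATA;
+-- Flush cached metadata for one table (placeholder name), for example after creating it through Hive.
+INVALIDATE METADATA sales_db.web_logs;
+-- Cheaper operation when new data files were added to a table Impala already knows about.
+REFRESH sales_db.web_logs;</codeblock>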
+ </p> + + <p conref="../shared/impala_common.xml#common/internals_blurb"/> + + <p> + To accurately respond to queries, Impala must have current metadata about those databases and tables that + clients query directly. Therefore, if some other entity modifies information used by Impala in the metastore + that Impala and Hive share, the information cached by Impala must be updated. However, this does not mean + that all metadata updates require an Impala update. + </p> + + <note> + <p conref="../shared/impala_common.xml#common/catalog_server_124"/> + <p rev="1.2"> + In Impala 1.2 and higher, a dedicated daemon (<cmdname>catalogd</cmdname>) broadcasts DDL changes made + through Impala to all Impala nodes. Formerly, after you created a database or table while connected to one + Impala node, you needed to issue an <codeph>INVALIDATE METADATA</codeph> statement on another Impala node + before accessing the new database or table from the other node. Now, newly created or altered objects are + picked up automatically by all Impala nodes. You must still use the <codeph>INVALIDATE METADATA</codeph> + technique after creating or altering objects through Hive. See + <xref href="impala_components.xml#intro_catalogd"/> for more information on the catalog service. + </p> + <p> + The <codeph>INVALIDATE METADATA</codeph> statement is new in Impala 1.1 and higher, and takes over some of + the use cases of the Impala 1.0 <codeph>REFRESH</codeph> statement. Because <codeph>REFRESH</codeph> now + requires a table name parameter, to flush the metadata for all tables at once, use the <codeph>INVALIDATE + METADATA</codeph> statement. + </p> + <p conref="../shared/impala_common.xml#common/invalidate_then_refresh"/> + </note> + + <p conref="../shared/impala_common.xml#common/refresh_vs_invalidate"/> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + A metadata update for an <codeph>impalad</codeph> instance <b>is</b> required if: + </p> + + <ul> + <li> + A metadata change occurs. + </li> + + <li> + <b>and</b> the change is made from another <codeph>impalad</codeph> instance in your cluster, or through + Hive. + </li> + + <li> + <b>and</b> the change is made to a metastore database to which clients such as the Impala shell or ODBC directly + connect. + </li> + </ul> + + <p> + A metadata update for an Impala node is <b>not</b> required when you issue queries from the same Impala node + where you ran <codeph>ALTER TABLE</codeph>, <codeph>INSERT</codeph>, or other table-modifying statement. + </p> + + <p> + Database and table metadata is typically modified by: + </p> + + <ul> + <li> + Hive - via <codeph>ALTER</codeph>, <codeph>CREATE</codeph>, <codeph>DROP</codeph> or + <codeph>INSERT</codeph> operations. + </li> + + <li> + Impalad - via <codeph>CREATE TABLE</codeph>, <codeph>ALTER TABLE</codeph>, and <codeph>INSERT</codeph> + operations. + </li> + </ul> + + <p> + <codeph>INVALIDATE METADATA</codeph> causes the metadata for that table to be marked as stale, and reloaded + the next time the table is referenced. For a huge table, that process could take a noticeable amount of time; + thus you might prefer to use <codeph>REFRESH</codeph> where practical, to avoid an unpredictable delay later, + for example if the next reference to the table is during a benchmark test. 
+ </p> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following example shows how you might use the <codeph>INVALIDATE METADATA</codeph> statement after + creating new tables (such as SequenceFile or HBase tables) through the Hive shell. Before the + <codeph>INVALIDATE METADATA</codeph> statement was issued, Impala would give a <q>table not found</q> error + if you tried to refer to those table names. The <codeph>DESCRIBE</codeph> statements cause the latest + metadata to be immediately loaded for the tables, avoiding a delay the next time those tables are queried. + </p> + +<codeblock>[impalad-host:21000] > invalidate metadata; +[impalad-host:21000] > describe t1; +... +[impalad-host:21000] > describe t2; +... </codeblock> + + <p> + For more examples of using <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> with a + combination of Impala and Hive operations, see <xref href="impala_tutorial.xml#tutorial_impala_hive"/>. + </p> + + <p> + If you need to ensure that the metadata is up-to-date when you start an <cmdname>impala-shell</cmdname> + session, run <cmdname>impala-shell</cmdname> with the <codeph>-r</codeph> or + <codeph>--refresh_after_connect</codeph> command-line option. Because this operation adds a delay to the next + query against each table, potentially expensive for large tables with many partitions, try to avoid using + this option for day-to-day operations in a production environment. + </p> + + <p conref="../shared/impala_common.xml#common/permissions_blurb"/> + <p rev="CDH-19187"> + The user ID that the <cmdname>impalad</cmdname> daemon runs under, + typically the <codeph>impala</codeph> user, must have execute + permissions for all the relevant directories holding table data. + (A table could have data spread across multiple directories, + or in unexpected paths, if it uses partitioning or + specifies a <codeph>LOCATION</codeph> attribute for + individual partitions or the entire table.) + Issues with permissions might not cause an immediate error for this statement, + but subsequent statements such as <codeph>SELECT</codeph> + or <codeph>SHOW TABLE STATS</codeph> could fail. + </p> + + <p conref="../shared/impala_common.xml#common/hdfs_blurb"/> + + <p> + By default, the <codeph>INVALIDATE METADATA</codeph> command checks HDFS permissions of the underlying data + files and directories, caching this information so that a statement can be cancelled immediately if for + example the <codeph>impala</codeph> user does not have permission to write to the data directory for the + table. (This checking does not apply if you have set the <cmdname>catalogd</cmdname> configuration option + <codeph>--load_catalog_in_background=false</codeph>.) Impala reports any lack of write permissions as an + <codeph>INFO</codeph> message in the log file, in case that represents an oversight. If you change HDFS + permissions to make data readable or writeable by the Impala user, issue another <codeph>INVALIDATE + METADATA</codeph> to make Impala aware of the change. + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p rev="1.2.4"> + This example illustrates creating a new database and new table in Hive, then doing an <codeph>INVALIDATE + METADATA</codeph> statement in Impala using the fully qualified table name, after which both the new table + and the new database are visible to Impala. 
The ability to specify <codeph>INVALIDATE METADATA + <varname>table_name</varname></codeph> for a table created in Hive is a new capability in Impala 1.2.4. In + earlier releases, that statement would have returned an error indicating an unknown table, requiring you to + do <codeph>INVALIDATE METADATA</codeph> with no table name, a more expensive operation that reloaded metadata + for all tables and databases. + </p> + +<codeblock rev="1.2.4">$ hive +hive> create database new_db_from_hive; +OK +Time taken: 4.118 seconds +hive> create table new_db_from_hive.new_table_from_hive (x int); +OK +Time taken: 0.618 seconds +hive> quit; +$ impala-shell +[localhost:21000] > show databases like 'new*'; +[localhost:21000] > refresh new_db_from_hive.new_table_from_hive; +ERROR: AnalysisException: Database does not exist: new_db_from_hive +[localhost:21000] > invalidate metadata new_db_from_hive.new_table_from_hive; +[localhost:21000] > show databases like 'new*'; ++--------------------+ +| name | ++--------------------+ +| new_db_from_hive | ++--------------------+ +[localhost:21000] > show tables in new_db_from_hive; ++---------------------+ +| name | ++---------------------+ +| new_table_from_hive | ++---------------------+</codeblock> + + <p conref="../shared/impala_common.xml#common/s3_blurb"/> + <p conref="../shared/impala_common.xml#common/s3_metadata"/> + + <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/> + <p conref="../shared/impala_common.xml#common/related_info"/> + <p> + <xref href="impala_hadoop.xml#intro_metastore"/>, + <xref href="impala_refresh.xml#refresh"/> + </p> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_isilon.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_isilon.xml b/docs/topics/impala_isilon.xml new file mode 100644 index 0000000..f631268 --- /dev/null +++ b/docs/topics/impala_isilon.xml @@ -0,0 +1,114 @@ +<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="5.4.3" id="impala_isilon"> + + <title>Using Impala with Isilon Storage</title> + <titlealts audience="PDF"><navtitle>Isilon Storage</navtitle></titlealts> + + <prolog> + <metadata> + <data name="Category" value="CDH"/> + <data name="Category" value="Impala"/> + <data name="Category" value="Isilon"/> + <data name="Category" value="Disk Storage"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">Isilon</indexterm> + You can use Impala to query data files that reside on EMC Isilon storage devices, rather than in HDFS. + This capability allows convenient query access to a storage system where you might already be + managing large volumes of data. The combination of the Impala query engine and Isilon storage is + certified on CDH 5.4.4 or higher. + </p> + + <p conref="../shared/impala_common.xml#common/isilon_block_size_caveat"/> + + <p> + The typical use case for Impala and Isilon together is to use Isilon for the + default filesystem, replacing HDFS entirely. In this configuration, + when you create a database, table, or partition, the data always resides on + Isilon storage and you do not need to specify any special <codeph>LOCATION</codeph> + attribute. 
If you do specify a <codeph>LOCATION</codeph> attribute, its value refers + to a path within the Isilon filesystem. + For example: + </p> +<codeblock>-- If the default filesystem is Isilon, all Impala data resides there +-- and all Impala databases and tables are located there. +CREATE TABLE t1 (x INT, s STRING); + +-- You can specify LOCATION for database, table, or partition, +-- using values from the Isilon filesystem. +CREATE DATABASE d1 LOCATION '/some/path/on/isilon/server/d1.db'; +CREATE TABLE d1.t2 (a TINYINT, b BOOLEAN); +</codeblock> + + <p> + Impala can write to, delete, and rename data files and database, table, + and partition directories on Isilon storage. Therefore, Impala statements such + as + <codeph>CREATE TABLE</codeph>, <codeph>DROP TABLE</codeph>, + <codeph>CREATE DATABASE</codeph>, <codeph>DROP DATABASE</codeph>, + <codeph>ALTER TABLE</codeph>, + and + <codeph>INSERT</codeph> work the same with Isilon storage as with HDFS. + </p> + + <p> + When the Impala spill-to-disk feature is activated by a query that approaches + the memory limit, Impala writes all the temporary data to a local (not Isilon) + storage device. Because the I/O bandwidth for the temporary data depends on + the number of local disks, and clusters using Isilon storage might not have + as many local disks attached, pay special attention on Isilon-enabled clusters + to any queries that use the spill-to-disk feature. Where practical, tune the + queries or allocate extra memory for Impala to avoid spilling. + Although you can specify an Isilon storage device as the destination for + the temporary data for the spill-to-disk feature, that configuration is + not recommended due to the need to transfer the data both ways using remote I/O. + </p> + + <p> + When tuning Impala queries on HDFS, you typically try to avoid any remote reads. + When the data resides on Isilon storage, all the I/O consists of remote reads. + Do not be alarmed when you see non-zero numbers for remote read measurements + in query profile output. The benefit of the Impala and Isilon integration is + primarily convenience of not having to move or copy large volumes of data to HDFS, + rather than raw query performance. You can increase the performance of Impala + I/O for Isilon systems by increasing the value for the + <codeph>num_remote_hdfs_io_threads</codeph> configuration parameter, + in the Cloudera Manager user interface for clusters using Cloudera Manager, + or through the <codeph>--num_remote_hdfs_io_threads</codeph> startup option + for the <cmdname>impalad</cmdname> daemon on clusters not using Cloudera Manager. + </p> + + <p> +<!-- + For information about tasks performed on + Isilon OneFS, see the information hub for Cloudera on the EMC Community Network: + <xref href="https://community.emc.com/docs/DOC-39522" format="html" scope="external">https://community.emc.com/docs/DOC-39522</xref>. +--> + <!-- This is a little bit of a circular loop when this topic is conrefed into the main Isilon page, + consider if there's a way to conditionalize it out in that case. --> + For information about managing Isilon storage devices through Cloudera Manager, see + <xref audience="integrated" href="cm_mc_isilon_service.xml"/><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_isilon_service.html" scope="external" format="html"/>. 
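+      For clusters not managed by Cloudera Manager, one possible way to apply the remote I/O
+      thread setting mentioned above is through the <cmdname>impalad</cmdname> startup flags.
+      This is only a sketch; the thread count and the location of the environment file depend
+      on your installation:
+<codeblock># Example only; adjust the thread count and file location for your installation
+# (for package-based installs, the environment file is often /etc/default/impala).
+IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS} --num_remote_hdfs_io_threads=64"</codeblock>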
+ </p> + + <!-- <p outputclass="toc inpage"/> --> + </conbody> +<concept id="isilon_cm_configs"> +<title>Required Configurations</title> +<conbody> +<p>Specify the following configurations in Cloudera Manager on the <menucascade><uicontrol>Clusters</uicontrol><uicontrol><varname>Isilon Service</varname></uicontrol><uicontrol>Configuration</uicontrol></menucascade> tab:<ul id="ul_vpx_bw5_vv"> +<li>In <uicontrol>HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml</uicontrol> <codeph>hdfs-site.xml</codeph> and the <uicontrol>Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml</uicontrol> properties for the Isilon service, set the value of the <codeph>dfs.client.file-block-storage-locations.timeout.millis</codeph> property to <codeph>10000</codeph>.</li> +<li>In the Isilon <uicontrol>Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml</uicontrol> property for the Isilon service, set the value of the <codeph>hadoop.security.token.service.use_ip</codeph> property to <codeph>FALSE</codeph>. </li> +<li>If you see errors that reference the <codeph>.Trash</codeph> directory, make sure that the <uicontrol>Use Trash</uicontrol> property is selected.</li> +</ul></p> + +</conbody> +</concept> + +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_jdbc.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_jdbc.xml b/docs/topics/impala_jdbc.xml new file mode 100644 index 0000000..d667170 --- /dev/null +++ b/docs/topics/impala_jdbc.xml @@ -0,0 +1,387 @@ +<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="impala_jdbc"> + + <title id="jdbc">Configuring Impala to Work with JDBC</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="JDBC"/> + <data name="Category" value="Java"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Starting and Stopping"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">JDBC</indexterm> + Impala supports the standard JDBC interface, allowing access from commercial Business Intelligence tools and + custom software written in Java or other programming languages. The JDBC driver allows you to access Impala + from a Java program that you write, or a Business Intelligence or similar tool that uses JDBC to communicate + with various database products. + </p> + + <p> + Setting up a JDBC connection to Impala involves the following steps: + </p> + + <ul> + <li> + Verifying the communication port where the Impala daemons in your cluster are listening for incoming JDBC + requests. + </li> + + <li> + Installing the JDBC driver on every system that runs the JDBC-enabled application. + </li> + + <li> + Specifying a connection string for the JDBC application to access one of the servers running the + <cmdname>impalad</cmdname> daemon, with the appropriate security settings. + </li> + </ul> + + <p outputclass="toc inpage"/> + </conbody> + + <concept id="jdbc_port"> + + <title>Configuring the JDBC Port</title> + + <conbody> + + <p> + The default port used by JDBC 2.0 and later (as well as ODBC 2.x) is 21050. Impala server accepts JDBC + connections through this same port 21050 by default. 
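+        From a client system, you can do a quick sanity check that an <cmdname>impalad</cmdname> host is
+        listening on this port before you configure the driver. The following is a minimal sketch in Java,
+        assuming a host named <codeph>myhost.example.com</codeph> and the default port:
+<codeblock>import java.net.InetSocketAddress;
+import java.net.Socket;
+
+public class CheckImpalaPort {
+  public static void main(String[] args) throws Exception {
+    // Attempt a plain TCP connection to the port that impalad uses for JDBC (HiveServer2 protocol).
+    // A timeout or "connection refused" usually points to a firewall or a non-default --hs2_port setting.
+    try (Socket socket = new Socket()) {
+      socket.connect(new InetSocketAddress("myhost.example.com", 21050), 5000);
+      System.out.println("Port 21050 is reachable.");
+    }
+  }
+}</codeblock>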
Make sure this port is available for communication + with other hosts on your network, for example, that it is not blocked by firewall software. If your JDBC + client software connects to a different port, specify that alternative port number with the + <codeph>--hs2_port</codeph> option when starting <codeph>impalad</codeph>. See + <xref href="impala_processes.xml#processes"/> for details about Impala startup options. See + <xref href="impala_ports.xml#ports"/> for information about all ports used for communication between Impala + and clients or between Impala components. + </p> + </conbody> + </concept> + + <concept id="jdbc_driver_choice"> + + <title>Choosing the JDBC Driver</title> + <prolog> + <metadata> + <data name="Category" value="Planning"/> + </metadata> + </prolog> + + <conbody> + + <p> + In Impala 2.0 and later, you have the choice between the Cloudera JDBC Connector and the Hive 0.13 JDBC driver. + <ph rev="upstream">Cloudera</ph> recommends using the Cloudera JDBC Connector where practical. + </p> + + <p> + If you are already using JDBC applications with an earlier Impala release, you must update your JDBC driver + to one of these choices, because the Hive 0.12 driver that was formerly the only choice is not compatible + with Impala 2.0 and later. + </p> + + <p> + Both the Cloudera JDBC 2.5 Connector and the Hive JDBC driver provide a substantial speed increase for JDBC + applications with Impala 2.0 and higher, for queries that return large result sets. + </p> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types"/> + <p conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types_views"/> + + </conbody> + </concept> + + <concept id="jdbc_setup"> + + <title>Enabling Impala JDBC Support on Client Systems</title> + <prolog> + <metadata> + <data name="Category" value="Installing"/> + </metadata> + </prolog> + + <conbody> + + <section id="install_jdbc_connector"> + <title>Using the Cloudera JDBC Connector (recommended)</title> + + <p> + You download and install the Cloudera JDBC 2.5 connector on any Linux, Windows, or Mac system where you + intend to run JDBC-enabled applications. From the + <xref href="http://go.cloudera.com/odbc-driver-hive-impala.html" scope="external" format="html">Cloudera + Connectors download page</xref>, you choose the appropriate protocol (JDBC or ODBC) and target product + (Impala or Hive). The ease of downloading and installing on non-CDH systems makes this connector a + convenient choice for organizations with heterogeneous environments. + </p> + + </section> + + <section id="install_hive_driver"> + <title>Using the Hive JDBC Driver</title> + <p> + You install the Hive JDBC driver (<codeph>hive-jdbc</codeph> package) through the Linux package manager, on + hosts within the CDH cluster. The driver consists of several Java JAR files. The same driver can be used by Impala and Hive. + </p> + + <p> + To get the JAR files, install the Hive JDBC driver on each CDH-enabled host in the cluster that will run + JDBC applications. Follow the instructions for + <xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hive_jdbc_install.html" scope="external" format="html">CDH 5</xref>. + </p> + + <note> + The latest JDBC driver, corresponding to Hive 0.13, provides substantial performance improvements for + Impala queries that return large result sets. Impala 2.0 and later are compatible with the Hive 0.13 + driver. 
If you already have an older JDBC driver installed, and are running Impala 2.0 or higher, consider + upgrading to the latest Hive JDBC driver for best performance with JDBC applications. + </note> + + <p> + If you are using JDBC-enabled applications on hosts outside the CDH cluster, you cannot use the CDH install + procedure on the non-CDH hosts. Install the JDBC driver on at least one CDH host using the preceding + procedure. Then download the JAR files to each client machine that will use JDBC with Impala: + </p> + + <codeblock>commons-logging-X.X.X.jar + hadoop-common.jar + hive-common-X.XX.X-cdhX.X.X.jar + hive-jdbc-X.XX.X-cdhX.X.X.jar + hive-metastore-X.XX.X-cdhX.X.X.jar + hive-service-X.XX.X-cdhX.X.X.jar + httpclient-X.X.X.jar + httpcore-X.X.X.jar + libfb303-X.X.X.jar + libthrift-X.X.X.jar + log4j-X.X.XX.jar + slf4j-api-X.X.X.jar + slf4j-logXjXX-X.X.X.jar + </codeblock> + + <p> + <b>To enable JDBC support for Impala on the system where you run the JDBC application:</b> + </p> + + <ol> + <li> + Download the JAR files listed above to each client machine. + <!-- + Download the + <xref href="https://downloads.cloudera.com/impala-jdbc/impala-jdbc-0.5-2.zip" scope="external" format="zip">Impala + JDBC zip file</xref> to the client machine that you will use to connect to Impala servers. + --> + <note> + For Maven users, see + <xref href="https://github.com/onefoursix/Cloudera-Impala-JDBC-Example" scope="external" format="html">this + sample github page</xref> for an example of the dependencies you could add to a <codeph>pom</codeph> + file instead of downloading the individual JARs. + </note> + </li> + + <li> + Store the JAR files in a location of your choosing, ideally a directory already referenced in your + <codeph>CLASSPATH</codeph> setting. For example: + <ul> + <li> + On Linux, you might use a location such as + <codeph>/</codeph><codeph>opt</codeph><codeph>/jars/</codeph>. + </li> + + <li> + On Windows, you might use a subdirectory underneath <filepath>C:\Program Files</filepath>. + </li> + </ul> + </li> + + <li> + To successfully load the Impala JDBC driver, client programs must be able to locate the associated JAR + files. This often means setting the <codeph>CLASSPATH</codeph> for the client process to include the + JARs. Consult the documentation for your JDBC client for more details on how to install new JDBC drivers, + but some examples of how to set <codeph>CLASSPATH</codeph> variables include: + <ul> + <li> + On Linux, if you extracted the JARs to <codeph>/opt/jars/</codeph>, you might issue the following + command to prepend the JAR files path to an existing classpath: + <codeblock>export CLASSPATH=/opt/jars/*.jar:$CLASSPATH</codeblock> + </li> + + <li> + On Windows, use the <b>System Properties</b> control panel item to modify the <b>Environment + Variables</b> for your system. Modify the environment variables to include the path to which you + extracted the files. + <note> + If the existing <codeph>CLASSPATH</codeph> on your client machine refers to some older version of + the Hive JARs, ensure that the new JARs are the first ones listed. Either put the new JAR files + earlier in the listings, or delete the other references to Hive JAR files. + </note> + </li> + </ul> + </li> + </ol> + </section> + + </conbody> + </concept> + + <concept id="jdbc_connect"> + + <title>Establishing JDBC Connections</title> + + <conbody> + + <p> + The JDBC driver class depends on which driver you select. 
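+        Whichever driver you choose, the basic connection sequence from a Java program is the same: make the
+        driver JARs available on the <codeph>CLASSPATH</codeph>, open a connection with the appropriate
+        connection string, and issue queries through the standard <codeph>java.sql</codeph> API. The following
+        is a minimal sketch, assuming the Cloudera JDBC Connector class names and the
+        <codeph>jdbc:impala://</codeph> connection string format described below, a host named
+        <codeph>myhost.example.com</codeph>, the default port 21050, and a cluster that does not require
+        authentication:
+<codeblock>import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.ResultSet;
+import java.sql.Statement;
+
+public class ImpalaJdbcExample {
+  public static void main(String[] args) throws Exception {
+    // Explicit driver loading is not needed with JDBC 4.0 drivers, but is harmless
+    // and required for the older JDBC 3 driver class.
+    Class.forName("com.cloudera.impala.jdbc41.Driver");
+    // Adjust the host, port, schema, and any connection properties for your cluster.
+    String url = "jdbc:impala://myhost.example.com:21050/default";
+    try (Connection conn = DriverManager.getConnection(url);
+         Statement stmt = conn.createStatement();
+         ResultSet rs = stmt.executeQuery("SELECT 1")) {
+      while (rs.next()) {
+        System.out.println(rs.getInt(1));
+      }
+    }
+  }
+}</codeblock>
+        For the Hive JDBC driver, substitute the <codeph>org.apache.hive.jdbc.HiveDriver</codeph> class and a
+        <codeph>jdbc:hive2://</codeph> connection string, as described in the sections that follow.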
+ </p> + + <note conref="../shared/impala_common.xml#common/proxy_jdbc_caveat"/> + + <section id="class_jdbc_connector"> + + <title>Using the Cloudera JDBC Connector (recommended)</title> + + <p> + Depending on the level of the JDBC API your application is targeting, you can use + the following fully-qualified class names (FQCNs): + </p> + + <ul> + <li><codeph>com.cloudera.impala.jdbc41.Driver</codeph></li> + <li><codeph>com.cloudera.impala.jdbc41.DataSource</codeph></li> + </ul> + + <ul> + <li><codeph>com.cloudera.impala.jdbc4.Driver</codeph></li> + <li><codeph>com.cloudera.impala.jdbc4.DataSource</codeph></li> + </ul> + + <ul> + <li><codeph>com.cloudera.impala.jdbc3.Driver</codeph></li> + <li><codeph>com.cloudera.impala.jdbc3.DataSource</codeph></li> + </ul> + + <p> + The connection string has the following format: + </p> + +<codeblock>jdbc:impala://<varname>Host</varname>:<varname>Port</varname>[/<varname>Schema</varname>];<varname>Property1</varname>=<varname>Value</varname>;<varname>Property2</varname>=<varname>Value</varname>;...</codeblock> + + <p> + The <codeph>port</codeph> value is typically 21050 for Impala. + </p> + + <p> + For full details about the classes and the connection string (especially the property values available + for the connection string), download the appropriate driver documentation for your platform from + <xref href="http://www.cloudera.com/content/cloudera/en/downloads/connectors/impala/jdbc/impala-jdbc-v2-5-5.html" scope="external" format="html">the Impala JDBC Connector download page</xref>. + </p> + + </section> + + <section id="class_hive_driver"> + <title>Using the Hive JDBC Driver</title> + + <p> + For example, with the Hive JDBC driver, the class name is <codeph>org.apache.hive.jdbc.HiveDriver</codeph>. + Once you have configured Impala to work with JDBC, you can establish connections between the two. + To do so for a cluster that does not use + Kerberos authentication, use a connection string of the form + <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;auth=noSasl</codeph>. +<!-- + Include the <codeph>auth=noSasl</codeph> argument + only when connecting to a non-Kerberos cluster; if Kerberos is enabled, omit the <codeph>auth</codeph> argument. +--> + For example, you might use: + </p> + +<codeblock>jdbc:hive2://myhost.example.com:21050/;auth=noSasl</codeblock> + + <p> + To connect to an instance of Impala that requires Kerberos authentication, use a connection string of the + form + <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;principal=<varname>principal_name</varname></codeph>. + The principal must be the same user principal you used when starting Impala. For example, you might use: + </p> + +<codeblock>jdbc:hive2://myhost.example.com:21050/;principal=impala/[email protected]</codeblock> + + <p> + To connect to an instance of Impala that requires LDAP authentication, use a connection string of the form + <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/<varname>db_name</varname>;user=<varname>ldap_userid</varname>;password=<varname>ldap_password</varname></codeph>. 
For example, you might use:
+      </p>
+
+<codeblock>jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123</codeblock>
+
+      <note>
+        <p conref="../shared/impala_common.xml#common/hive_jdbc_ssl_kerberos_caveat"/>
+      </note>
+
+      </section>
+
+    </conbody>
+  </concept>
+
+  <concept rev="2.3.0" id="jdbc_odbc_notes">
+    <title>Notes about JDBC and ODBC Interaction with Impala SQL Features</title>
+    <conbody>
+      <p>
+        Most Impala SQL features work equivalently through the <cmdname>impala-shell</cmdname> interpreter
+        or the JDBC or ODBC APIs. The following are some exceptions to keep in mind when switching between
+        the interactive shell and applications using the APIs:
+      </p>
+      <ul>
+        <li>
+          <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+          <ul>
+            <li>
+              <p>
+                Queries involving the complex types (<codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>)
+                require notation that might not be available in all levels of JDBC and ODBC drivers.
+                If you have trouble querying such a table because of the driver level or an
+                inability to edit the queries used by the application, you can create a view that exposes
+                a <q>flattened</q> version of the complex columns and point the application at the view.
+                See <xref href="impala_complex_types.xml#complex_types"/> for details.
+              </p>
+            </li>
+            <li>
+              <p>
+                The complex types available in <keyword keyref="impala23_full"/> and higher are supported by the
+                JDBC <codeph>getColumns()</codeph> API.
+                Both <codeph>MAP</codeph> and <codeph>ARRAY</codeph> are reported as the JDBC SQL Type <codeph>ARRAY</codeph>,
+                because this is the closest matching Java SQL type. This behavior is consistent with Hive.
+                <codeph>STRUCT</codeph> types are reported as the JDBC SQL Type <codeph>STRUCT</codeph>.
+              </p>
+              <p>
+                To be consistent with Hive's behavior, the <codeph>TYPE_NAME</codeph> field is populated
+                with the primitive type name for scalar types, and with the full <codeph>toSql()</codeph>
+                output for complex types. The resulting type names are somewhat inconsistent,
+                because nested types are printed differently from top-level types. For example,
+                the following list shows how <codeph>toSql()</codeph> output for Impala types is
+                translated to <codeph>TYPE_NAME</codeph> values:
+<codeblock><![CDATA[DECIMAL(10,10) becomes DECIMAL
+CHAR(10) becomes CHAR
+VARCHAR(10) becomes VARCHAR
+ARRAY<DECIMAL(10,10)> becomes ARRAY<DECIMAL(10,10)>
+ARRAY<CHAR(10)> becomes ARRAY<CHAR(10)>
+ARRAY<VARCHAR(10)> becomes ARRAY<VARCHAR(10)>
+]]>
+</codeblock>
+              </p>
+            </li>
+          </ul>
+        </li>
+      </ul>
+    </conbody>
+  </concept>
+
+</concept>
