http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_functions.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_functions.xml b/docs/topics/impala_functions.xml new file mode 100644 index 0000000..527744b --- /dev/null +++ b/docs/topics/impala_functions.xml @@ -0,0 +1,162 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="builtins"> + + <title id="title_functions">Impala Built-In Functions</title> + <titlealts><navtitle>Built-In Functions</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Functions"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <draft-comment translate="no"> +Opportunity to conref some material between here and the "Functions" topic under "Schema Objects". +</draft-comment> + + <p> + Impala supports several categories of built-in functions. These functions let you perform mathematical + calculations, string manipulation, date calculations, and other kinds of data transformations directly in + <codeph>SELECT</codeph> statements. The built-in functions let a SQL query return results with all + formatting, calculating, and type conversions applied, rather than performing time-consuming postprocessing + in another application. By applying function calls where practical, you can make a SQL query that is as + convenient as an expression in a procedural programming language or a formula in a spreadsheet. 
+ </p> + + <p> + The categories of functions supported by Impala are: + </p> + + <ul> + <li> + <xref href="impala_math_functions.xml#math_functions"/> + </li> + + <li> + <xref href="impala_conversion_functions.xml#conversion_functions"/> + </li> + + <li> + <xref href="impala_datetime_functions.xml#datetime_functions"/> + </li> + + <li> + <xref href="impala_conditional_functions.xml#conditional_functions"/> + </li> + + <li> + <xref href="impala_string_functions.xml#string_functions"/> + </li> + + <li> + Aggregation functions, explained in <xref href="impala_aggregate_functions.xml#aggregate_functions"/>. + </li> + </ul> + + <p> + You call any of these functions through the <codeph>SELECT</codeph> statement. For most functions, you can + omit the <codeph>FROM</codeph> clause and supply literal values for any required arguments: + </p> + +<codeblock>select abs(-1); ++---------+ +| abs(-1) | ++---------+ +| 1 | ++---------+ + +select concat('The rain ', 'in Spain'); ++---------------------------------+ +| concat('the rain ', 'in spain') | ++---------------------------------+ +| The rain in Spain | ++---------------------------------+ + +select power(2,5); ++-------------+ +| power(2, 5) | ++-------------+ +| 32 | ++-------------+ +</codeblock> + + <p> + When you use a <codeph>FROM</codeph> clause and specify a column name as a function argument, the function is + applied for each item in the result set: + </p> + +<!-- TK: make real output for these; change the queries if necessary to use tables I already have. 
--> + +<codeblock>select concat('Country = ',country_code) from all_countries where population > 100000000; +select round(price) as dollar_value from product_catalog where price between 0.0 and 100.0; +</codeblock> + + <p> + Typically, if any argument to a built-in function is <codeph>NULL</codeph>, the result value is also + <codeph>NULL</codeph>: + </p> + +<codeblock>select cos(null); ++-----------+ +| cos(null) | ++-----------+ +| NULL | ++-----------+ + +select power(2,null); ++----------------+ +| power(2, null) | ++----------------+ +| NULL | ++----------------+ + +select concat('a',null,'b'); ++------------------------+ +| concat('a', null, 'b') | ++------------------------+ +| NULL | ++------------------------+ +</codeblock> + + <p conref="../shared/impala_common.xml#common/aggr1"/> + +<codeblock conref="../shared/impala_common.xml#common/aggr2"/> + + <p conref="../shared/impala_common.xml#common/aggr3"/> + + <p> + Aggregate functions are a special category with different rules. These functions calculate a return value + across all the items in a result set, so they do require a <codeph>FROM</codeph> clause in the query: + </p> + +<!-- TK: make real output for these; change the queries if necessary to use tables I already have. --> + +<codeblock>select count(product_id) from product_catalog; +select max(height), avg(height) from census_data where age > 20; +</codeblock> + + <p> + Aggregate functions also ignore <codeph>NULL</codeph> values rather than returning a <codeph>NULL</codeph> + result. For example, if some rows have <codeph>NULL</codeph> for a particular column, those rows are ignored + when computing the AVG() for that column. Likewise, specifying <codeph>COUNT(col_name)</codeph> in a query + counts only those rows where <codeph>col_name</codeph> contains a non-<codeph>NULL</codeph> value. + </p> + + <p rev="2.0.0"> + Analytic functions are a variation on aggregate functions. 
Instead of returning a single value, or an + identical value for each group of rows, they can compute values that vary based on a <q>window</q> consisting + of other rows around them in the result set. + </p> + + <p outputclass="toc"/> + + </conbody> + +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_functions_overview.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_functions_overview.xml b/docs/topics/impala_functions_overview.xml new file mode 100644 index 0000000..26a4d35 --- /dev/null +++ b/docs/topics/impala_functions_overview.xml @@ -0,0 +1,116 @@ +<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="functions"> + + <title>Overview of Impala Functions</title> + <titlealts><navtitle>Functions</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Functions"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <p> + Functions let you apply arithmetic, string, or other computations and transformations to Impala data. You + typically use them in <codeph>SELECT</codeph> lists and <codeph>WHERE</codeph> clauses to filter and format + query results so that the result set is exactly what you want, with no further processing needed on the + application side. + </p> + + <p> + Scalar functions return a single result for each input row. See <xref href="impala_functions.xml#builtins"/>. 
+ </p> + +<codeblock>[localhost:21000] > select name, population from country where continent = 'North America' order by population desc limit 4; +[localhost:21000] > select upper(name), population from country where continent = 'North America' order by population desc limit 4; ++-------------+------------+ +| upper(name) | population | ++-------------+------------+ +| USA | 320000000 | +| MEXICO | 122000000 | +| CANADA | 25000000 | +| GUATEMALA | 16000000 | ++-------------+------------+ +</codeblock> + <p> + Aggregate functions combine the results from multiple rows: + either a single result for the entire table, or a separate result for each group of rows. + Aggregate functions are frequently used in combination with <codeph>GROUP BY</codeph> + and <codeph>HAVING</codeph> clauses in the <codeph>SELECT</codeph> statement. + See <xref href="impala_aggregate_functions.xml#aggregate_functions"/>. + </p> + +<codeblock>[localhost:21000] > select continent, <b>sum(population)</b> as howmany from country <b>group by continent</b> order by howmany desc; ++---------------+------------+ +| continent | howmany | ++---------------+------------+ +| Asia | 4298723000 | +| Africa | 1110635000 | +| Europe | 742452000 | +| North America | 565265000 | +| South America | 406740000 | +| Oceania | 38304000 | ++---------------+------------+ +</codeblock> + + <p> + User-defined functions (UDFs) let you code your own logic. They can be either scalar or aggregate functions. + UDFs let you implement important business or scientific logic using high-performance code for Impala to automatically parallelize. + You can also use UDFs to implement convenience functions to simplify reporting or porting SQL from other database systems. + See <xref href="impala_udf.xml#udfs"/>. + </p> + +<codeblock>[localhost:21000] > select <b>rot13('Hello world!')</b> as 'Weak obfuscation'; ++------------------+ +| weak obfuscation | ++------------------+ +| Uryyb jbeyq! 
| ++------------------+ +[localhost:21000] > select <b>likelihood_of_new_subatomic_particle(sensor1, sensor2, sensor3)</b> as probability + > from experimental_results group by experiment; +</codeblock> + + <p> + Each function is associated with a specific database. For example, if you issue a <codeph>USE somedb</codeph> + statement followed by <codeph>CREATE FUNCTION somefunc</codeph>, the new function is created in the + <codeph>somedb</codeph> database, and you could refer to it through the fully qualified name + <codeph>somedb.somefunc</codeph>. You could then issue another <codeph>USE</codeph> statement + and create a function with the same name in a different database. + </p> + + <p> + Impala built-in functions are associated with a special database named <codeph>_impala_builtins</codeph>, + which lets you refer to them from any database without qualifying the name. + </p> + +<codeblock>[localhost:21000] > show databases; ++-------------------------+ +| name | ++-------------------------+ +| <b>_impala_builtins</b> | +| analytic_functions | +| avro_testing | +| data_file_size | +... 
+[localhost:21000] > show functions in _impala_builtins like '*subs*'; ++-------------+-----------------------------------+ +| return type | signature | ++-------------+-----------------------------------+ +| STRING | substr(STRING, BIGINT) | +| STRING | substr(STRING, BIGINT, BIGINT) | +| STRING | substring(STRING, BIGINT) | +| STRING | substring(STRING, BIGINT, BIGINT) | ++-------------+-----------------------------------+ +</codeblock> + + <p> + <b>Related statements:</b> <xref href="impala_create_function.xml#create_function"/>, + <xref href="impala_drop_function.xml#drop_function"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_grant.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_grant.xml b/docs/topics/impala_grant.xml new file mode 100644 index 0000000..6aad41e --- /dev/null +++ b/docs/topics/impala_grant.xml @@ -0,0 +1,117 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="2.0.0" id="grant"> + + <title>GRANT Statement (CDH 5.2 or higher only)</title> + <titlealts><navtitle>GRANT (CDH 5.2 or higher only)</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="DDL"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Sentry"/> + <data name="Category" value="Roles"/> + <!-- Consider whether to go deeper into categories like Security for the Sentry-related statements. --> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">GRANT statement</indexterm> +<!-- Copied from Sentry docs. Turn into conref. I did some rewording for clarity. --> + The <codeph>GRANT</codeph> statement grants roles or privileges on specified objects to groups. Only Sentry + administrative users can grant roles to a group. 
+ </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock rev="2.3.0 collevelauth">GRANT ROLE <varname>role_name</varname> TO GROUP <varname>group_name</varname> + +GRANT <varname>privilege</varname> ON <varname>object_type</varname> <varname>object_name</varname> + TO [ROLE] <varname>role_name</varname> + [WITH GRANT OPTION] + +<ph rev="2.3.0">privilege ::= SELECT | SELECT(<varname>column_name</varname>) | INSERT | ALL</ph> +object_type ::= TABLE | DATABASE | SERVER | URI +</codeblock> + + <p> + Typically, the object name is an identifier. For URIs, it is a string literal. + </p> + +<!-- Turn privilege info into a conref or series of conrefs. (In both GRANT and REVOKE.) --> + + <p conref="../shared/impala_common.xml#common/privileges_blurb"/> + + <p> +<!-- To do: The wording here can be fluid, and it's reused in several statements. Turn into a conref. --> + Only administrative users (initially, a predefined set of users specified in the Sentry service configuration + file) can use this statement. + </p> + + <p> + The <codeph>WITH GRANT OPTION</codeph> clause allows members of the specified role to issue + <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements for those same privileges. +<!-- Copied from Sentry docs. Turn into conref. I did some rewording for clarity. --> + Hence, if a role has the <codeph>ALL</codeph> privilege on a database and the <codeph>WITH GRANT + OPTION</codeph> set, users granted that role can execute <codeph>GRANT</codeph>/<codeph>REVOKE</codeph> + statements only for that database or child tables of the database. This means a user could revoke the + privileges of the user that provided them the <codeph>GRANT OPTION</codeph>. + </p> + + <p> +<!-- Copied from Sentry docs. Turn into conref. Except I changed Hive to Impala. --> + Impala does not currently support revoking only the <codeph>WITH GRANT OPTION</codeph> from a privilege + previously granted to a role. 
To remove the <codeph>WITH GRANT OPTION</codeph>, revoke the privilege and + grant it again without the <codeph>WITH GRANT OPTION</codeph> flag. + </p> + + <p rev="2.3.0 collevelauth"> + The ability to grant or revoke <codeph>SELECT</codeph> privilege on specific columns is available + in CDH 5.5 / Impala 2.3 and higher. See <xref href="sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/> + for details. + </p> + +<!-- Turn compatibility info into a conref or series of conrefs. (In both GRANT and REVOKE.) --> + +<!-- If they diverge during development, consider the version here in GRANT the authoritative one. --> + + <p conref="../shared/impala_common.xml#common/compatibility_blurb"/> + + <p> + <ul> + <li> + The Impala <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements are available in CDH 5.2 and + later. + </li> + + <li> + In CDH 5.1 and later, Impala can make use of any roles and privileges specified by the + <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in Hive, when your system is configured to + use the Sentry service instead of the file-based policy mechanism. + </li> + + <li> + The Impala <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements for privileges do not require + the <codeph>ROLE</codeph> keyword to be repeated before each role name, unlike the equivalent Hive + statements. 
+ </li> + + <li conref="../shared/impala_common.xml#common/grant_revoke_single"/> + </ul> + </p> + + <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/> + + <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_authorization.xml#authorization"/>, <xref href="impala_revoke.xml#revoke"/>, + <xref href="impala_create_role.xml#create_role"/>, <xref href="impala_drop_role.xml#drop_role"/>, + <xref href="impala_show.xml#show"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_group_by.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_group_by.xml b/docs/topics/impala_group_by.xml new file mode 100644 index 0000000..10b7de4 --- /dev/null +++ b/docs/topics/impala_group_by.xml @@ -0,0 +1,137 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="group_by"> + + <title>GROUP BY Clause</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Aggregate Functions"/> + </metadata> + </prolog> + + <conbody> + + <p> + Specify the <codeph>GROUP BY</codeph> clause in queries that use aggregation functions, such as + <codeph><xref href="impala_count.xml#count">COUNT()</xref></codeph>, + <codeph><xref href="impala_sum.xml#sum">SUM()</xref></codeph>, + <codeph><xref href="impala_avg.xml#avg">AVG()</xref></codeph>, + <codeph><xref href="impala_min.xml#min">MIN()</xref></codeph>, and + <codeph><xref href="impala_max.xml#max">MAX()</xref></codeph>. Specify in the + <codeph><xref href="impala_group_by.xml#group_by">GROUP BY</xref></codeph> clause the names of all the + columns that do not participate in the aggregation operation. 
+ </p> + + <!-- Good to show an example of cases where GROUP BY does and doesn't work with complex types. --> + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p rev="2.3.0"> + In CDH 5.5 / Impala 2.3 and higher, the complex data types <codeph>STRUCT</codeph>, + <codeph>ARRAY</codeph>, and <codeph>MAP</codeph> are available. These columns cannot + be referenced directly in the <codeph>GROUP BY</codeph> clause. + When you query a complex type column, you use join notation to <q>unpack</q> the elements + of the complex type, and within the join query you can include a <codeph>GROUP BY</codeph> + clause to aggregate the scalar elements from the complex type. + See <xref href="impala_complex_types.xml#complex_types"/> for details about Impala support for complex types. + </p> + + <p conref="../shared/impala_common.xml#common/zero_length_strings"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + For example, the following query finds the 5 items that sold the highest total quantity (using the + <codeph>SUM()</codeph> function), and also counts the number of sales transactions for those items (using the + <codeph>COUNT()</codeph> function). Because the column representing the item IDs is not used in any + aggregation functions, we specify that column in the <codeph>GROUP BY</codeph> clause. 
+ </p> + +<codeblock>select + <b>ss_item_sk</b> as Item, + <b>count</b>(ss_item_sk) as Times_Purchased, + <b>sum</b>(ss_quantity) as Total_Quantity_Purchased +from store_sales + <b>group by ss_item_sk</b> + order by sum(ss_quantity) desc + limit 5; ++-------+-----------------+--------------------------+ +| item | times_purchased | total_quantity_purchased | ++-------+-----------------+--------------------------+ +| 9325 | 372 | 19072 | +| 4279 | 357 | 18501 | +| 7507 | 371 | 18475 | +| 5953 | 369 | 18451 | +| 16753 | 375 | 18446 | ++-------+-----------------+--------------------------+</codeblock> + + <p> + The <codeph>HAVING</codeph> clause lets you filter the results of aggregate functions, because you cannot + refer to those expressions in the <codeph>WHERE</codeph> clause. For example, to find the 5 lowest-selling + items that were included in at least 100 sales transactions, we could use this query: + </p> + +<codeblock>select + <b>ss_item_sk</b> as Item, + <b>count</b>(ss_item_sk) as Times_Purchased, + <b>sum</b>(ss_quantity) as Total_Quantity_Purchased +from store_sales + <b>group by ss_item_sk</b> + <b>having times_purchased >= 100</b> + order by sum(ss_quantity) + limit 5; ++-------+-----------------+--------------------------+ +| item | times_purchased | total_quantity_purchased | ++-------+-----------------+--------------------------+ +| 13943 | 105 | 4087 | +| 2992 | 101 | 4176 | +| 4773 | 107 | 4204 | +| 14350 | 103 | 4260 | +| 11956 | 102 | 4275 | ++-------+-----------------+--------------------------+</codeblock> + + <p> + When performing calculations involving scientific or financial data, remember that columns with type + <codeph>FLOAT</codeph> or <codeph>DOUBLE</codeph> are stored as true floating-point numbers, which cannot + precisely represent every possible fractional value. 
Thus, if you include a <codeph>FLOAT</codeph> or + <codeph>DOUBLE</codeph> column in a <codeph>GROUP BY</codeph> clause, the results might not precisely match + literal values in your query or from an original Text data file. Use rounding operations, the + <codeph>BETWEEN</codeph> operator, or another arithmetic technique to match floating-point values that are + <q>near</q> literal values you expect. For example, this query on the <codeph>ss_wholesale_cost</codeph> + column returns cost values that are close but not identical to the original figures that were entered as + decimal fractions. + </p> + +<codeblock>select ss_wholesale_cost, avg(ss_quantity * ss_sales_price) as avg_revenue_per_sale + from store_sales + group by ss_wholesale_cost + order by avg_revenue_per_sale desc + limit 5; ++-------------------+----------------------+ +| ss_wholesale_cost | avg_revenue_per_sale | ++-------------------+----------------------+ +| 96.94000244140625 | 4454.351539300434 | +| 95.93000030517578 | 4423.119941283189 | +| 98.37999725341797 | 4332.516490316291 | +| 97.97000122070312 | 4330.480601655014 | +| 98.52999877929688 | 4291.316953108634 | ++-------------------+----------------------+</codeblock> + + <p> + Notice how wholesale cost values originally entered as decimal fractions such as <codeph>96.94</codeph> and + <codeph>98.38</codeph> are slightly larger or smaller in the result set, due to precision limitations in the + hardware floating-point types. The imprecise representation of <codeph>FLOAT</codeph> and + <codeph>DOUBLE</codeph> values is why financial data processing systems often store currency using data types + that are less space-efficient but avoid these types of rounding errors. 
+ </p> + + <p conref="../shared/impala_common.xml#common/related_info"/> + <p> + <xref href="impala_select.xml#select"/>, + <xref href="impala_aggregate_functions.xml#aggregate_functions"/> + </p> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_group_concat.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_group_concat.xml b/docs/topics/impala_group_concat.xml new file mode 100644 index 0000000..b2a7ff6 --- /dev/null +++ b/docs/topics/impala_group_concat.xml @@ -0,0 +1,133 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="1.2" id="group_concat"> + + <title>GROUP_CONCAT Function</title> + <titlealts><navtitle>GROUP_CONCAT</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Impala Functions"/> + <data name="Category" value="Aggregate Functions"/> + <data name="Category" value="Querying"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">group_concat() function</indexterm> + An aggregate function that returns a single string produced by concatenating the argument values from each + row of the result set. If the optional separator string is specified, the separator is added between + each pair of concatenated values. The default separator is a comma followed by a space. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<!-- Might allow DISTINCT at some point. Check: does it allow ALL now? --> + +<codeblock>GROUP_CONCAT([ALL] <varname>expression</varname> [, <varname>separator</varname>])</codeblock> + + <p conref="../shared/impala_common.xml#common/concat_blurb"/> + + <p> + By default, returns a single string covering the whole result set. 
To include other columns or values in the + result set, or to produce multiple concatenated strings for subsets of rows, include a <codeph>GROUP + BY</codeph> clause in the query. + </p> + + <p> + <b>Return type:</b> <codeph>STRING</codeph> + </p> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <p> + You cannot apply the <codeph>DISTINCT</codeph> operator to the argument of this function. + </p> + + <p conref="../shared/impala_common.xml#common/analytic_not_allowed_caveat"/> + + <p> + Currently, Impala returns an error if the result value grows larger than 1 GiB. + </p> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following examples illustrate various aspects of the <codeph>GROUP_CONCAT()</codeph> function. + </p> + + <p> + You can call the function directly on a <codeph>STRING</codeph> column. To use it with a numeric column, cast + the value to <codeph>STRING</codeph>. + </p> + +<codeblock>[localhost:21000] > create table t1 (x int, s string); +[localhost:21000] > insert into t1 values (1, "one"), (3, "three"), (2, "two"), (1, "one"); +[localhost:21000] > select group_concat(s) from t1; ++----------------------+ +| group_concat(s) | ++----------------------+ +| one, three, two, one | ++----------------------+ +[localhost:21000] > select group_concat(cast(x as string)) from t1; ++---------------------------------+ +| group_concat(cast(x as string)) | ++---------------------------------+ +| 1, 3, 2, 1 | ++---------------------------------+ +</codeblock> + + <p> + The optional separator lets you format the result in flexible ways. The separator can be an arbitrary string + expression, not just a single character. 
+ </p> + +<codeblock>[localhost:21000] > select group_concat(s,"|") from t1; ++----------------------+ +| group_concat(s, '|') | ++----------------------+ +| one|three|two|one | ++----------------------+ +[localhost:21000] > select group_concat(s,'---') from t1; ++-------------------------+ +| group_concat(s, '---') | ++-------------------------+ +| one---three---two---one | ++-------------------------+ +</codeblock> + + <p> + The default separator is a comma followed by a space. To get a comma-delimited result without extra spaces, + specify a delimiter character that is only a comma. + </p> + +<codeblock>[localhost:21000] > select group_concat(s,',') from t1; ++----------------------+ +| group_concat(s, ',') | ++----------------------+ +| one,three,two,one | ++----------------------+ +</codeblock> + + <p> + Including a <codeph>GROUP BY</codeph> clause lets you produce a different concatenated result for each group + in the result set. In this example, the only <codeph>X</codeph> value that occurs more than once is + <codeph>1</codeph>, so that is the only row in the result set where <codeph>GROUP_CONCAT()</codeph> returns a + delimited value. For groups containing a single value, <codeph>GROUP_CONCAT()</codeph> returns the original + value of its <codeph>STRING</codeph> argument. 
+ </p> + +<codeblock>[localhost:21000] > select x, group_concat(s) from t1 group by x; ++---+-----------------+ +| x | group_concat(s) | ++---+-----------------+ +| 2 | two | +| 3 | three | +| 1 | one, one | ++---+-----------------+ +</codeblock> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_having.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_having.xml b/docs/topics/impala_having.xml new file mode 100644 index 0000000..064a4a8 --- /dev/null +++ b/docs/topics/impala_having.xml @@ -0,0 +1,42 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="having"> + + <title>HAVING Clause</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Aggregate Functions"/> + </metadata> + </prolog> + + <conbody> + + <p> + Performs a filter operation on a <codeph>SELECT</codeph> query, by examining the results of aggregation + functions rather than testing each individual table row. Therefore, it is always used in conjunction with a + function such as <codeph><xref href="impala_count.xml#count">COUNT()</xref></codeph>, + <codeph><xref href="impala_sum.xml#sum">SUM()</xref></codeph>, + <codeph><xref href="impala_avg.xml#avg">AVG()</xref></codeph>, + <codeph><xref href="impala_min.xml#min">MIN()</xref></codeph>, or + <codeph><xref href="impala_max.xml#max">MAX()</xref></codeph>, and typically with the + <codeph><xref href="impala_group_by.xml#group_by">GROUP BY</xref></codeph> clause also. + </p> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <p rev="2.0.0"> + The filter expression in the <codeph>HAVING</codeph> clause cannot include a scalar subquery. 
+ </p> + + <p conref="../shared/impala_common.xml#common/related_info"/> + <p> + <xref href="impala_select.xml#select"/>, + <xref href="impala_group_by.xml#group_by"/>, + <xref href="impala_aggregate_functions.xml#aggregate_functions"/> + </p> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_hbase_cache_blocks.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_hbase_cache_blocks.xml b/docs/topics/impala_hbase_cache_blocks.xml new file mode 100644 index 0000000..d42cbf6 --- /dev/null +++ b/docs/topics/impala_hbase_cache_blocks.xml @@ -0,0 +1,34 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="hbase_cache_blocks"> + + <title>HBASE_CACHE_BLOCKS Query Option</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="HBase"/> + <data name="Category" value="Impala Query Options"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">HBASE_CACHE_BLOCKS query option</indexterm> + Setting this option is equivalent to calling the <codeph>setCacheBlocks</codeph> method of the class + <xref href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" scope="external" format="html">org.apache.hadoop.hbase.client.Scan</xref>, + in an HBase Java application. Helps to control the memory pressure on the HBase region server, in conjunction + with the <codeph>HBASE_CACHING</codeph> query option. 
+ </p> + + <p conref="../shared/impala_common.xml#common/type_boolean"/> + <p conref="../shared/impala_common.xml#common/default_false_0"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + <p> + <xref href="impala_hbase.xml#impala_hbase"/>, + <xref href="impala_hbase_caching.xml#hbase_caching"/> + </p> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_hbase_caching.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_hbase_caching.xml b/docs/topics/impala_hbase_caching.xml new file mode 100644 index 0000000..e543792 --- /dev/null +++ b/docs/topics/impala_hbase_caching.xml @@ -0,0 +1,39 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="hbase_caching"> + + <title>HBASE_CACHING Query Option</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="HBase"/> + <data name="Category" value="Impala Query Options"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">HBASE_CACHING query option</indexterm> + Setting this option is equivalent to calling the <codeph>setCaching</codeph> method of the class + <xref href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" scope="external" format="html">org.apache.hadoop.hbase.client.Scan</xref>, + in an HBase Java application. Helps to control the memory pressure on the HBase region server, in conjunction + with the <codeph>HBASE_CACHE_BLOCKS</codeph> query option. 
+ </p>
+
+ <p>
+ <b>Type:</b> integer
+ </p>
+
+ <p>
+ <b>Default:</b> 0
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/related_info"/>
+ <p>
+ <xref href="impala_hbase.xml#impala_hbase"/>,
+ <xref href="impala_hbase_cache_blocks.xml#hbase_cache_blocks"/>
+ </p>
+
+ </conbody>
+</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_hints.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_hints.xml b/docs/topics/impala_hints.xml new file mode 100644 index 0000000..429fb19 --- /dev/null +++ b/docs/topics/impala_hints.xml @@ -0,0 +1,247 @@ +<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="hints">
+
+ <title>Query Hints in Impala SELECT Statements</title>
+ <titlealts><navtitle>Hints</navtitle></titlealts>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="SQL"/>
+ <data name="Category" value="Querying"/>
+ <data name="Category" value="Performance"/>
+ <data name="Category" value="Troubleshooting"/>
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ <indexterm audience="Cloudera">hints</indexterm>
+ The Impala SQL dialect supports query hints, for fine-tuning the inner workings of queries. Specify hints as
+ a temporary workaround for expensive queries, where missing statistics or other factors cause inefficient
+ performance.
+ </p>
+
+ <p>
+ Hints are most often used for the most resource-intensive kinds of Impala queries:
+ </p>
+
+ <ul>
+ <li>
+ Join queries involving large tables, where intermediate result sets are transmitted across the network to
+ evaluate the join conditions.
+ </li>
+
+ <li>
+ Inserting into partitioned Parquet tables, where many memory buffers could be allocated on each host to
+ hold intermediate results for each partition. 
+ </li> + </ul> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + + <p> + You can represent the hints as keywords surrounded by <codeph>[]</codeph> square brackets; include the + brackets in the text of the SQL statement. + </p> + +<codeblock>SELECT STRAIGHT_JOIN <varname>select_list</varname> FROM +<varname>join_left_hand_table</varname> + JOIN [{BROADCAST|SHUFFLE}] +<varname>join_right_hand_table</varname> +<varname>remainder_of_query</varname>; + +INSERT <varname>insert_clauses</varname> + [{SHUFFLE|NOSHUFFLE}] + SELECT <varname>remainder_of_query</varname>; +</codeblock> + + <p rev="2.0.0"> + In Impala 2.0 and higher, or CDH 5.2 and higher, you can also specify the hints inside comments that use + either the <codeph>/* */</codeph> or <codeph>--</codeph> notation. Specify a <codeph>+</codeph> symbol + immediately before the hint name. + </p> + +<codeblock rev="2.0.0">SELECT STRAIGHT_JOIN <varname>select_list</varname> FROM +<varname>join_left_hand_table</varname> + JOIN /* +BROADCAST|SHUFFLE */ +<varname>join_right_hand_table</varname> +<varname>remainder_of_query</varname>; + +SELECT <varname>select_list</varname> FROM +<varname>join_left_hand_table</varname> + JOIN -- +BROADCAST|SHUFFLE +<varname>join_right_hand_table</varname> +<varname>remainder_of_query</varname>; + +INSERT <varname>insert_clauses</varname> + /* +SHUFFLE|NOSHUFFLE */ + SELECT <varname>remainder_of_query</varname>; + +INSERT <varname>insert_clauses</varname> + -- +SHUFFLE|NOSHUFFLE + SELECT <varname>remainder_of_query</varname>; +</codeblock> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + With both forms of hint syntax, include the <codeph>STRAIGHT_JOIN</codeph> + keyword immediately after the <codeph>SELECT</codeph> keyword to prevent Impala from + reordering the tables in a way that makes the hint ineffective. 
+ </p> + + <p> + To reduce the need to use hints, run the <codeph>COMPUTE STATS</codeph> statement against all tables involved + in joins, or used as the source tables for <codeph>INSERT ... SELECT</codeph> operations where the + destination is a partitioned Parquet table. Do this operation after loading data or making substantial + changes to the data within each table. Having up-to-date statistics helps Impala choose more efficient query + plans without the need for hinting. See <xref href="impala_perf_stats.xml#perf_stats"/> for details and + examples. + </p> + + <p> + To see which join strategy is used for a particular query, examine the <codeph>EXPLAIN</codeph> output for + that query. See <xref href="impala_explain_plan.xml#perf_explain"/> for details and examples. + </p> + + <p> + <b>Hints for join queries:</b> + </p> + + <p> + The <codeph>[BROADCAST]</codeph> and <codeph>[SHUFFLE]</codeph> hints control the execution strategy for join + queries. Specify one of the following constructs immediately after the <codeph>JOIN</codeph> keyword in a + query: + </p> + + <ul> + <li> + <codeph>[SHUFFLE]</codeph> - Makes that join operation use the <q>partitioned</q> technique, which divides + up corresponding rows from both tables using a hashing algorithm, sending subsets of the rows to other + nodes for processing. (The keyword <codeph>SHUFFLE</codeph> is used to indicate a <q>partitioned join</q>, + because that type of join is not related to <q>partitioned tables</q>.) Since the alternative + <q>broadcast</q> join mechanism is the default when table and index statistics are unavailable, you might + use this hint for queries where broadcast joins are unsuitable; typically, partitioned joins are more + efficient for joins between large tables of similar size. 
+ </li> + + <li> + <codeph>[BROADCAST]</codeph> - Makes that join operation use the <q>broadcast</q> technique that sends the + entire contents of the right-hand table to all nodes involved in processing the join. This is the default + mode of operation when table and index statistics are unavailable, so you would typically only need it if + stale metadata caused Impala to mistakenly choose a partitioned join operation. Typically, broadcast joins + are more efficient in cases where one table is much smaller than the other. (Put the smaller table on the + right side of the <codeph>JOIN</codeph> operator.) + </li> + </ul> + + <p> + <b>Hints for INSERT ... SELECT queries:</b> + </p> + + <p conref="../shared/impala_common.xml#common/insert_hints"/> + + <p> + <b>Suggestions versus directives:</b> + </p> + + <p> + In early Impala releases, hints were always obeyed and so acted more like directives. Once Impala gained join + order optimizations, sometimes join queries were automatically reordered in a way that made a hint + irrelevant. Therefore, the hints act more like suggestions in Impala 1.2.2 and higher. + </p> + + <p> + To force Impala to follow the hinted execution mechanism for a join query, include the + <codeph>STRAIGHT_JOIN</codeph> keyword in the <codeph>SELECT</codeph> statement. See + <xref href="impala_perf_joins.xml#straight_join"/> for details. When you use this technique, Impala does not + reorder the joined tables at all, so you must be careful to arrange the join order to put the largest table + (or subquery result set) first, then the smallest, second smallest, third smallest, and so on. This ordering lets Impala do the + most I/O-intensive parts of the query using local reads on the data nodes, and then reduce the size of the + intermediate result set as much as possible as each subsequent table or subquery result set is joined. 
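+ </p>
+
+ <p>
+ For example, the following sketch (with hypothetical table names) arranges the tables from largest to
+ smallest, as described above:
+ </p>
+
+<codeblock>select straight_join h.cust_id, c.name, r.region_name
+  from huge_fact_table h
+  join [shuffle] medium_dim_table c on h.cust_id = c.cust_id
+  join [broadcast] tiny_lookup_table r on c.region_id = r.region_id;
+</codeblock>
+
+ <p>
+ Because of the <codeph>STRAIGHT_JOIN</codeph> keyword, the joins are performed in exactly this order.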
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+
+ <p>
+ Queries that include subqueries in the <codeph>WHERE</codeph> clause can be rewritten internally as join
+ queries. Currently, you cannot apply hints to the joins produced by these types of queries.
+ </p>
+
+ <p>
+ Because hints can prevent queries from taking advantage of new metadata or improvements in query planning,
+ use them only when required to work around performance issues, and be prepared to remove them when they are
+ no longer required, such as after a new Impala release or bug fix.
+ </p>
+
+ <p>
+ In particular, the <codeph>[BROADCAST]</codeph> and <codeph>[SHUFFLE]</codeph> hints are expected to be
+ needed much less frequently in Impala 1.2.2 and higher, because the join order optimization feature in
+ combination with the <codeph>COMPUTE STATS</codeph> statement now automatically chooses the join order and
+ join mechanism without the need to rewrite the query and add hints. See
+ <xref href="impala_perf_joins.xml#perf_joins"/> for details.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
+
+ <p rev="2.0.0">
+ The hints embedded within <codeph>--</codeph> comments are compatible with Hive queries. The hints embedded
+ within <codeph>/* */</codeph> comments or <codeph>[ ]</codeph> square brackets are not recognized by, or not
+ compatible with, Hive. For example, Hive raises an error for Impala hints within <codeph>/* */</codeph>
+ comments because it does not recognize the Impala hint names.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/view_blurb"/>
+
+ <p rev="2.0.0">
+ If you use a hint in the query that defines a view, the hint is preserved when you query the view. Impala
+ internally rewrites all hints in views to use the <codeph>--</codeph> comment notation, so that Hive can
+ query such views without errors due to unrecognized hint names. 
+ </p> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + For example, this query joins a large customer table with a small lookup table of less than 100 rows. The + right-hand table can be broadcast efficiently to all nodes involved in the join. Thus, you would use the + <codeph>[broadcast]</codeph> hint to force a broadcast join strategy: + </p> + +<codeblock>select straight_join customer.address, state_lookup.state_name + from customer join <b>[broadcast]</b> state_lookup + on customer.state_id = state_lookup.state_id;</codeblock> + + <p> + This query joins two large tables of unpredictable size. You might benchmark the query with both kinds of + hints and find that it is more efficient to transmit portions of each table to other nodes for processing. + Thus, you would use the <codeph>[shuffle]</codeph> hint to force a partitioned join strategy: + </p> + +<codeblock>select straight_join weather.wind_velocity, geospatial.altitude + from weather join <b>[shuffle]</b> geospatial + on weather.lat = geospatial.lat and weather.long = geospatial.long;</codeblock> + + <p> + For joins involving three or more tables, the hint applies to the tables on either side of that specific + <codeph>JOIN</codeph> keyword. The <codeph>STRAIGHT_JOIN</codeph> keyword ensures that joins are processed + in a predictable order from left to right. For example, this query joins + <codeph>t1</codeph> and <codeph>t2</codeph> using a partitioned join, then joins that result set to + <codeph>t3</codeph> using a broadcast join: + </p> + +<codeblock>select straight_join t1.name, t2.id, t3.price + from t1 join <b>[shuffle]</b> t2 join <b>[broadcast]</b> t3 + on t1.id = t2.id and t2.id = t3.id;</codeblock> + + <draft-comment translate="no"> This is a good place to add more sample output showing before and after EXPLAIN plans. 
</draft-comment> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + For more background information about join queries, see <xref href="impala_joins.xml#joins"/>. For + performance considerations, see <xref href="impala_perf_joins.xml#perf_joins"/>. + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_identifiers.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_identifiers.xml b/docs/topics/impala_identifiers.xml new file mode 100644 index 0000000..55477ed --- /dev/null +++ b/docs/topics/impala_identifiers.xml @@ -0,0 +1,114 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="identifiers"> + + <title>Overview of Impala Identifiers</title> + <titlealts><navtitle>Identifiers</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Databases"/> + <data name="Category" value="Tables"/> + </metadata> + </prolog> + + <conbody> + + <p> + Identifiers are the names of databases, tables, or columns that you specify in a SQL statement. The rules for + identifiers govern what names you can give to things you create, the notation for referring to names + containing unusual characters, and other aspects such as case sensitivity. + </p> + + <ul> + <li> + <p> + The minimum length of an identifier is 1 character. + </p> + </li> + + <li> + <p> + The maximum length of an identifier is currently 128 characters, enforced by the metastore database. + </p> + </li> + + <li> + <p> + An identifier must start with an alphabetic character. The remainder can contain any combination of + alphanumeric characters and underscores. 
Quoting the identifier with backticks has no effect on the allowed + characters in the name. + </p> + </li> + + <li> + <p> + An identifier can contain only ASCII characters. + </p> + </li> + + <li> + <p> + To use an identifier name that matches one of the Impala reserved keywords (listed in + <xref href="impala_reserved_words.xml#reserved_words"/>), surround the identifier with <codeph>``</codeph> + characters (backticks). Quote the reserved word even if it is part of a fully qualified name. + The following example shows how a reserved word can be used as a column name if it is quoted + with backticks in the <codeph>CREATE TABLE</codeph> statement, and how the column name + must also be quoted with backticks in a query: + </p> +<codeblock>[localhost:21000] > create table reserved (`data` string); + +[localhost:21000] > select data from reserved; +ERROR: AnalysisException: Syntax error in line 1: +select data from reserved + ^ +Encountered: DATA +Expected: ALL, CASE, CAST, DISTINCT, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, STRAIGHT_JOIN, TRUE, IDENTIFIER +CAUSED BY: Exception: Syntax error + +[localhost:21000] > select reserved.data from reserved; +ERROR: AnalysisException: Syntax error in line 1: +select reserved.data from reserved + ^ +Encountered: DATA +Expected: IDENTIFIER +CAUSED BY: Exception: Syntax error + +[localhost:21000] > select reserved.`data` from reserved; + +[localhost:21000] > +</codeblock> + + <note type="important"> + Because the list of reserved words grows over time as new SQL syntax is added, + consider adopting coding conventions (especially for any automated scripts + or in packaged applications) to always quote all identifiers with backticks. + Quoting all identifiers protects your SQL from compatibility issues if + new reserved words are added in later releases. + </note> + + </li> + + <li> + <p> + Impala identifiers are always case-insensitive. 
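+ </p>
+
+ <p>
+ For example, in a hypothetical session:
+ </p>
+
+<codeblock>[localhost:21000] > create table MixedCase (ID int);
+[localhost:21000] > select id from mixedcase;
+[localhost:21000] > describe MIXEDCASE;
+</codeblock>
+
+ <p>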
That is, tables named <codeph>t1</codeph> and + <codeph>T1</codeph> always refer to the same table, regardless of quote characters. Internally, Impala + always folds all specified table and column names to lowercase. This is why the column headers in query + output are always displayed in lowercase. + </p> + </li> + </ul> + + <p> + See <xref href="impala_aliases.xml#aliases"/> for how to define shorter or easier-to-remember aliases if the + original names are long or cryptic identifiers. + <ph conref="../shared/impala_common.xml#common/aliases_vs_identifiers"/> + </p> + + <p conref="../shared/impala_common.xml#common/views_vs_identifiers"/> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_insert.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_insert.xml b/docs/topics/impala_insert.xml new file mode 100644 index 0000000..6d0f68b --- /dev/null +++ b/docs/topics/impala_insert.xml @@ -0,0 +1,676 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="insert"> + + <title>INSERT Statement</title> + <titlealts><navtitle>INSERT</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="ETL"/> + <data name="Category" value="Ingest"/> + <data name="Category" value="DML"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Tables"/> + <data audience="impala_next" name="Category" value="Kudu"/> + <!-- This is such an important statement, think if there are more applicable categories. 
--> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">INSERT statement</indexterm> + Impala supports inserting into tables and partitions that you create with the Impala <codeph>CREATE + TABLE</codeph> statement, or pre-defined tables and partitions created through Hive. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>[<varname>with_clause</varname>] +INSERT { INTO | OVERWRITE } [TABLE] <varname>table_name</varname> + [(<varname>column_list</varname>)] + [ PARTITION (<varname>partition_clause</varname>)] +{ + [<varname>hint_clause</varname>] <varname>select_statement</varname> + | VALUES (<varname>value</varname> [, <varname>value</varname> ...]) [, (<varname>value</varname> [, <varname>value</varname> ...]) ...] +} + +partition_clause ::= <varname>col_name</varname> [= <varname>constant</varname>] [, <varname>col_name</varname> [= <varname>constant</varname>] ...] + +hint_clause ::= [SHUFFLE] | [NOSHUFFLE] (Note: the square brackets are part of the syntax.) +</codeblock> + + <p> + <b>Appending or replacing (INTO and OVERWRITE clauses):</b> + </p> + + <p> + The <codeph>INSERT INTO</codeph> syntax appends data to a table. The existing data files are left as-is, and + the inserted data is put into one or more new data files. + </p> + + <p> + The <codeph>INSERT OVERWRITE</codeph> syntax replaces the data in a table. +<!-- What happens with INSERT OVERWRITE if the target is a single partition or multiple partitions? --> +<!-- If that gets too detailed, cover later under "Partitioning Considerations". --> + Currently, the overwritten data files are deleted immediately; they do not go through the HDFS trash + mechanism. 
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+ <p rev="2.3.0">
+ The <codeph>INSERT</codeph> statement currently does not support writing data files
+ containing complex types (<codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>).
+ To prepare Parquet data for such tables, you generate the data files outside Impala and then
+ use <codeph>LOAD DATA</codeph> or <codeph>CREATE EXTERNAL TABLE</codeph> to associate those
+ data files with the table. Currently, such tables must use the Parquet file format.
+ See <xref href="impala_complex_types.xml#complex_types"/> for details about working with complex types.
+ </p>
+
+ <p rev="kudu" audience="impala_next">
+ <b>Ignoring duplicate primary keys for Kudu tables (IGNORE clause)</b>
+ </p>
+
+ <p rev="kudu" audience="impala_next">
+ Normally, an <codeph>INSERT</codeph> operation into a Kudu table fails if
+ it would result in duplicate primary key values for any rows.
+ Specify <codeph>INSERT IGNORE <varname>rest_of_statement</varname></codeph> to
+ make the <codeph>INSERT</codeph> continue in this case. The rows that would
+ have duplicate primary key values are not inserted.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+ <p>
+ Impala currently supports:
+ </p>
+
+ <ul>
+ <li>
+ Copying data from another table using a <codeph>SELECT</codeph> query. In Impala 1.2.1 and higher, you can
+ combine <codeph>CREATE TABLE</codeph> and <codeph>INSERT</codeph> operations into a single step with the
+ <codeph>CREATE TABLE AS SELECT</codeph> syntax, which bypasses the actual <codeph>INSERT</codeph> keyword.
+ </li>
+
+ <li>
+ An optional <xref href="impala_with.xml#with"><codeph>WITH</codeph> clause</xref> before the
+ <codeph>INSERT</codeph> keyword, to define a subquery referenced in the <codeph>SELECT</codeph> portion.
+ </li>
+
+ <li>
+ Creating one or more new rows using constant expressions through the <codeph>VALUES</codeph> clause. 
(The + <codeph>VALUES</codeph> clause was added in Impala 1.0.1.) + </li> + + <li rev="1.1"> + <p> + By default, the first column of each newly inserted row goes into the first column of the table, the + second column into the second column, and so on. + </p> + <p> + You can also specify the columns to be inserted, an arbitrarily ordered subset of the columns in the + destination table, by specifying a column list immediately after the name of the destination table. This + feature lets you adjust the inserted columns to match the layout of a <codeph>SELECT</codeph> statement, + rather than the other way around. (This feature was added in Impala 1.1.) + </p> + <p> + The number of columns mentioned in the column list (known as the <q>column permutation</q>) must match + the number of columns in the <codeph>SELECT</codeph> list or the <codeph>VALUES</codeph> tuples. The + order of columns in the column permutation can be different than in the underlying table, and the columns + of each input row are reordered to match. If the number of columns in the column permutation is less than + in the destination table, all unmentioned columns are set to <codeph>NULL</codeph>. + </p> + </li> + + <li> + <p> + For a partitioned table, the optional <codeph>PARTITION</codeph> clause identifies which partition or + partitions the new values go into. If a partition key column is given a constant value such as + <codeph>PARTITION (year=2012)</codeph> or <codeph>PARTITION (year=2012, month=2)</codeph>, all the + inserted rows use those same values for those partition key columns and you omit any corresponding + columns in the source table from the <codeph>SELECT</codeph> list. This form is known as <q>static + partitioning</q>. 
+ </p>
+ <p>
+ If a partition key column is mentioned but not assigned a value, such as in <codeph>PARTITION (year,
+ region)</codeph> (both columns unassigned) or <codeph>PARTITION(year, region='CA')</codeph>
+ (<codeph>year</codeph> column unassigned), the unassigned columns are filled in with the final columns of
+ the <codeph>SELECT</codeph> list. In this case, the number of columns in the <codeph>SELECT</codeph> list
+ must equal the number of columns in the column permutation plus the number of partition key columns not
+ assigned a constant value. This form is known as <q>dynamic partitioning</q>.
+ </p>
+ <p>
+ See <xref href="impala_partitioning.xml#partition_static_dynamic"/> for examples and performance
+ characteristics of static and dynamic partitioned inserts.
+ </p>
+ </li>
+
+ <li rev="1.2.2">
+ An optional hint clause immediately before the <codeph>SELECT</codeph> keyword, to fine-tune the behavior
+ when doing an <codeph>INSERT ... SELECT</codeph> operation into partitioned Parquet tables. The hint
+ keywords are <codeph>[SHUFFLE]</codeph> and <codeph>[NOSHUFFLE]</codeph>, including the square brackets.
+ Inserting into partitioned Parquet tables can be a resource-intensive operation because it potentially
+ involves many files being written to HDFS simultaneously, and separate
+ <ph rev="parquet_block_size">large</ph> memory buffers being allocated to buffer the data for each
+ partition. For usage details, see <xref href="impala_parquet.xml#parquet_etl"/>.
+ </li>
+ </ul>
+
+ <note>
+ <ul>
+ <li>
+ <codeph>INSERT</codeph> statements that create new partitions or add files result in changes to Hive
+ metadata. Because Impala uses Hive metadata, such changes may necessitate a metadata refresh. For more
+ information, see the <xref href="impala_refresh.xml#refresh" format="dita">REFRESH</xref> statement.
+ </li>
+
+ <li>
+ Currently, Impala can only insert data into tables that use the text and Parquet formats. 
For other file + formats, insert the data using Hive and use Impala to query it. + </li> + + <li> + As an alternative to the <codeph>INSERT</codeph> statement, if you have existing data files elsewhere in + HDFS, the <codeph>LOAD DATA</codeph> statement can move those files into a table. This statement works + with tables of any file format. + </li> + </ul> + </note> + + <p conref="../shared/impala_common.xml#common/dml_blurb"/> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + When you insert the results of an expression, particularly of a built-in function call, into a small numeric + column such as <codeph>INT</codeph>, <codeph>SMALLINT</codeph>, <codeph>TINYINT</codeph>, or + <codeph>FLOAT</codeph>, you might need to use a <codeph>CAST()</codeph> expression to coerce values into the + appropriate type. Impala does not automatically convert from a larger type to a smaller one. For example, to + insert cosine values into a <codeph>FLOAT</codeph> column, write <codeph>CAST(COS(angle) AS FLOAT)</codeph> + in the <codeph>INSERT</codeph> statement to make the conversion explicit. 
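+ </p>
+
+ <p>
+ For example, the following sketch (with hypothetical table and column names) shows the explicit
+ <codeph>CAST()</codeph> described above:
+ </p>
+
+<codeblock>create table angles (id int, angle double);
+create table cosines (id int, cosine float);
+-- COS() returns DOUBLE, so the value must be explicitly narrowed to FLOAT.
+insert into cosines select id, cast(cos(angle) as float) from angles;
+</codeblock>
+
+ <p>
+ Without the <codeph>CAST()</codeph>, the statement fails because Impala does not implicitly convert
+ <codeph>DOUBLE</codeph> values to the smaller <codeph>FLOAT</codeph> type.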
+ </p> + + <p conref="../shared/impala_common.xml#common/insert_parquet_blocksize"/> + + <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/> + + <note conref="../shared/impala_common.xml#common/compute_stats_next"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following example sets up new tables with the same definition as the <codeph>TAB1</codeph> table from the + <xref href="impala_tutorial.xml#tutorial" format="dita">Tutorial</xref> section, using different file + formats, and demonstrates inserting data into the tables created with the <codeph>STORED AS TEXTFILE</codeph> + and <codeph>STORED AS PARQUET</codeph> clauses: + </p> + +<codeblock>CREATE DATABASE IF NOT EXISTS file_formats; +USE file_formats; + +DROP TABLE IF EXISTS text_table; +CREATE TABLE text_table +( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) +STORED AS TEXTFILE; + +DROP TABLE IF EXISTS parquet_table; +CREATE TABLE parquet_table +( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) +STORED AS PARQUET;</codeblock> + + <p> + With the <codeph>INSERT INTO TABLE</codeph> syntax, each new set of inserted rows is appended to any existing + data in the table. This is how you would record small amounts of data that arrive continuously, or ingest new + batches of data alongside the existing data. For example, after running 2 <codeph>INSERT INTO TABLE</codeph> + statements with 5 rows each, the table contains 10 rows total: + </p> + +<codeblock>[localhost:21000] > insert into table text_table select * from default.tab1; +Inserted 5 rows in 0.41s + +[localhost:21000] > insert into table text_table select * from default.tab1; +Inserted 5 rows in 0.46s + +[localhost:21000] > select count(*) from text_table; ++----------+ +| count(*) | ++----------+ +| 10 | ++----------+ +Returned 1 row(s) in 0.26s</codeblock> + + <p> + With the <codeph>INSERT OVERWRITE TABLE</codeph> syntax, each new set of inserted rows replaces any existing + data in the table. 
This is how you load data to query in a data warehousing scenario where you analyze just + the data for a particular day, quarter, and so on, discarding the previous data each time. You might keep the + entire set of data in one raw table, and transfer and transform certain rows into a more compact and + efficient form to perform intensive analysis on that subset. + </p> + + <p> + For example, here we insert 5 rows into a table using the <codeph>INSERT INTO</codeph> clause, then replace + the data by inserting 3 rows with the <codeph>INSERT OVERWRITE</codeph> clause. Afterward, the table only + contains the 3 rows from the final <codeph>INSERT</codeph> statement. + </p> + +<codeblock>[localhost:21000] > insert into table parquet_table select * from default.tab1; +Inserted 5 rows in 0.35s + +[localhost:21000] > insert overwrite table parquet_table select * from default.tab1 limit 3; +Inserted 3 rows in 0.43s +[localhost:21000] > select count(*) from parquet_table; ++----------+ +| count(*) | ++----------+ +| 3 | ++----------+ +Returned 1 row(s) in 0.43s</codeblock> + + <p> + The <codeph><xref href="impala_insert.xml#values">VALUES</xref></codeph> clause lets you insert one or more + rows by specifying constant values for all the columns. The number, types, and order of the expressions must + match the table definition. + </p> + + <note id="insert_values_warning"> + The <codeph>INSERT ... VALUES</codeph> technique is not suitable for loading large quantities of data into + HDFS-based tables, because the insert operations cannot be parallelized, and each one produces a separate + data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL + syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do not + run scripts with thousands of <codeph>INSERT ... VALUES</codeph> statements that insert a single row each + time. If you do run <codeph>INSERT ... 
VALUES</codeph> operations to load data into a staging table as one + stage in an ETL pipeline, include multiple row values if possible within each <codeph>VALUES</codeph> clause, + and use a separate database to make cleanup easier if the operation does produce many tiny files. + </note> + + <p> + The following example shows how to insert one row or multiple rows, with expressions of different types, + using literal values, expressions, and function return values: + </p> + +<codeblock>create table val_test_1 (c1 int, c2 float, c3 string, c4 boolean, c5 timestamp); +insert into val_test_1 values (100, 99.9/10, 'abc', true, now()); +create table val_test_2 (id int, token string); +insert overwrite val_test_2 values (1, 'a'), (2, 'b'), (-1,'xyzzy');</codeblock> + + <p> + These examples show the type of <q>not implemented</q> error that you see when attempting to insert data into + a table with a file format that Impala currently does not write to: + </p> + +<codeblock>DROP TABLE IF EXISTS sequence_table; +CREATE TABLE sequence_table +( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) +STORED AS SEQUENCEFILE; + +DROP TABLE IF EXISTS rc_table; +CREATE TABLE rc_table +( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) +STORED AS RCFILE; + +[localhost:21000] > insert into table rc_table select * from default.tab1; +Remote error +Backend 0:RC_FILE not implemented. + +[localhost:21000] > insert into table sequence_table select * from default.tab1; +Remote error +Backend 0:SEQUENCE_FILE not implemented. </codeblock> + + <p> + Inserting data into partitioned tables requires slightly different syntax that divides the partitioning + columns from the others: + </p> + +<codeblock>create table t1 (i int) <b>partitioned by (x int, y string)</b>; +-- Select an INT column from another table. +-- All inserted rows will have the same x and y values, as specified in the INSERT statement. 
+-- This technique of specifying all the partition key values is known as static partitioning. +insert into t1 <b>partition(x=10, y='a')</b> select c1 from some_other_table; +-- Select two INT columns from another table. +-- All inserted rows will have the same y value, as specified in the INSERT statement. +-- Values from c2 go into t1.x. +-- Any partitioning columns whose value is not specified are filled in +-- from the columns specified last in the SELECT list. +-- This technique of omitting some partition key values is known as dynamic partitioning. +insert into t1 <b>partition(x, y='b')</b> select c1, c2 from some_other_table; +-- Select an INT and a STRING column from another table. +-- All inserted rows will have the same x value, as specified in the INSERT statement. +-- Values from c3 go into t1.y. +insert into t1 <b>partition(x=20, y)</b> select c1, c3 from some_other_table;</codeblock> + + <p rev="1.1"> + The following examples show how you can copy the data in all the columns from one table to another, copy the + data from only some columns, or specify the columns in the select list in a different order than they + actually appear in the table: + </p> + +<codeblock>-- Start with 2 identical tables. +create table t1 (c1 int, c2 int); +create table t2 like t1; + +-- If there is no () part after the destination table name, +-- all columns must be specified, either as * or by name. +insert into t2 select * from t1; +insert into t2 select c1, c2 from t1; + +-- With the () notation following the destination table name, +-- you can omit columns (all values for that column are NULL +-- in the destination table), and/or reorder the values +-- selected from the source table. This is the "column permutation" feature. +insert into t2 (c1) select c1 from t1; +insert into t2 (c2, c1) select c1, c2 from t1; + +-- The column names can be entirely different in the source and destination tables. 
+-- You can copy any columns, not just the corresponding ones, from the source table.
+-- But the number and type of selected columns must match the columns mentioned in the () part.
+alter table t2 replace columns (x int, y int);
+insert into t2 (y) select c1 from t1;
+
+-- For partitioned tables, all the partitioning columns must be mentioned in the () column list
+-- or a PARTITION clause; these columns cannot be defaulted to NULL.
+create table pt1 (x int, y int) partitioned by (z int);
+-- The values from c1 are copied into the column x in the new table,
+-- all in the same partition based on a constant value for z.
+-- The values of y in the new table are all NULL.
+insert into pt1 (x) partition (z=5) select c1 from t1;
+-- Again we omit the values for column y so they are all NULL.
+-- The inserted x values can go into different partitions, based on
+-- the different values (here taken from t2.y) inserted into the partitioning column z.
+insert into pt1 (x,z) select x, y from t2;
+</codeblock>
+
+    <p>
+      <codeph>SELECT *</codeph> for a partitioned table requires that all partition key columns in the source table
+      be declared as the last columns in the <codeph>CREATE TABLE</codeph> statement. You still include a
+      <codeph>PARTITION</codeph> clause listing all the partition key columns. These partition columns are
+      automatically mapped to the last columns from the <codeph>SELECT *</codeph> list.
+    </p>
+
+<codeblock>create table source (x int, y int, year int, month int, day int);
+create table destination (x int, y int) partitioned by (year int, month int, day int);
+...load some data into the unpartitioned source table...
+-- Insert a single partition of data.
+-- The SELECT * means you cannot specify partition (year=2014, month, day).
+insert overwrite destination partition (year, month, day) select * from source where year=2014;
+-- Insert the data for all year/month/day combinations.
+insert overwrite destination partition (year, month, day) select * from source;
+
+-- If one of the partition columns is omitted from the source table,
+-- then you can specify a constant value for that column in the PARTITION clause.
+-- Here the source table holds only data from 2014, and so does not include a year column.
+create table source_2014 (x int, y int, month int, day int);
+...load some data into the unpartitioned source_2014 table...
+insert overwrite destination partition (year=2014, month, day) select * from source_2014;
+</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/insert_sort_blurb"/>
+
+    <p>
+      <b>Concurrency considerations:</b> Each <codeph>INSERT</codeph> operation creates new data files with unique
+      names, so you can run multiple <codeph>INSERT INTO</codeph> statements simultaneously without filename
+      conflicts.
+<!--
+If data is inserted into a table by a statement issued to a different
+<cmdname>impalad</cmdname> node,
+issue a <codeph>REFRESH <varname>table_name</varname></codeph>
+statement to make the node you are connected to aware of this new data.
+-->
+      While data is being inserted into an Impala table, the data is staged temporarily in a subdirectory inside
+      the data directory; during this period, you cannot issue queries against that table in Hive. If an
+      <codeph>INSERT</codeph> operation fails, the temporary data file and the subdirectory could be left behind in
+      the data directory. If so, remove the relevant subdirectory and any data files it contains manually, by
+      issuing an <codeph>hdfs dfs -rm -r</codeph> command, specifying the full path of the work subdirectory, whose
+      name ends in <codeph>_dir</codeph>.
+    </p>
+  </conbody>
+
+  <concept id="values">
+
+    <title>VALUES Clause</title>
+
+    <conbody>
+
+      <p>
+        The <codeph>VALUES</codeph> clause is a general-purpose way to specify the columns of one or more rows,
+        typically within an <codeph><xref href="impala_insert.xml#insert">INSERT</xref></codeph> statement.
+ </p> + + <note conref="../shared/impala_common.xml#common/insert_values_warning"> + <p/> + </note> + + <p> + The following examples illustrate: + </p> + + <ul> + <li> + How to insert a single row using a <codeph>VALUES</codeph> clause. + </li> + + <li> + How to insert multiple rows using a <codeph>VALUES</codeph> clause. + </li> + + <li> + How the row or rows from a <codeph>VALUES</codeph> clause can be appended to a table through + <codeph>INSERT INTO</codeph>, or replace the contents of the table through <codeph>INSERT + OVERWRITE</codeph>. + </li> + + <li> + How the entries in a <codeph>VALUES</codeph> clause can be literals, function results, or any other kind + of expression. See <xref href="impala_literals.xml#literals"/> for the notation to use for literal + values, especially <xref href="impala_literals.xml#string_literals"/> for quoting and escaping + conventions for strings. See <xref href="impala_operators.xml#operators"/> and + <xref href="impala_functions.xml#builtins"/> for other things you can include in expressions with the + <codeph>VALUES</codeph> clause. + </li> + </ul> + +<codeblock>[localhost:21000] > describe val_example; +Query: describe val_example +Query finished, fetching results ... 
++-------+---------+---------+ +| name | type | comment | ++-------+---------+---------+ +| id | int | | +| col_1 | boolean | | +| col_2 | double | | ++-------+---------+---------+ + +[localhost:21000] > insert into val_example values (1,true,100.0); +Inserted 1 rows in 0.30s +[localhost:21000] > select * from val_example; ++----+-------+-------+ +| id | col_1 | col_2 | ++----+-------+-------+ +| 1 | true | 100 | ++----+-------+-------+ + +[localhost:21000] > insert overwrite val_example values (10,false,pow(2,5)), (50,true,10/3); +Inserted 2 rows in 0.16s +[localhost:21000] > select * from val_example; ++----+-------+-------------------+ +| id | col_1 | col_2 | ++----+-------+-------------------+ +| 10 | false | 32 | +| 50 | true | 3.333333333333333 | ++----+-------+-------------------+</codeblock> + + <p> + When used in an <codeph>INSERT</codeph> statement, the Impala <codeph>VALUES</codeph> clause can specify + some or all of the columns in the destination table, and the columns can be specified in a different order + than they actually appear in the table. To specify a different set or order of columns than in the table, + use the syntax: + </p> + +<codeblock>INSERT INTO <varname>destination</varname> + (<varname>col_x</varname>, <varname>col_y</varname>, <varname>col_z</varname>) + VALUES + (<varname>val_x</varname>, <varname>val_y</varname>, <varname>val_z</varname>); +</codeblock> + + <p> + Any columns in the table that are not listed in the <codeph>INSERT</codeph> statement are set to + <codeph>NULL</codeph>. + </p> + +<!-- + <p> + does not support specifying a subset of the + columns in the table or specifying the columns in a different order. Use a + <codeph>VALUES</codeph> clause with all the column values in the same order as + the table definition, using <codeph>NULL</codeph> values for any columns you + want to omit from the <codeph>INSERT</codeph> operation. 
+ </p> +--> + + <p> + To use a <codeph>VALUES</codeph> clause like a table in other statements, wrap it in parentheses and use + <codeph>AS</codeph> clauses to specify aliases for the entire object and any columns you need to refer to: + </p> + +<codeblock>[localhost:21000] > select * from (values(4,5,6),(7,8,9)) as t; ++---+---+---+ +| 4 | 5 | 6 | ++---+---+---+ +| 4 | 5 | 6 | +| 7 | 8 | 9 | ++---+---+---+ +[localhost:21000] > select * from (values(1 as c1, true as c2, 'abc' as c3),(100,false,'xyz')) as t; ++-----+-------+-----+ +| c1 | c2 | c3 | ++-----+-------+-----+ +| 1 | true | abc | +| 100 | false | xyz | ++-----+-------+-----+</codeblock> + + <p> + For example, you might use a tiny table constructed like this from constant literals or function return + values as part of a longer statement involving joins or <codeph>UNION ALL</codeph>. + </p> + + <p conref="../shared/impala_common.xml#common/hdfs_blurb"/> + + <p> + Impala physically writes all inserted files under the ownership of its default user, typically + <codeph>impala</codeph>. Therefore, this user must have HDFS write permission in the corresponding table + directory. + </p> + + <p> + The permission requirement is independent of the authorization performed by the Sentry framework. (If the + connected user is not authorized to insert into a table, Sentry blocks that operation immediately, + regardless of the privileges available to the <codeph>impala</codeph> user.) Files created by Impala are + not owned by and do not inherit permissions from the connected user. + </p> + + <p> + The number of data files produced by an <codeph>INSERT</codeph> statement depends on the size of the + cluster, the number of data blocks that are processed, the partition key columns in a partitioned table, + and the mechanism Impala uses for dividing the work in parallel. Do not assume that an + <codeph>INSERT</codeph> statement will produce some particular number of output files. 
If you encounter
+      performance problems with data written by Impala, check whether the output files suffer from conditions such
+      as many tiny files or many tiny partitions. (In the Hadoop context, even files or partitions of a few tens
+      of megabytes are considered <q>tiny</q>.)
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/insert_hidden_work_directory"/>
+
+    <p conref="../shared/impala_common.xml#common/hbase_blurb"/>
+
+    <p>
+      You can use the <codeph>INSERT</codeph> statement with HBase tables as follows:
+    </p>
+
+    <ul>
+      <li>
+        <p>
+          You can insert a single row or a small set of rows into an HBase table with the <codeph>INSERT ...
+          VALUES</codeph> syntax. This is a good use case for HBase tables with Impala, because HBase tables are
+          not subject to the same kind of fragmentation from many small insert operations as HDFS tables are.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          You can insert any number of rows at once into an HBase table using the <codeph>INSERT ...
+          SELECT</codeph> syntax.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          If more than one inserted row has the same value for the HBase key column, only the last inserted row
+          with that value is visible to Impala queries. You can take advantage of this fact with <codeph>INSERT
+          ... VALUES</codeph> statements to effectively update rows one at a time, by inserting new rows with the
+          same key values as existing rows. Be aware that after an <codeph>INSERT ... SELECT</codeph> operation
+          copying from an HDFS table, the HBase table might contain fewer rows than were inserted, if the key
+          column in the source table contained duplicate values.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          You cannot <codeph>INSERT OVERWRITE</codeph> into an HBase table. New rows are always appended.
+ </p> + </li> + + <li> + <p> + When you create an Impala or Hive table that maps to an HBase table, the column order you specify with + the <codeph>INSERT</codeph> statement might be different than the order you declare with the + <codeph>CREATE TABLE</codeph> statement. Behind the scenes, HBase arranges the columns based on how + they are divided into column families. This might cause a mismatch during insert operations, especially + if you use the syntax <codeph>INSERT INTO <varname>hbase_table</varname> SELECT * FROM + <varname>hdfs_table</varname></codeph>. Before inserting data, verify the column order by issuing a + <codeph>DESCRIBE</codeph> statement for the table, and adjust the order of the select list in the + <codeph>INSERT</codeph> statement. + </p> + </li> + </ul> + + <p> + See <xref href="impala_hbase.xml#impala_hbase"/> for more details about using Impala with HBase. + </p> + + <p conref="../shared/impala_common.xml#common/s3_blurb"/> + <p conref="../shared/impala_common.xml#common/s3_dml"/> + + <p conref="../shared/impala_common.xml#common/security_blurb"/> + <p conref="../shared/impala_common.xml#common/redaction_yes"/> + + <p conref="../shared/impala_common.xml#common/cancel_blurb_yes"/> + + <p conref="../shared/impala_common.xml#common/permissions_blurb"/> + <p rev="CDH-19187"> + The user ID that the <cmdname>impalad</cmdname> daemon runs under, + typically the <codeph>impala</codeph> user, must have read + permission for the files in the source directory of an <codeph>INSERT ... SELECT</codeph> + operation, and write permission for all affected directories in the destination table. + (An <codeph>INSERT</codeph> operation could write files to multiple different HDFS directories + if the destination table is partitioned.) + This user must also have write permission to create a temporary work directory + in the top-level HDFS directory of the destination table. 
+ An <codeph>INSERT OVERWRITE</codeph> operation does not require write permission on + the original data files in the table, only on the table directories themselves. + </p> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <p conref="../shared/impala_common.xml#common/char_varchar_cast_from_string"/> + + <p conref="../shared/impala_common.xml#common/related_options"/> + + <p rev="1.3.1" conref="../shared/impala_common.xml#common/insert_inherit_permissions"/> + </conbody> + </concept> + +<!-- Values clause --> +</concept> +<!-- INSERT statement --> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_int.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_int.xml b/docs/topics/impala_int.xml new file mode 100644 index 0000000..514d377 --- /dev/null +++ b/docs/topics/impala_int.xml @@ -0,0 +1,95 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="int"> + + <title>INT Data Type</title> + <titlealts><navtitle>INT</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Schemas"/> + </metadata> + </prolog> + + <conbody> + + <p> + A 4-byte integer data type used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> statements. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + + <p> + In the column definition of a <codeph>CREATE TABLE</codeph> statement: + </p> + +<codeblock><varname>column_name</varname> INT</codeblock> + + <p> + <b>Range:</b> -2147483648 .. 2147483647. There is no <codeph>UNSIGNED</codeph> subtype. 
+  </p>
+
+  <p>
+    <b>Conversions:</b> Impala automatically converts <codeph>INT</codeph> values to a larger integer type
+    (<codeph>BIGINT</codeph>) or a floating-point type (<codeph>FLOAT</codeph> or <codeph>DOUBLE</codeph>). Use
+    <codeph>CAST()</codeph> to convert to <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>,
+    <codeph>STRING</codeph>, or <codeph>TIMESTAMP</codeph>.
+    <ph conref="../shared/impala_common.xml#common/cast_int_to_timestamp"/>
+  </p>
+
+  <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+  <p>
+    The data type <codeph>INTEGER</codeph> is an alias for <codeph>INT</codeph>.
+  </p>
+
+  <p>
+    For a convenient and automated way to check the bounds of the <codeph>INT</codeph> type, call the functions
+    <codeph>MIN_INT()</codeph> and <codeph>MAX_INT()</codeph>.
+  </p>
+
+  <p>
+    If an integer value is too large to be represented as an <codeph>INT</codeph>, use a <codeph>BIGINT</codeph>
+    instead.
+  </p>
+
+  <p conref="../shared/impala_common.xml#common/null_bad_numeric_cast"/>
+
+  <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+<codeblock>CREATE TABLE t1 (x INT);
+SELECT CAST(1000 AS INT);
+</codeblock>
+
+  <p conref="../shared/impala_common.xml#common/partitioning_good"/>
+
+  <p conref="../shared/impala_common.xml#common/hbase_ok"/>
+
+  <p conref="../shared/impala_common.xml#common/parquet_blurb"/>
+
+  <p conref="../shared/impala_common.xml#common/text_bulky"/>
+
+<!-- <p conref="/Content/impala_common_xi44078.xml#common/compatibility_blurb"/> -->
+
+  <p conref="../shared/impala_common.xml#common/internals_4_bytes"/>
+
+  <p conref="../shared/impala_common.xml#common/added_forever"/>
+
+  <p conref="../shared/impala_common.xml#common/column_stats_constant"/>
+
+<!-- <p conref="/Content/impala_common_xi44078.xml#common/restrictions_blurb"/> -->
+
+  <p conref="../shared/impala_common.xml#common/related_info"/>
+
+  <p>
+    <xref href="impala_literals.xml#numeric_literals"/>, <xref href="impala_tinyint.xml#tinyint"/>,
+    <xref
href="impala_smallint.xml#smallint"/>, <xref href="impala_int.xml#int"/>, + <xref href="impala_bigint.xml#bigint"/>, <xref href="impala_decimal.xml#decimal"/>, + <xref href="impala_math_functions.xml#math_functions"/> + </p> + </conbody> +</concept>
