[10/22] incubator-impala git commit: First try at porting over the source files necessary for the Impala SQL Reference.

jrussell Tue, 26 Jul 2016 16:06:07 -0700

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_invalidate_metadata.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_invalidate_metadata.xml 
b/docs/topics/impala_invalidate_metadata.xml
new file mode 100644
index 0000000..96fca7d
--- /dev/null
+++ b/docs/topics/impala_invalidate_metadata.xml
@@ -0,0 +1,236 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.1" id="invalidate_metadata">
+
+  <title>INVALIDATE METADATA Statement</title>
+  <titlealts><navtitle>INVALIDATE METADATA</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="Metastore"/>
+      <data name="Category" value="Schemas"/>
+      <data name="Category" value="Tables"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">INVALIDATE METADATA statement</indexterm>
+      Marks the metadata for one or all tables as stale. Required after a 
table is created through the Hive shell,
+      before the table is available for Impala queries. The next time the 
current Impala node performs a query
+      against a table whose metadata is invalidated, Impala reloads the 
associated metadata before the query
+      proceeds. This is a relatively expensive operation compared to the 
incremental metadata update done by the
+      <codeph>REFRESH</codeph> statement, so in the common scenario of adding 
new data files to an existing table,
+      prefer <codeph>REFRESH</codeph> rather than <codeph>INVALIDATE 
METADATA</codeph>. If you are not familiar
+      with the way Impala uses metadata and how it shares the same metastore 
database as Hive, see
+      <xref href="impala_hadoop.xml#intro_metastore"/> for background 
information.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>INVALIDATE METADATA 
[[<varname>db_name</varname>.]<varname>table_name</varname>]</codeblock>
+
+    <p>
+      By default, the cached metadata for all tables is flushed. If you 
specify a table name, only the metadata for
+      that one table is flushed. Even for a single table, <codeph>INVALIDATE 
METADATA</codeph> is more expensive
+      than <codeph>REFRESH</codeph>, so prefer <codeph>REFRESH</codeph> in the 
common case where you add new data
+      files for an existing table.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/internals_blurb"/>
+
+    <p>
+      To accurately respond to queries, Impala must have current metadata 
about those databases and tables that
+      clients query directly. Therefore, if some other entity modifies 
information used by Impala in the metastore
+      that Impala and Hive share, the information cached by Impala must be 
updated. However, this does not mean
+      that all metadata updates require an Impala update.
+    </p>
+
+    <note>
+      <p conref="../shared/impala_common.xml#common/catalog_server_124"/>
+      <p rev="1.2">
+        In Impala 1.2 and higher, a dedicated daemon 
(<cmdname>catalogd</cmdname>) broadcasts DDL changes made
+        through Impala to all Impala nodes. Formerly, after you created a 
database or table while connected to one
+        Impala node, you needed to issue an <codeph>INVALIDATE 
METADATA</codeph> statement on another Impala node
+        before accessing the new database or table from the other node. Now, 
newly created or altered objects are
+        picked up automatically by all Impala nodes. You must still use the 
<codeph>INVALIDATE METADATA</codeph>
+        technique after creating or altering objects through Hive. See
+        <xref href="impala_components.xml#intro_catalogd"/> for more 
information on the catalog service.
+      </p>
+      <p>
+        The <codeph>INVALIDATE METADATA</codeph> statement is new in Impala 
1.1 and higher, and takes over some of
+        the use cases of the Impala 1.0 <codeph>REFRESH</codeph> statement. 
Because <codeph>REFRESH</codeph> now
+        requires a table name parameter, to flush the metadata for all tables 
at once, use the <codeph>INVALIDATE
+        METADATA</codeph> statement.
+      </p>
+      <draft-comment translate="no"> Almost-identical wording here, under 
INVALIDATE METADATA, and in Release Notes :: New Features. Makes sense to 
conref. </draft-comment>
+      <p>
+        Because <codeph>REFRESH <varname>table_name</varname></codeph> only 
works for tables that the current
+        Impala node is already aware of, when you create a new table in the 
Hive shell, you must enter
+        <codeph>INVALIDATE METADATA</codeph> with no table parameter before 
you can see the new table in
+        <cmdname>impala-shell</cmdname>. Once the table is known by the Impala 
node, you can issue <codeph>REFRESH
+        <varname>table_name</varname></codeph> after you add data files for 
that table.
+      </p>
+    </note>
+
+    <p conref="../shared/impala_common.xml#common/refresh_vs_invalidate"/>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <p>
+      A metadata update for an <codeph>impalad</codeph> instance <b>is</b> 
required if:
+    </p>
+
+    <ul>
+      <li>
+        A metadata change occurs.
+      </li>
+
+      <li>
+        <b>and</b> the change is made from another <codeph>impalad</codeph> 
instance in your cluster, or through
+        Hive.
+      </li>
+
+      <li>
+        <b>and</b> the change is made to a database to which clients such as 
the Impala shell or ODBC directly
+        connect.
+      </li>
+    </ul>
+
+    <p>
+      A metadata update for an Impala node is <b>not</b> required when you 
issue queries from the same Impala node
+      where you ran <codeph>ALTER TABLE</codeph>, <codeph>INSERT</codeph>, or 
other table-modifying statement.
+    </p>
+
+    <p>
+      Database and table metadata is typically modified by:
+    </p>
+
+    <ul>
+      <li>
+        Hive - via <codeph>ALTER</codeph>, <codeph>CREATE</codeph>, 
<codeph>DROP</codeph> or
+        <codeph>INSERT</codeph> operations.
+      </li>
+
+      <li>
+        Impalad - via <codeph>CREATE TABLE</codeph>, <codeph>ALTER 
TABLE</codeph>, and <codeph>INSERT</codeph>
+        operations.
+      </li>
+    </ul>
+
+    <p>
+      <codeph>INVALIDATE METADATA</codeph> causes the metadata for that table 
to be marked as stale, and reloaded
+      the next time the table is referenced. For a huge table, that process 
could take a noticeable amount of time;
+      thus you might prefer to use <codeph>REFRESH</codeph> where practical, 
to avoid an unpredictable delay later,
+      for example if the next reference to the table is during a benchmark 
test.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      The following example shows how you might use the <codeph>INVALIDATE 
METADATA</codeph> statement after
+      creating new tables (such as SequenceFile or HBase tables) through the 
Hive shell. Before the
+      <codeph>INVALIDATE METADATA</codeph> statement was issued, Impala would 
give a <q>table not found</q> error
+      if you tried to refer to those table names. The 
<codeph>DESCRIBE</codeph> statements cause the latest
+      metadata to be immediately loaded for the tables, avoiding a delay the 
next time those tables are queried.
+    </p>
+
+<codeblock>[impalad-host:21000] &gt; invalidate metadata;
+[impalad-host:21000] &gt; describe t1;
+...
+[impalad-host:21000] &gt; describe t2;
+... </codeblock>
+
+    <p>
+      For more examples of using <codeph>REFRESH</codeph> and 
<codeph>INVALIDATE METADATA</codeph> with a
+      combination of Impala and Hive operations, see <xref 
href="impala_tutorial.xml#tutorial_impala_hive"/>.
+    </p>
+
+    <p>
+      If you need to ensure that the metadata is up-to-date when you start an 
<cmdname>impala-shell</cmdname>
+      session, run <cmdname>impala-shell</cmdname> with the 
<codeph>-r</codeph> or
+      <codeph>--refresh_after_connect</codeph> command-line option. Because 
this operation adds a delay to the next
+      query against each table, potentially expensive for large tables with 
many partitions, try to avoid using
+      this option for day-to-day operations in a production environment.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
+    <p rev="CDH-19187">
+      The user ID that the <cmdname>impalad</cmdname> daemon runs under,
+      typically the <codeph>impala</codeph> user, must have execute
+      permissions for all the relevant directories holding table data.
+      (A table could have data spread across multiple directories,
+      or in unexpected paths, if it uses partitioning or
+      specifies a <codeph>LOCATION</codeph> attribute for
+      individual partitions or the entire table.)
+      Issues with permissions might not cause an immediate error for this 
statement,
+      but subsequent statements such as <codeph>SELECT</codeph>
+      or <codeph>SHOW TABLE STATS</codeph> could fail.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/hdfs_blurb"/>
+
+    <p>
+      By default, the <codeph>INVALIDATE METADATA</codeph> command checks HDFS 
permissions of the underlying data
+      files and directories, caching this information so that a statement can 
be cancelled immediately if for
+      example the <codeph>impala</codeph> user does not have permission to 
write to the data directory for the
+      table. (This checking does not apply if you have set the 
<cmdname>catalogd</cmdname> configuration option
+      <codeph>--load_catalog_in_background=false</codeph>.) Impala reports any 
lack of write permissions as an
+      <codeph>INFO</codeph> message in the log file, in case that represents 
an oversight. If you change HDFS
+      permissions to make data readable or writeable by the Impala user, issue 
another <codeph>INVALIDATE
+      METADATA</codeph> to make Impala aware of the change.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <p rev="1.2.4">
+      This example illustrates creating a new database and new table in Hive, 
then doing an <codeph>INVALIDATE
+      METADATA</codeph> statement in Impala using the fully qualified table 
name, after which both the new table
+      and the new database are visible to Impala. The ability to specify 
<codeph>INVALIDATE METADATA
+      <varname>table_name</varname></codeph> for a table created in Hive is a 
new capability in Impala 1.2.4. In
+      earlier releases, that statement would have returned an error indicating 
an unknown table, requiring you to
+      do <codeph>INVALIDATE METADATA</codeph> with no table name, a more 
expensive operation that reloaded metadata
+      for all tables and databases.
+    </p>
+
+<codeblock rev="1.2.4">$ hive
+hive&gt; create database new_db_from_hive;
+OK
+Time taken: 4.118 seconds
+hive&gt; create table new_db_from_hive.new_table_from_hive (x int);
+OK
+Time taken: 0.618 seconds
+hive&gt; quit;
+$ impala-shell
+[localhost:21000] &gt; show databases like 'new*';
+[localhost:21000] &gt; refresh new_db_from_hive.new_table_from_hive;
+ERROR: AnalysisException: Database does not exist: new_db_from_hive
+[localhost:21000] &gt; invalidate metadata 
new_db_from_hive.new_table_from_hive;
+[localhost:21000] &gt; show databases like 'new*';
++--------------------+
+| name               |
++--------------------+
+| new_db_from_hive   |
++--------------------+
+[localhost:21000] &gt; show tables in new_db_from_hive;
++---------------------+
+| name                |
++---------------------+
+| new_table_from_hive |
++---------------------+</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/s3_blurb"/>
+    <p conref="../shared/impala_common.xml#common/s3_metadata"/>
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+    <p>
+      <xref href="impala_hadoop.xml#intro_metastore"/>,
+      <xref href="impala_refresh.xml#refresh"/>
+    </p>
+
+  </conbody>
+</concept>


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_joins.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_joins.xml b/docs/topics/impala_joins.xml
new file mode 100644
index 0000000..011a488
--- /dev/null
+++ b/docs/topics/impala_joins.xml
@@ -0,0 +1,520 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="joins">
+
+  <title>Joins in Impala SELECT Statements</title>
+  <titlealts><navtitle>Joins</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Querying"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">joins</indexterm>
+      A join query is a <codeph>SELECT</codeph> statement that combines data 
from two or more tables,
+      and returns a result set containing items from some or all of those 
tables. It is a way to
+      cross-reference and correlate related data that is organized into 
multiple tables, typically
+      using identifiers that are repeated in each of the joined tables.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/join_types"/>
+
+<codeblock>SELECT <varname>select_list</varname> FROM
+  <varname>table_or_subquery1</varname> [INNER] JOIN 
<varname>table_or_subquery2</varname> |
+  <varname>table_or_subquery1</varname> {LEFT [OUTER] | RIGHT [OUTER] | FULL 
[OUTER]} JOIN <varname>table_or_subquery2</varname> |
+  <varname>table_or_subquery1</varname> {LEFT | RIGHT} SEMI JOIN 
<varname>table_or_subquery2</varname> |
+  <ph rev="2.0.0"><varname>table_or_subquery1</varname> {LEFT | RIGHT} ANTI 
JOIN <varname>table_or_subquery2</varname> |</ph>
+    [ ON <varname>col1</varname> = <varname>col2</varname> [AND 
<varname>col3</varname> = <varname>col4</varname> ...] |
+      USING (<varname>col1</varname> [, <varname>col2</varname> ...]) ]
+  [<varname>other_join_clause</varname> ...]
+[ WHERE <varname>where_clauses</varname> ]
+
+SELECT <varname>select_list</varname> FROM
+  <varname>table_or_subquery1</varname>, <varname>table_or_subquery2</varname> 
[, <varname>table_or_subquery3</varname> ...]
+  [<varname>other_join_clause</varname> ...]
+WHERE
+    <varname>col1</varname> = <varname>col2</varname> [AND 
<varname>col3</varname> = <varname>col4</varname> ...]
+
+SELECT <varname>select_list</varname> FROM
+  <varname>table_or_subquery1</varname> CROSS JOIN 
<varname>table_or_subquery2</varname>
+  [<varname>other_join_clause</varname> ...]
+[ WHERE <varname>where_clauses</varname> ]</codeblock>
+
+    <p>
+      <b>SQL-92 and SQL-89 Joins:</b>
+    </p>
+
+    <p>
+      Queries with the explicit <codeph>JOIN</codeph> keywords are known as 
SQL-92 style joins, referring to the
+      level of the SQL standard where they were introduced. The corresponding 
<codeph>ON</codeph> or
+      <codeph>USING</codeph> clauses clearly show which columns are used as 
the join keys in each case:
+    </p>
+
+<codeblock>SELECT t1.c1, t2.c2 FROM <b>t1 JOIN t2</b>
+  <b>ON t1.id = t2.id and t1.type_flag = t2.type_flag</b>
+  WHERE t1.c1 &gt; 100;
+
+SELECT t1.c1, t2.c2 FROM <b>t1 JOIN t2</b>
+  <b>USING (id, type_flag)</b>
+  WHERE t1.c1 &gt; 100;</codeblock>
+
+    <p>
+      The <codeph>ON</codeph> clause is a general way to compare columns 
across the two tables, even if the column
+      names are different. The <codeph>USING</codeph> clause is a shorthand 
notation for specifying the join
+      columns, when the column names are the same in both tables. You can code 
equivalent <codeph>WHERE</codeph>
+      clauses that compare the columns, instead of <codeph>ON</codeph> or 
<codeph>USING</codeph> clauses, but that
+      practice is not recommended because mixing the join comparisons with 
other filtering clauses is typically
+      less readable and harder to maintain.
+    </p>
+
+    <p>
+      Queries with a comma-separated list of tables and subqueries are known 
as SQL-89 style joins. In these
+      queries, the equality comparisons between columns of the joined tables 
go in the <codeph>WHERE</codeph>
+      clause alongside other kinds of comparisons. This syntax is easy to 
learn, but it is also easy to
+      accidentally remove a <codeph>WHERE</codeph> clause needed for the join 
to work correctly.
+    </p>
+
+<codeblock>SELECT t1.c1, t2.c2 FROM <b>t1, t2</b>
+  WHERE
+  <b>t1.id = t2.id AND t1.type_flag = t2.type_flag</b>
+  AND t1.c1 &gt; 100;</codeblock>
+
+    <p>
+      <b>Self-joins:</b>
+    </p>
+
+    <p>
+      Impala can do self-joins, for example to join on two different columns 
in the same table to represent
+      parent-child relationships or other tree-structured data. There is no 
explicit syntax for this; just use the
+      same table name for both the left-hand and right-hand table, and assign 
different table aliases to use when
+      referring to the fully qualified column names:
+    </p>
+
+<codeblock>-- Combine fields from both parent and child rows.
+SELECT lhs.id, rhs.parent, lhs.c1, rhs.c2 FROM tree_data lhs, tree_data rhs 
WHERE lhs.id = rhs.parent;</codeblock>
+
+    <p>
+      <b>Cartesian joins:</b>
+    </p>
+
+    <p>
+      To avoid producing huge result sets by mistake, Impala does not allow 
Cartesian joins of the form:
+<codeblock>SELECT ... FROM t1 JOIN t2;
+SELECT ... FROM t1, t2;</codeblock>
+      If you intend to join the tables based on common values, add 
<codeph>ON</codeph> or <codeph>WHERE</codeph>
+      clauses to compare columns across the tables. If you truly intend to do 
a Cartesian join, use the
+      <codeph>CROSS JOIN</codeph> keyword as the join operator. The 
<codeph>CROSS JOIN</codeph> form does not use
+      any <codeph>ON</codeph> clause, because it produces a result set with 
all combinations of rows from the
+      left-hand and right-hand tables. The result set can still be filtered by 
subsequent <codeph>WHERE</codeph>
+      clauses. For example:
+    </p>
+
+<codeblock>SELECT ... FROM t1 CROSS JOIN t2;
+SELECT ... FROM t1 CROSS JOIN t2 WHERE 
<varname>tests_on_non_join_columns</varname>;</codeblock>
+
+    <p>
+      <b>Inner and outer joins:</b>
+    </p>
+
+    <p>
+      An inner join is the most common and familiar type: rows in the result 
set contain the requested columns from
+      the appropriate tables, for all combinations of rows where the join 
columns of the tables have identical
+      values. If a column with the same name occurs in both tables, use a 
fully qualified name or a column alias to
+      refer to the column in the select list or other clauses. Impala performs 
inner joins by default for both
+      SQL-89 and SQL-92 join syntax:
+    </p>
+
+<codeblock>-- The following 3 forms are all equivalent.
+SELECT t1.id, c1, c2 FROM t1, t2 WHERE t1.id = t2.id;
+SELECT t1.id, c1, c2 FROM t1 JOIN t2 ON t1.id = t2.id;
+SELECT t1.id, c1, c2 FROM t1 INNER JOIN t2 ON t1.id = t2.id;</codeblock>
+
+    <p>
+      An outer join retrieves all rows from the left-hand table, or the 
right-hand table, or both; wherever there
+      is no matching data in the table on the other side of the join, the 
corresponding columns in the result set
+      are set to <codeph>NULL</codeph>. To perform an outer join, include the 
<codeph>OUTER</codeph> keyword in the
+      join operator, along with either <codeph>LEFT</codeph>, 
<codeph>RIGHT</codeph>, or <codeph>FULL</codeph>:
+    </p>
+
+<codeblock>SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id;
+SELECT * FROM t1 RIGHT OUTER JOIN t2 ON t1.id = t2.id;
+SELECT * FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id;</codeblock>
+
+    <p>
+      For outer joins, Impala requires SQL-92 syntax; that is, the 
<codeph>JOIN</codeph> keyword instead of
+      comma-separated table names. Impala does not support vendor extensions 
such as <codeph>(+)</codeph> or
+      <codeph>*=</codeph> notation for doing outer joins with SQL-89 query 
syntax.
+    </p>
+
+    <p>
+      <b>Equijoins and Non-Equijoins:</b>
+    </p>
+
+    <p>
+      By default, Impala requires an equality comparison between the left-hand 
and right-hand tables, either
+      through <codeph>ON</codeph>, <codeph>USING</codeph>, or 
<codeph>WHERE</codeph> clauses. These types of
+      queries are classified broadly as equijoins. Inner, outer, full, and 
semi joins can all be equijoins based on
+      the presence of equality tests between columns in the left-hand and 
right-hand tables.
+    </p>
+
+    <p>
+      In Impala 1.2.2 and higher, non-equijoin queries are also possible, with 
comparisons such as
+      <codeph>!=</codeph> or <codeph>&lt;</codeph> between the join columns. 
These kinds of queries require care to
+      avoid producing huge result sets that could exceed resource limits. Once 
you have planned a non-equijoin
+      query that produces a result set of acceptable size, you can code the 
query using the <codeph>CROSS
+      JOIN</codeph> operator, and add the extra comparisons in the 
<codeph>WHERE</codeph> clause:
+    </p>
+
+<codeblock>SELECT * FROM t1 CROSS JOIN t2 WHERE t1.total &gt; 
t2.maximum_price;</codeblock>
+
+    <p rev="2.3.0">
+      In CDH 5.5 / Impala 2.3 and higher, additional non-equijoin queries are 
possible due to the addition
+      of nested loop joins. These queries typically involve <codeph>SEMI 
JOIN</codeph>,
+      <codeph>ANTI JOIN</codeph>, or <codeph>FULL OUTER JOIN</codeph> clauses.
+      Impala sometimes also uses nested loop joins internally when evaluating 
<codeph>OUTER JOIN</codeph>
+      queries involving complex type columns.
+      Query phases involving nested loop joins do not use the spill-to-disk 
mechanism if they
+      exceed the memory limit. Impala decides internally when to use each join 
mechanism; you cannot
+      specify any query hint to choose between the nested loop join or the 
original hash join algorithm.
+    </p>
+
+<codeblock rev="2.3.0">SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.int_col &lt; 
t2.int_col;</codeblock>
+
+    <p>
+      <b>Semi-joins:</b>
+    </p>
+
+    <p>
+      Semi-joins are a relatively rarely used variation. With the left 
semi-join, only data from the left-hand
+      table is returned, for rows where there is matching data in the 
right-hand table, based on comparisons
+      between join columns in <codeph>ON</codeph> or <codeph>WHERE</codeph> 
clauses. Only one instance of each row
+      from the left-hand table is returned, regardless of how many matching 
rows exist in the right-hand table.
+      <ph rev="2.0.0">A right semi-join (available in Impala 2.0 and higher) 
reverses the comparison and returns
+      data from the right-hand table.</ph>
+    </p>
+
+<codeblock>SELECT t1.c1, t1.c2, t1.c2 FROM t1 LEFT SEMI JOIN t2 ON t1.id = 
t2.id;</codeblock>
+
+    <p>
+      <b>Natural joins (not supported):</b>
+    </p>
+
+    <p>
+      Impala does not support the <codeph>NATURAL JOIN</codeph> operator, 
again to avoid inconsistent or huge
+      result sets. Natural joins do away with the <codeph>ON</codeph> and 
<codeph>USING</codeph> clauses, and
+      instead automatically join on all columns with the same names in the 
left-hand and right-hand tables. This
+      kind of query is not recommended for rapidly evolving data structures 
such as are typically used in Hadoop.
+      Thus, Impala does not support the <codeph>NATURAL JOIN</codeph> syntax, 
which can produce different query
+      results as columns are added to or removed from tables.
+    </p>
+
+    <p>
+      If you do have any queries that use <codeph>NATURAL JOIN</codeph>, make 
sure to rewrite them with explicit
+      <codeph>USING</codeph> clauses, because Impala could interpret the 
<codeph>NATURAL</codeph> keyword as a
+      table alias:
+    </p>
+
+<codeblock>-- 'NATURAL' is interpreted as an alias for 't1' and Impala 
attempts an inner join,
+-- resulting in an error because inner joins require explicit comparisons 
between columns.
+SELECT t1.c1, t2.c2 FROM t1 NATURAL JOIN t2;
+ERROR: NotImplementedException: Join with 't2' requires at least one 
conjunctive equality predicate.
+  To perform a Cartesian product between two tables, use a CROSS JOIN.
+
+-- If you expect the tables to have identically named columns with matching 
values,
+-- list the corresponding column names in a USING clause.
+SELECT t1.c1, t2.c2 FROM t1 JOIN t2 USING (id, type_flag, name, 
address);</codeblock>
+
+    <p rev="2.0.0">
+      <b>Anti-joins (Impala 2.0 / CDH 5.2 and higher only):</b>
+    </p>
+
+    <p rev="2.0.0">
+      Impala supports the <codeph>LEFT ANTI JOIN</codeph> and <codeph>RIGHT 
ANTI JOIN</codeph> clauses in Impala
+      2.0 and higher on CDH 4, or CDH 5.2 and higher on CDH 5. The 
<codeph>LEFT</codeph> or <codeph>RIGHT</codeph>
+      keyword is required for this kind of join. For <codeph>LEFT ANTI 
JOIN</codeph>, this clause returns those
+      values from the left-hand table that have no matching value in the 
right-hand table. <codeph>RIGHT ANTI
+      JOIN</codeph> reverses the comparison and returns values from the 
right-hand table. You can express this
+      negative relationship either through the <codeph>ANTI JOIN</codeph> 
clause or through a <codeph>NOT
+      EXISTS</codeph> operator with a subquery.
+    </p>
+
+<!-- Restriction lifted in Impala 2.0.
+<p>
+Impala does not support <codeph>WHERE</codeph> clauses
+such as <codeph>IN (<varname>subquery</varname>)</codeph>,
+<codeph>NOT IN (<varname>subquery</varname>)</codeph>,
+<codeph>EXISTS (<varname>subquery</varname>)</codeph>,
+and <codeph>NOT EXISTS (<varname>subquery</varname>)</codeph>.
+Therefore from a practical standpoint, you cannot
+express an anti-join condition, where values from one table
+are returned only if no matching values are present in another table.
+</p>
+-->
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+<!-- To do: reuse some complex types examples with joins here or under 
Examples farther down. -->
+
+    <p rev="2.3.0">
+      When referring to a column with a complex type (<codeph>STRUCT</codeph>, 
<codeph>ARRAY</codeph>, or <codeph>MAP</codeph>)
+      in a query, you use join notation to <q>unpack</q> the scalar fields of 
the struct, the elements of the array, or
+      the key-value pairs of the map. (The join notation is not required for 
aggregation operations, such as
+      <codeph>COUNT()</codeph> or <codeph>SUM()</codeph> for array elements.) 
Because Impala recognizes which complex type elements are associated with which 
row
+      of the result set, you use the same syntax as for a cross or cartesian 
join, without an explicit join condition.
+      See <xref href="impala_complex_types.xml#complex_types"/> for details 
about Impala support for complex types.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <p>
+      You typically use join queries in situations like these:
+    </p>
+
+    <ul>
+      <li>
+        When related data arrives from different sources, with each data set 
physically residing in a separate
+        table. For example, you might have address data from business records 
that you cross-check against phone
+        listings or census data.
+        <note>
+          Impala can join tables of different file formats, including 
Impala-managed tables and HBase tables. For
+          example, you might keep small dimension tables in HBase, for 
convenience of single-row lookups and
+          updates, and for the larger fact tables use Parquet or other binary 
file format optimized for scan
+          operations. Then, you can issue a join query to cross-reference the 
fact tables with the dimension
+          tables.
+        </note>
+      </li>
+
+      <li>
+        When data is normalized, a technique for reducing data duplication by 
dividing it across multiple tables.
+        This kind of organization is often found in data that comes from 
traditional relational database systems.
+        For example, instead of repeating some long string such as a customer 
name in multiple tables, each table
+        might contain a numeric customer ID. Queries that need to display the 
customer name could <q>join</q> the
+        table that specifies which customer ID corresponds to which name.
+      </li>
+
+      <li>
+        When certain columns are rarely needed for queries, so they are moved 
into separate tables to reduce
+        overhead for common queries. For example, a <codeph>biography</codeph> 
field might be rarely needed in
+        queries on employee data. Putting that field in a separate table 
reduces the amount of I/O for common
+        queries on employee addresses or phone numbers. Queries that do need 
the <codeph>biography</codeph> column
+        can retrieve it by performing a join with that separate table.
+      </li>
+
+      <li>
+        In CDH 5.5 / Impala 2.3 or higher, when referring to complex type 
columns in queries.
+        See <xref href="impala_complex_types.xml#complex_types"/> for details.
+      </li>
+    </ul>
+
+    <p>
+      When comparing columns with the same names in <codeph>ON</codeph> or 
<codeph>WHERE</codeph> clauses, use the
+      fully qualified names such as 
<codeph><varname>db_name</varname>.<varname>table_name</varname></codeph>, or
+      assign table aliases, column aliases, or both to make the code more 
compact and understandable:
+    </p>
+
+<codeblock>select t1.c1 as first_id, t2.c2 as second_id from
+  t1 join t2 on first_id = second_id;
+
+select fact.custno, dimension.custno from
+  customer_data as fact join customer_address as dimension
+  using (custno)</codeblock>
+
+    <note>
+      <p>
+        Performance for join queries is a crucial aspect for Impala, because 
complex join queries are
+        resource-intensive operations. An efficient join query produces much 
less network traffic and CPU overhead
+        than an inefficient one. For best results:
+      </p>
+      <ul>
+        <li rev="1.2">
+          Make sure that both <xref 
href="impala_perf_stats.xml#perf_stats">table and column statistics</xref> are
+          available for all the tables involved in a join query, and 
especially for the columns referenced in any
+          join conditions. Impala uses the statistics to automatically deduce 
an efficient join order.
+          Use <xref href="impala_show.xml#show"><codeph>SHOW TABLE STATS 
<varname>table_name</varname></codeph> and
+          <codeph>SHOW COLUMN STATS 
<varname>table_name</varname></codeph></xref> to check if statistics are
+          already present. Issue the <codeph>COMPUTE STATS 
<varname>table_name</varname></codeph> for a nonpartitioned table,
+          or (in Impala 2.1.0 and higher) <codeph>COMPUTE INCREMENTAL STATS 
<varname>table_name</varname></codeph>
+          for a partitioned table, to collect the initial statistics at both 
the table and column levels, and to keep the
+          statistics up to date after any substantial <codeph>INSERT</codeph> 
or <codeph>LOAD DATA</codeph> operations.
+        </li>
+
+        <li rev="1.2">
+          If table or column statistics are not available, join the largest 
table first. You can check the
+          existence of statistics with the <codeph>SHOW TABLE STATS 
<varname>table_name</varname></codeph> and
+          <codeph>SHOW COLUMN STATS <varname>table_name</varname></codeph> 
statements.
+        </li>
+
+        <li rev="1.2.2">
+          If table or column statistics are not available, join subsequent 
tables according to which table has the
+          most selective filter, based on overall size and 
<codeph>WHERE</codeph> clauses. Joining the table with
+          the most selective filter results in the fewest number of rows being 
returned.
+        </li>
+      </ul>
+      <p>
+        For more information and examples of performance for join queries, see
+        <xref href="impala_perf_joins.xml#perf_joins"/>.
+      </p>
+    </note>
+
+    <p>
+      To control the result set from a join query, include the names of 
corresponding column names in both tables
+      in an <codeph>ON</codeph> or <codeph>USING</codeph> clause, or by coding 
equality comparisons for those
+      columns in the <codeph>WHERE</codeph> clause.
+    </p>
+
+<codeblock>[localhost:21000] &gt; select c_last_name, ca_city from customer 
join customer_address where c_customer_sk = ca_address_sk;
++-------------+-----------------+
+| c_last_name | ca_city         |
++-------------+-----------------+
+| Lewis       | Fairfield       |
+| Moses       | Fairview        |
+| Hamilton    | Pleasant Valley |
+| White       | Oak Ridge       |
+| Moran       | Glendale        |
+...
+| Richards    | Lakewood         |
+| Day         | Lebanon          |
+| Painter     | Oak Hill         |
+| Bentley     | Greenfield       |
+| Jones       | Stringtown       |
++-------------+------------------+
+Returned 50000 row(s) in 9.82s</codeblock>
+
+    <p>
+      One potential downside of joins is the possibility of excess resource 
usage in poorly constructed queries.
+      Impala imposes restrictions on join queries to guard against such 
issues. To minimize the chance of runaway
+      queries on large data sets, Impala requires every join query to contain 
at least one equality predicate
+      between the columns of the various tables. For example, if 
<codeph>T1</codeph> contains 1000 rows and
+      <codeph>T2</codeph> contains 1,000,000 rows, a query <codeph>SELECT 
<varname>columns</varname> FROM t1 JOIN
+      t2</codeph> could return up to 1 billion rows (1000 * 1,000,000); Impala 
requires that the query include a
+      clause such as <codeph>ON t1.c1 = t2.c2</codeph> or <codeph>WHERE t1.c1 
= t2.c2</codeph>.
+    </p>
+
+    <p>
+      Because even with equality clauses, the result set can still be large, 
as we saw in the previous example, you
+      might use a <codeph>LIMIT</codeph> clause to return a subset of the 
results:
+    </p>
+
+<codeblock>[localhost:21000] &gt; select c_last_name, ca_city from customer, 
customer_address where c_customer_sk = ca_address_sk limit 10;
++-------------+-----------------+
+| c_last_name | ca_city         |
++-------------+-----------------+
+| Lewis       | Fairfield       |
+| Moses       | Fairview        |
+| Hamilton    | Pleasant Valley |
+| White       | Oak Ridge       |
+| Moran       | Glendale        |
+| Sharp       | Lakeview        |
+| Wiles       | Farmington      |
+| Shipman     | Union           |
+| Gilbert     | New Hope        |
+| Brunson     | Martinsville    |
++-------------+-----------------+
+Returned 10 row(s) in 0.63s</codeblock>
+
+    <p>
+      Or you might use additional comparison operators or aggregation 
functions to condense a large result set into
+      a smaller set of values:
+    </p>
+
+<codeblock>[localhost:21000] &gt; -- Find the names of customers who live in 
one particular town.
+[localhost:21000] &gt; select distinct c_last_name from customer, 
customer_address where
+  c_customer_sk = ca_address_sk
+  and ca_city = "Green Acres";
++---------------+
+| c_last_name   |
++---------------+
+| Hensley       |
+| Pearson       |
+| Mayer         |
+| Montgomery    |
+| Ricks         |
+...
+| Barrett       |
+| Price         |
+| Hill          |
+| Hansen        |
+| Meeks         |
++---------------+
+Returned 332 row(s) in 0.97s
+
+[localhost:21000] &gt; -- See how many different customers in this town have 
names starting with "A".
+[localhost:21000] &gt; select count(distinct c_last_name) from customer, 
customer_address where
+  c_customer_sk = ca_address_sk
+  and ca_city = "Green Acres"
+  and substr(c_last_name,1,1) = "A";
++-----------------------------+
+| count(distinct c_last_name) |
++-----------------------------+
+| 12                          |
++-----------------------------+
+Returned 1 row(s) in 1.00s</codeblock>
+
+    <p>
+      Because a join query can involve reading large amounts of data from 
disk, sending large amounts of data
+      across the network, and loading large amounts of data into memory to do 
the comparisons and filtering, you
+      might do benchmarking, performance analysis, and query tuning to find 
the most efficient join queries for
+      your data set, hardware capacity, network configuration, and cluster 
workload.
+    </p>
+
+    <p>
+      The two categories of joins in Impala are known as <b>partitioned 
joins</b> and <b>broadcast joins</b>. If
+      inaccurate table or column statistics, or some quirk of the data 
distribution, causes Impala to choose the
+      wrong mechanism for a particular join, consider using query hints as a 
temporary workaround. For details, see
+      <xref href="impala_hints.xml#hints"/>.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      The following examples refer to these simple tables containing small 
sets of integers:
+<codeblock>[localhost:21000] &gt; create table t1 (x int);
+[localhost:21000] &gt; insert into t1 values (1), (2), (3), (4), (5), (6);
+
+[localhost:21000] &gt; create table t2 (y int);
+[localhost:21000] &gt; insert into t2 values (2), (4), (6);
+
+[localhost:21000] &gt; create table t3 (z int);
+[localhost:21000] &gt; insert into t3 values (1), (3), (5);
+</codeblock>
+    </p>
+
+<!-- To do: fill in examples for other join types. -->
+
+    <p>
+      The following example demonstrates an anti-join, returning the values 
from <codeph>T1</codeph> that do not
+      exist in <codeph>T2</codeph> (in this case, the odd numbers 1, 3, and 5):
+    </p>
+
+<codeblock>[localhost:21000] &gt; select x from t1 left anti join t2 on (t1.x 
= t2.y);
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 5 |
++---+
+</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      See these tutorials for examples of different kinds of joins:
+    </p>
+
+    <ul>
+      <li>
+        <xref href="impala_tutorial.xml#tut_cross_join"/>
+      </li>
+    </ul>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_langref.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_langref.xml b/docs/topics/impala_langref.xml
new file mode 100644
index 0000000..aaa76aa
--- /dev/null
+++ b/docs/topics/impala_langref.xml
@@ -0,0 +1,179 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="langref">
+
+  <title><ph audience="PDF">Impala SQL Language Reference</ph><ph 
audience="HTML">Overview of Impala SQL</ph></title>
+
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="impala-shell"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      Impala uses SQL as its query language. Impala interprets SQL statements 
and performs the
+      full end-to-end processing for each statement. (As opposed to acting as 
a translation
+      layer for some other Hadoop subsystem.)
+    </p>
+
+    <p>
+      Impala implements many familiar statements, such as <codeph>CREATE 
TABLE</codeph>,
+      <codeph>INSERT</codeph>, and <codeph>SELECT</codeph>. Currently, the DML 
statements
+      <codeph>UPDATE</codeph> and <codeph>DELETE</codeph> are not available in 
the production
+      level of Impala, because big data analytics with Hadoop and HDFS 
typically involves
+      unchanging data. <codeph>UPDATE</codeph> and <codeph>DELETE</codeph> 
<i>are</i> available
+      in beta form in the version of Impala used with the Kudu storage layer. 
For full details
+      about Impala SQL syntax and semantics, see
+      <xref href="impala_langref_sql.xml#langref_sql"/>.
+    </p>
+
+    <p>
+      Queries include clauses such as <codeph>WHERE</codeph>, <codeph>GROUP 
BY</codeph>,
+      <codeph>ORDER BY</codeph>, and <codeph>JOIN</codeph>. For information 
about query syntax,
+      see <xref href="impala_select.xml#select"/>.
+    </p>
+
+    <p>
+      Queries can also include function calls, to scalar functions such as
+      <codeph>sin()</codeph> and <codeph>substr()</codeph>, aggregate 
functions such as
+      <codeph>count()</codeph> and <codeph>avg()</codeph>, and analytic 
functions such as
+      <codeph>lag()</codeph> and <codeph>rank()</codeph>. For a list of the 
built-in functions
+      available in Impala queries, see <xref 
href="impala_functions.xml#builtins"/>.
+    </p>
+
+    <p outputclass="toc"/>
+
+  </conbody>
+
+  <concept id="langref_performance">
+
+    <title>Performance Features</title>
+
+    <conbody>
+
+      <p>
+        The main performance-related SQL features for Impala are:
+      </p>
+
+      <ul>
+        <li>
+          <p>
+            The <codeph>COMPUTE STATS</codeph> statement, and the underlying 
table statistics
+            and column statistics used in query planning. The statistics are 
used to estimate
+            the number of rows and size of the result set for queries, 
subqueries, and the
+            different <q>sides</q> of a join query.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            The output of the <codeph>EXPLAIN</codeph> statement. It outlines 
the ways in which
+            the query is parallelized, and how much I/O, memory, and so on the 
query expects to
+            use. You can control the level of detail in the output through a 
query option.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Partitioning for tables. By organizing the data for efficient 
access along one or
+            more dimensions, this technique lets queries read only the 
relevant data.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Query hints, especially for join queries. Impala selects from 
different join
+            algorithms based on the relative sizes of the result sets for each 
side of the join.
+            In cases where you know the most effective technique for a 
particular query, you can
+            override the estimates that Impala uses to make that choice, and 
select the join
+            technique directly.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Query options. These options control settings that can influence 
the performance of
+            individual queries when you know the special considerations based 
on your workload,
+            hardware configuration, or data distribution.
+          </p>
+        </li>
+      </ul>
+
+      <p>
+        Because analytic queries against high volumes of data tend to require 
full scans against
+        large portions of data from each table, Impala does not include 
index-related SQL
+        statements such as <codeph>CREATE INDEX</codeph>. The <codeph>COMPUTE 
STATS</codeph>
+        serves the purpose of analyzing the distribution of data within each 
column and the
+        overall table. Partitioning optimizes the physical layout of the data 
for queries that
+        filter on one or more crucial columns.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="hive_interoperability">
+
+    <title>Sharing Tables, Data, and Queries Between Impala and Hive</title>
+
+    <conbody>
+
+      <p>
+        To protect user investment in skills development and query design, 
Impala provides a
+        high degree of compatibility with the Hive Query Language (HiveQL):
+      </p>
+
+      <ul>
+        <li>
+          Because Impala uses the same metadata store as Hive to record 
information about table
+          structure and properties, Impala can access tables defined through 
the native Impala
+          <codeph>CREATE TABLE</codeph> command, or tables created using the 
Hive data
+          definition language (DDL).
+        </li>
+
+        <li>
+          Impala supports data manipulation (DML) statements similar to the 
DML component of
+          HiveQL.
+        </li>
+
+        <li>
+          Impala provides many <xref 
href="impala_functions.xml#builtins">built-in
+          functions</xref> with the same names and parameter types as their 
HiveQL equivalents.
+        </li>
+      </ul>
+
+      <p>
+        Impala supports most of the same
+        <xref href="impala_langref_sql.xml#langref_sql">statements and 
clauses</xref> as HiveQL,
+        including, but not limited to <codeph>JOIN</codeph>, 
<codeph>AGGREGATE</codeph>,
+        <codeph>DISTINCT</codeph>, <codeph>UNION ALL</codeph>, <codeph>ORDER 
BY</codeph>,
+        <codeph>LIMIT</codeph> and (uncorrelated) subquery in the 
<codeph>FROM</codeph> clause.
+        Impala also supports <codeph>INSERT INTO</codeph> and <codeph>INSERT 
OVERWRITE</codeph>.
+      </p>
+
+      <p>
+        Impala supports data types with the same names and semantics as the 
equivalent Hive data
+        types: <codeph>STRING</codeph>, <codeph>TINYINT</codeph>, 
<codeph>SMALLINT</codeph>,
+        <codeph>INT</codeph>, <codeph>BIGINT</codeph>, <codeph>FLOAT</codeph>,
+        <codeph>DOUBLE</codeph>, <codeph>BOOLEAN</codeph>, 
<codeph>STRING</codeph>,
+        <codeph>TIMESTAMP</codeph>. CDH 5.5 / Impala 2.3 and higher also 
include the complex
+        types <codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and 
<codeph>MAP</codeph>.
+      </p>
+
+      <p>
+        Most HiveQL <codeph>SELECT</codeph> and <codeph>INSERT</codeph> 
statements run
+        unmodified with Impala. For information about Hive syntax not 
available in Impala, see
+        <xref href="impala_langref_unsupported.xml#langref_hiveql_delta"/>.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_langref_sql.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_langref_sql.xml 
b/docs/topics/impala_langref_sql.xml
new file mode 100644
index 0000000..d759e76
--- /dev/null
+++ b/docs/topics/impala_langref_sql.xml
@@ -0,0 +1,35 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="langref_sql">
+
+  <title>Impala SQL Statements</title>
+  <titlealts><navtitle>SQL Statements</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      The Impala SQL dialect supports a range of standard elements, plus some 
extensions for Big Data use cases
+      related to data loading and data warehousing.
+    </p>
+
+    <note>
+      <p>
+        In the <cmdname>impala-shell</cmdname> interpreter, a semicolon at the 
end of each statement is required.
+        Since the semicolon is not actually part of the SQL syntax, we do not 
include it in the syntax definition
+        of each statement, but we do show it in examples intended to be run in 
<cmdname>impala-shell</cmdname>.
+      </p>
+    </note>
+
+    <p audience="PDF" outputclass="toc all">
+      The following sections show the major SQL statements that you work with 
in Impala:
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_langref_unsupported.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_langref_unsupported.xml 
b/docs/topics/impala_langref_unsupported.xml
new file mode 100644
index 0000000..f2b0560
--- /dev/null
+++ b/docs/topics/impala_langref_unsupported.xml
@@ -0,0 +1,296 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="langref_hiveql_delta">
+
+  <title>SQL Differences Between Impala and Hive</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Hive"/>
+      <data name="Category" value="Porting"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">Hive</indexterm>
+      <indexterm audience="Cloudera">HiveQL</indexterm>
+      Impala's SQL syntax follows the SQL-92 standard, and includes many 
industry extensions in areas such as
+      built-in functions. See <xref href="impala_porting.xml#porting"/> for a 
general discussion of adapting SQL
+      code from a variety of database systems to Impala.
+    </p>
+
+    <p>
+      Because Impala and Hive share the same metastore database and their 
tables are often used interchangeably,
+      the following section covers differences between Impala and Hive in 
detail.
+    </p>
+
+    <p outputclass="toc inpage"/>
+  </conbody>
+
+  <concept id="langref_hiveql_unsupported">
+
+    <title>HiveQL Features not Available in Impala</title>
+
+    <conbody>
+
+      <p>
+        The current release of Impala does not support the following SQL 
features that you might be familiar with
+        from HiveQL:
+      </p>
+
+      <draft-comment translate="no">
+Yeesh, too many separate lists of unsupported Hive syntax.
+Here, the FAQ, and in some of the intro topics.
+Some discussion in IMP-1061 about how best to reorg.
+Lots of opportunities for conrefs.
+</draft-comment>
+
+      <ul>
+<!-- Now supported in CDH 5.5 / Impala 2.3 and higher. Find places on this 
page (like already done under lateral views) to note the new data type support.
+        <li>
+          Non-scalar data types such as maps, arrays, structs.
+        </li>
+-->
+
+        <li rev="1.2">
+          Extensibility mechanisms such as <codeph>TRANSFORM</codeph>, custom 
file formats, or custom SerDes.
+        </li>
+
+        <li>
+          XML and JSON functions.
+        </li>
+
+        <li>
+          Certain aggregate functions from HiveQL: <codeph>covar_pop</codeph>, 
<codeph>covar_samp</codeph>,
+          <codeph>corr</codeph>, <codeph>percentile</codeph>, 
<codeph>percentile_approx</codeph>,
+          <codeph>histogram_numeric</codeph>, <codeph>collect_set</codeph>; 
Impala supports the set of aggregate
+          functions listed in <xref 
href="impala_aggregate_functions.xml#aggregate_functions"/> and analytic
+          functions listed in <xref 
href="impala_analytic_functions.xml#analytic_functions"/>.
+        </li>
+
+        <li>
+          Sampling.
+        </li>
+
+        <li>
+          Lateral views. In CDH 5.5 / Impala 2.3 and higher, Impala supports 
queries on complex types
+          (<codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, or 
<codeph>MAP</codeph>), using join notation
+          rather than the <codeph>EXPLODE()</codeph> keyword.
+          See <xref href="impala_complex_types.xml#complex_types"/> for 
details about Impala support for complex types.
+        </li>
+
+        <li>
+          Multiple <codeph>DISTINCT</codeph> clauses per query, although 
Impala includes some workarounds for this
+          limitation.
+          <note 
conref="../shared/impala_common.xml#common/multiple_count_distinct"/>
+        </li>
+      </ul>
+
+      <p>
+        User-defined functions (UDFs) are supported starting in Impala 1.2. 
See <xref href="impala_udf.xml#udfs"/>
+        for full details on Impala UDFs.
+        <ul>
+          <li>
+            Impala supports high-performance UDFs written in C++, as well as 
reusing some Java-based Hive UDFs.
+          </li>
+
+          <li>
+            Impala supports scalar UDFs and user-defined aggregate functions 
(UDAFs). Impala does not currently
+            support user-defined table generating functions (UDTFs).
+          </li>
+
+          <li>
+            Only Impala-supported column types are supported in Java-based 
UDFs.
+          </li>
+        </ul>
+      </p>
+
+      <p>
+        Impala does not currently support these HiveQL statements:
+      </p>
+
+      <ul>
+        <li>
+          <codeph>ANALYZE TABLE</codeph> (the Impala equivalent is 
<codeph>COMPUTE STATS</codeph>)
+        </li>
+
+        <li>
+          <codeph>DESCRIBE COLUMN</codeph>
+        </li>
+
+        <li>
+          <codeph>DESCRIBE DATABASE</codeph>
+        </li>
+
+        <li>
+          <codeph>EXPORT TABLE</codeph>
+        </li>
+
+        <li>
+          <codeph>IMPORT TABLE</codeph>
+        </li>
+
+        <li>
+          <codeph>SHOW TABLE EXTENDED</codeph>
+        </li>
+
+        <li>
+          <codeph>SHOW INDEXES</codeph>
+        </li>
+
+        <li>
+          <codeph>SHOW COLUMNS</codeph>
+        </li>
+      </ul>
+    </conbody>
+  </concept>
+
+  <concept id="langref_hiveql_semantics">
+
+    <title>Semantic Differences Between Impala and HiveQL Features</title>
+
+    <conbody>
+
+      <p>
+        This section covers instances where Impala and Hive have similar 
functionality, sometimes including the
+        same syntax, but there are differences in the runtime semantics of 
those features.
+      </p>
+
+      <p>
+        <b>Security:</b>
+      </p>
+
+      <p>
+        Impala utilizes the <xref href="http://sentry.incubator.apache.org/"; 
scope="external" format="html">Apache
+        Sentry (incubating)</xref> authorization framework, which provides 
fine-grained role-based access control
+        to protect data against unauthorized access or tampering.
+      </p>
+
+      <p>
+        The Hive component included in CDH 5.1 and higher now includes 
Sentry-enabled <codeph>GRANT</codeph>,
+        <codeph>REVOKE</codeph>, and <codeph>CREATE/DROP ROLE</codeph> 
statements. Earlier Hive releases had a
+        privilege system with <codeph>GRANT</codeph> and 
<codeph>REVOKE</codeph> statements that were primarily
+        intended to prevent accidental deletion of data, rather than a 
security mechanism to protect against
+        malicious users.
+      </p>
+
+      <p>
+        Impala can make use of privileges set up through Hive 
<codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements.
+        Impala has its own <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> 
statements in Impala 2.0 and higher.
+        See <xref href="impala_authorization.xml#authorization"/> for the 
details of authorization in Impala, including
+        how to switch from the original policy file-based privilege model to 
the Sentry service using privileges
+        stored in the metastore database.
+      </p>
+
+      <p>
+        <b>SQL statements and clauses:</b>
+      </p>
+
+      <p>
+        The semantics of Impala SQL statements varies from HiveQL in some 
cases where they use similar SQL
+        statement and clause names:
+      </p>
+
+      <ul>
+        <li>
+          Impala uses different syntax and names for query hints, 
<codeph>[SHUFFLE]</codeph> and
+          <codeph>[NOSHUFFLE]</codeph> rather than <codeph>MapJoin</codeph> or 
<codeph>StreamJoin</codeph>. See
+          <xref href="impala_joins.xml#joins"/> for the Impala details.
+        </li>
+
+        <li>
+          Impala does not expose MapReduce specific features of <codeph>SORT 
BY</codeph>, <codeph>DISTRIBUTE
+          BY</codeph>, or <codeph>CLUSTER BY</codeph>.
+        </li>
+
+        <li>
+          Impala does not require queries to include a <codeph>FROM</codeph> 
clause.
+        </li>
+      </ul>
+
+      <p>
+        <b>Data types:</b>
+      </p>
+
+      <ul>
+        <li>
+          Impala supports a limited set of implicit casts. This can help avoid 
undesired results from unexpected
+          casting behavior.
+          <ul>
+            <li>
+              Impala does not implicitly cast between string and numeric or 
Boolean types. Always use
+              <codeph>CAST()</codeph> for these conversions.
+            </li>
+
+            <li>
+              Impala does perform implicit casts among the numeric types, when 
going from a smaller or less precise
+              type to a larger or more precise one. For example, Impala will 
implicitly convert a
+              <codeph>SMALLINT</codeph> to a <codeph>BIGINT</codeph> or 
<codeph>FLOAT</codeph>, but to convert from
+              <codeph>DOUBLE</codeph> to <codeph>FLOAT</codeph> or 
<codeph>INT</codeph> to <codeph>TINYINT</codeph>
+              requires a call to <codeph>CAST()</codeph> in the query.
+            </li>
+
+            <li>
+              Impala does perform implicit casts from string to timestamp. 
Impala has a restricted set of literal
+              formats for the <codeph>TIMESTAMP</codeph> data type and the 
<codeph>from_unixtime()</codeph> format
+              string; see <xref href="impala_timestamp.xml#timestamp"/> for 
details.
+            </li>
+          </ul>
+          <p>
+            See <xref href="impala_datatypes.xml#datatypes"/> for full details 
on implicit and explicit casting for
+            all types, and <xref 
href="impala_conversion_functions.xml#conversion_functions"/> for details about
+            the <codeph>CAST()</codeph> function.
+          </p>
+        </li>
+
+        <li>
+          Impala does not store or interpret timestamps using the local 
timezone, to avoid undesired results from
+          unexpected time zone issues. Timestamps are stored and interpreted 
relative to UTC. This difference can
+          produce different results for some calls to similarly named 
date/time functions between Impala and Hive.
+          See <xref href="impala_datetime_functions.xml#datetime_functions"/> 
for details about the Impala
+          functions. See <xref href="impala_timestamp.xml#timestamp"/> for a 
discussion of how Impala handles
+          time zones, and configuration options you can use to make Impala 
match the Hive behavior more closely
+          when dealing with Parquet-encoded <codeph>TIMESTAMP</codeph> data or 
when converting between
+          the local time zone and UTC.
+        </li>
+
+        <li>
+          The Impala <codeph>TIMESTAMP</codeph> type can represent dates 
ranging from 1400-01-01 to 9999-12-31.
+          This is different from the Hive date range, which is 0000-01-01 to 
9999-12-31.
+        </li>
+
+        <li>
+          Impala does not return column overflows as <codeph>NULL</codeph>, so 
that customers can distinguish
+          between <codeph>NULL</codeph> data and overflow conditions similar 
to how they do so with traditional
+          database systems. Impala returns the largest or smallest value in 
the range for the type. For example,
+          valid values for a <codeph>tinyint</codeph> range from -128 to 127. 
In Impala, a <codeph>tinyint</codeph>
+          with a value of -200 returns -128 rather than <codeph>NULL</codeph>. 
A <codeph>tinyint</codeph> with a
+          value of 200 returns 127.
+        </li>
+      </ul>
+
+      <p>
+        <b>Miscellaneous features:</b>
+      </p>
+
+      <ul>
+        <li>
+          Impala does not provide virtual columns.
+        </li>
+
+        <li>
+          Impala does not expose locking.
+        </li>
+
+        <li>
+          Impala does not expose some configuration properties.
+        </li>
+      </ul>
+    </conbody>
+  </concept>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_limit.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_limit.xml b/docs/topics/impala_limit.xml
new file mode 100644
index 0000000..c186cd4
--- /dev/null
+++ b/docs/topics/impala_limit.xml
@@ -0,0 +1,149 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="limit">
+
+  <title>LIMIT Clause</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Reports"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      The <codeph>LIMIT</codeph> clause in a <codeph>SELECT</codeph> query 
sets a maximum number of rows for the
+      result set. Pre-selecting the maximum size of the result set helps 
Impala to optimize memory usage while
+      processing a distributed query.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>LIMIT <varname>constant_integer_expression</varname></codeblock>
+
+    <p>
+      The argument to the <codeph>LIMIT</codeph> clause must evaluate to a 
constant value. It can be a numeric
+      literal, or another kind of numeric expression involving operators, 
casts, and function return values. You
+      cannot refer to a column or use a subquery.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <p>
+      This clause is useful in contexts such as:
+    </p>
+
+    <ul>
+      <li>
+        To return exactly N items from a top-N query, such as the 10 
highest-rated items in a shopping category or
+        the 50 hostnames that refer the most traffic to a web site.
+      </li>
+
+      <li>
+        To demonstrate some sample values from a table or a particular query. 
(To display some arbitrary items, use
+        a query with no <codeph>ORDER BY</codeph> clause. An <codeph>ORDER 
BY</codeph> clause causes additional
+        memory and/or disk usage during the query.)
+      </li>
+
+      <li>
+        To keep queries from returning huge result sets by accident if a table 
is larger than expected, or a
+        <codeph>WHERE</codeph> clause matches more rows than expected.
+      </li>
+    </ul>
+
+    <p rev="1.2.1">
+      Originally, the value for the <codeph>LIMIT</codeph> clause had to be a 
numeric literal. In Impala 1.2.1 and
+      higher, it can be a numeric expression.
+    </p>
+
+    <p rev="obwl" conref="../shared/impala_common.xml#common/order_by_limit"/>
+
+    <p>
+      See <xref href="impala_order_by.xml#order_by"/> for details.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/limit_and_offset"/>
+
+    <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/subquery_no_limit"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      The following example shows how the <codeph>LIMIT</codeph> clause caps 
the size of the result set, with the
+      limit being applied after any other clauses such as 
<codeph>WHERE</codeph>.
+    </p>
+
+<codeblock>[localhost:21000] &gt; create database limits;
+[localhost:21000] &gt; use limits;
+[localhost:21000] &gt; create table numbers (x int);
+[localhost:21000] &gt; insert into numbers values (1), (3), (4), (5), (2);
+Inserted 5 rows in 1.34s
+[localhost:21000] &gt; select x from numbers limit 100;
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 4 |
+| 5 |
+| 2 |
++---+
+Returned 5 row(s) in 0.26s
+[localhost:21000] &gt; select x from numbers limit 3;
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 4 |
++---+
+Returned 3 row(s) in 0.27s
+[localhost:21000] &gt; select x from numbers where x &gt; 2 limit 2;
++---+
+| x |
++---+
+| 3 |
+| 4 |
++---+
+Returned 2 row(s) in 0.27s</codeblock>
+
+    <p>
+      For top-N and bottom-N queries, you use the <codeph>ORDER BY</codeph> 
and <codeph>LIMIT</codeph> clauses
+      together:
+    </p>
+
+<codeblock rev="obwl">[localhost:21000] &gt; select x as "Top 3" from numbers 
order by x desc limit 3;
++-------+
+| top 3 |
++-------+
+| 5     |
+| 4     |
+| 3     |
++-------+
+[localhost:21000] &gt; select x as "Bottom 3" from numbers order by x limit 3;
++----------+
+| bottom 3 |
++----------+
+| 1        |
+| 2        |
+| 3        |
++----------+
+</codeblock>
+
+    <p>
+      You can use constant values besides integer literals as the 
<codeph>LIMIT</codeph> argument:
+    </p>
+
+<codeblock>-- Other expressions that yield constant integer values work too.
+SELECT x FROM t1 LIMIT 1e6;                        -- Limit is one million.
+SELECT x FROM t1 LIMIT length('hello world');      -- Limit is 11.
+SELECT x FROM t1 LIMIT 2+2;                        -- Limit is 4.
+SELECT x FROM t1 LIMIT cast(truncate(9.9) AS INT); -- Limit is 9.
+</codeblock>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_literals.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_literals.xml b/docs/topics/impala_literals.xml
new file mode 100644
index 0000000..3c53796
--- /dev/null
+++ b/docs/topics/impala_literals.xml
@@ -0,0 +1,384 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="literals">
+
+  <title>Literals</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Data Types"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">literals</indexterm>
+      Each of the Impala data types has corresponding notation for literal 
values of that type. You specify literal
+      values in SQL statements, such as in the <codeph>SELECT</codeph> list or 
<codeph>WHERE</codeph> clause of a
+      query, or as an argument to a function call. See <xref 
href="impala_datatypes.xml#datatypes"/> for a complete
+      list of types, ranges, and conversion rules.
+    </p>
+
+    <p outputclass="toc inpage"/>
+  </conbody>
+
+  <concept id="numeric_literals">
+
+    <title>Numeric Literals</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">numeric literals</indexterm>
+        To write literals for the integer types (<codeph>TINYINT</codeph>, 
<codeph>SMALLINT</codeph>,
+        <codeph>INT</codeph>, and <codeph>BIGINT</codeph>), use a sequence of 
digits with optional leading zeros.
+      </p>
+
+      <p rev="1.4.0">
+        To write literals for the floating-point types (<codeph 
rev="1.4.0">DECIMAL</codeph>,
+        <codeph>FLOAT</codeph>, and <codeph>DOUBLE</codeph>), use a sequence 
of digits with an optional decimal
+        point (<codeph>.</codeph> character). To preserve accuracy during 
arithmetic expressions, Impala interprets
+        floating-point literals as the <codeph>DECIMAL</codeph> type with the 
smallest appropriate precision and
+        scale, until required by the context to convert the result to 
<codeph>FLOAT</codeph> or
+        <codeph>DOUBLE</codeph>.
+      </p>
+
+      <p>
+        Integer values are promoted to floating-point when necessary, based on 
the context.
+      </p>
+
+      <p>
+        You can also use exponential notation by including an 
<codeph>e</codeph> character. For example,
+        <codeph>1e6</codeph> is 1 times 10 to the power of 6 (1 million). A 
number in exponential notation is
+        always interpreted as floating-point.
+      </p>
+
+      <p rev="tk">
+        When Impala encounters a numeric literal, it considers the type to be 
the <q>smallest</q> that can
+        accurately represent the value. The type is promoted to larger or more 
accurate types if necessary, based
+        on subsequent parts of an expression.
+      </p>
+      <p>
+        For example, you can see by the types Impala defines for the following 
table columns
+        how it interprets the corresponding numeric literals:
+      </p>
+<codeblock>[localhost:21000] > create table ten as select 10 as x;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc ten;
++------+---------+---------+
+| name | type    | comment |
++------+---------+---------+
+| x    | tinyint |         |
++------+---------+---------+
+
+[localhost:21000] > create table four_k as select 4096 as x;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc four_k;
++------+----------+---------+
+| name | type     | comment |
++------+----------+---------+
+| x    | smallint |         |
++------+----------+---------+
+
+[localhost:21000] > create table one_point_five as select 1.5 as x;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc one_point_five;
++------+--------------+---------+
+| name | type         | comment |
++------+--------------+---------+
+| x    | decimal(2,1) |         |
++------+--------------+---------+
+
+[localhost:21000] > create table one_point_three_three_three as select 1.333 
as x;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc one_point_three_three_three;
++------+--------------+---------+
+| name | type         | comment |
++------+--------------+---------+
+| x    | decimal(4,3) |         |
++------+--------------+---------+
+</codeblock>
+    </conbody>
+  </concept>
+
+  <concept id="string_literals">
+
+    <title>String Literals</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">string literals</indexterm>
+        String literals are quoted using either single or double quotation 
marks. You can use either kind of quotes
+        for string literals, even both kinds for different literals within the 
same statement.
+      </p>
+
+      <p rev="2.0.0">
+        Quoted literals are considered to be of type <codeph>STRING</codeph>. 
To use quoted literals in contexts
+        requiring a <codeph>CHAR</codeph> or <codeph>VARCHAR</codeph> value, 
<codeph>CAST()</codeph> the literal to
+        a <codeph>CHAR</codeph> or <codeph>VARCHAR</codeph> of the appropriate 
length.
+      </p>
+
+      <p>
+        <b>Escaping special characters:</b>
+      </p>
+
+      <p>
+        To encode special characters within a string literal, precede them 
with the backslash (<codeph>\</codeph>)
+        escape character:
+      </p>
+
+      <ul>
+        <li>
+          <codeph>\t</codeph> represents a tab.
+        </li>
+
+        <li>
+          <codeph>\n</codeph> represents a newline or linefeed. This might 
cause extra line breaks in
+          <cmdname>impala-shell</cmdname> output.
+        </li>
+
+        <li>
+          <codeph>\r</codeph> represents a carriage return. This might cause 
unusual formatting (making it appear
+          that some content is overwritten) in <cmdname>impala-shell</cmdname> 
output.
+        </li>
+
+        <li>
+          <codeph>\b</codeph> represents a backspace. This might cause unusual 
formatting (making it appear that
+          some content is overwritten) in <cmdname>impala-shell</cmdname> 
output.
+        </li>
+
+        <li>
+          <codeph>\0</codeph> represents an ASCII <codeph>nul</codeph> 
character (not the same as a SQL
+          <codeph>NULL</codeph>). This might not be visible in 
<cmdname>impala-shell</cmdname> output.
+        </li>
+
+        <li>
+          <codeph>\Z</codeph> represents a DOS end-of-file character. This 
might not be visible in
+          <cmdname>impala-shell</cmdname> output.
+        </li>
+
+        <li>
+          <codeph>\%</codeph> and <codeph>\_</codeph> can be used to escape 
wildcard characters within the string
+          passed to the <codeph>LIKE</codeph> operator.
+        </li>
+
+        <li>
+          <codeph>\</codeph> followed by 3 octal digits represents the ASCII 
code of a single character; for
+          example, <codeph>\101</codeph> is ASCII 65, the character 
<codeph>A</codeph>.
+        </li>
+
+        <li>
+          Use two consecutive backslashes (<codeph>\\</codeph>) to prevent the 
backslash from being interpreted as
+          an escape character.
+        </li>
+
+        <li>
+          Use the backslash to escape single or double quotation mark 
characters within a string literal, if the
+          literal is enclosed by the same type of quotation mark.
+        </li>
+
+        <li>
+          If the character following the <codeph>\</codeph> does not represent 
the start of a recognized escape
+          sequence, the character is passed through unchanged.
+        </li>
+      </ul>
+
+      <p>
+        <b>Quotes within quotes:</b>
+      </p>
+
+      <p>
+        To include a single quotation character within a string value, enclose 
the literal with either single or
+        double quotation marks, and optionally escape the single quote as a 
<codeph>\'</codeph> sequence. Earlier
+        releases required escaping a single quote inside double quotes. 
Continue using escape sequences in this
+        case if you also need to run your SQL code on older versions of Impala.
+      </p>
+
+      <p>
+        To include a double quotation character within a string value, enclose 
the literal with single quotation
+        marks, no escaping is necessary in this case. Or, enclose the literal 
with double quotation marks and
+        escape the double quote as a <codeph>\"</codeph> sequence.
+      </p>
+
+<codeblock>[localhost:21000] &gt; select "What\'s happening?" as 
single_within_double,
+                  &gt;        'I\'m not sure.' as single_within_single,
+                  &gt;        "Homer wrote \"The Iliad\"." as 
double_within_double,
+                  &gt;        'Homer also wrote "The Odyssey".' as 
double_within_single;
++----------------------+----------------------+--------------------------+---------------------------------+
+| single_within_double | single_within_single | double_within_double     | 
double_within_single            |
++----------------------+----------------------+--------------------------+---------------------------------+
+| What's happening?    | I'm not sure.        | Homer wrote "The Iliad". | 
Homer also wrote "The Odyssey". |
++----------------------+----------------------+--------------------------+---------------------------------+
+</codeblock>
+
+      <p>
+        <b>Field terminator character in CREATE TABLE:</b>
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/thorn"/>
+
+      <p>
+        <b>impala-shell considerations:</b>
+      </p>
+
+      <p>
+        When dealing with output that includes non-ASCII or non-printable 
characters such as linefeeds and
+        backspaces, use the <cmdname>impala-shell</cmdname> options to save to 
a file, turn off pretty printing, or
+        both rather than relying on how the output appears visually. See
+        <xref href="impala_shell_options.xml#shell_options"/> for a list of 
<cmdname>impala-shell</cmdname>
+        options.
+      </p>
+    </conbody>
+  </concept>
+
+  <concept id="boolean_literals">
+
+    <title>Boolean Literals</title>
+
+    <conbody>
+
+      <p>
+        For <codeph>BOOLEAN</codeph> values, the literals are 
<codeph>TRUE</codeph> and <codeph>FALSE</codeph>,
+        with no quotation marks and case-insensitive.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+<codeblock>select true;
+select * from t1 where assertion = false;
+select case bool_col when true then 'yes' when false 'no' else 'null' end from 
t1;</codeblock>
+    </conbody>
+  </concept>
+
+  <concept id="timestamp_literals">
+
+    <title>Timestamp Literals</title>
+
+    <conbody>
+
+      <p conref="../shared/impala_common.xml#common/timestamp_conversions"/>
+
+      <p>
+        You can also use <codeph>INTERVAL</codeph> expressions to add or 
subtract from timestamp literal values,
+        such as <codeph>'1966-07-30' + INTERVAL 5 YEARS + INTERVAL 3 
DAYS</codeph>. See
+        <xref href="impala_timestamp.xml#timestamp"/> for details.
+      </p>
+
+      <p>
+        Depending on your data pipeline, you might receive date and time data 
as text, in notation that does not
+        exactly match the format for Impala <codeph>TIMESTAMP</codeph> 
literals.
+        See <xref href="impala_datetime_functions.xml#datetime_functions"/> 
for functions that can convert
+        between a variety of string literals (including different field order, 
separators, and timezone notation)
+        and equivalent <codeph>TIMESTAMP</codeph> or numeric values.
+      </p>
+    </conbody>
+  </concept>
+
+  <concept id="null">
+
+    <title>NULL</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">NULL</indexterm>
+        The notion of <codeph>NULL</codeph> values is familiar from all kinds 
of database systems, but each SQL
+        dialect can have its own behavior and restrictions on 
<codeph>NULL</codeph> values. For Big Data
+        processing, the precise semantics of <codeph>NULL</codeph> values are 
significant: any misunderstanding
+        could lead to inaccurate results or misformatted data, that could be 
time-consuming to correct for large
+        data sets.
+      </p>
+
+      <ul>
+        <li>
+          <codeph>NULL</codeph> is a different value than an empty string. The 
empty string is represented by a
+          string literal with nothing inside, <codeph>""</codeph> or 
<codeph>''</codeph>.
+        </li>
+
+        <li>
+          In a delimited text file, the <codeph>NULL</codeph> value is 
represented by the special token
+          <codeph>\N</codeph>.
+        </li>
+
+        <li>
+          When Impala inserts data into a partitioned table, and the value of 
one of the partitioning columns is
+          <codeph>NULL</codeph> or the empty string, the data is placed in a 
special partition that holds only
+          these two kinds of values. When these values are returned in a 
query, the result is <codeph>NULL</codeph>
+          whether the value was originally <codeph>NULL</codeph> or an empty 
string. This behavior is compatible
+          with the way Hive treats <codeph>NULL</codeph> values in partitioned 
tables. Hive does not allow empty
+          strings as partition keys, and it returns a string value such as
+          <codeph>__HIVE_DEFAULT_PARTITION__</codeph> instead of 
<codeph>NULL</codeph> when such values are
+          returned from a query. For example:
+<codeblock>create table t1 (i int) partitioned by (x int, y string);
+-- Select an INT column from another table, with all rows going into a special 
HDFS subdirectory
+-- named __HIVE_DEFAULT_PARTITION__. Depending on whether one or both of the 
partitioning keys
+-- are null, this special directory name occurs at different levels of the 
physical data directory
+-- for the table.
+insert into t1 partition(x=NULL, y=NULL) select c1 from some_other_table;
+insert into t1 partition(x, y=NULL) select c1, c2 from some_other_table;
+insert into t1 partition(x=NULL, y) select c1, c3  from 
some_other_table;</codeblock>
+        </li>
+
+        <li>
+          There is no <codeph>NOT NULL</codeph> clause when defining a column 
to prevent <codeph>NULL</codeph>
+          values in that column.
+        </li>
+
+        <li>
+          There is no <codeph>DEFAULT</codeph> clause to specify a 
non-<codeph>NULL</codeph> default value.
+        </li>
+
+        <li>
+          If an <codeph>INSERT</codeph> operation mentions some columns but 
not others, the unmentioned columns
+          contain <codeph>NULL</codeph> for all inserted rows.
+        </li>
+
+        <li rev="1.2.1">
+          <p conref="../shared/impala_common.xml#common/null_sorting_change"/>
+          <note>
+            <draft-comment translate="no"> Probably a bunch of similar 
view-related restrictions like this that should be collected, reused, or 
cross-referenced under the Views topic. </draft-comment>
+            Because the <codeph>NULLS FIRST</codeph> and <codeph>NULLS 
LAST</codeph> keywords are not currently
+            available in Hive queries, any views you create using those 
keywords will not be available through
+            Hive.
+          </note>
+        </li>
+
+        <li>
+          In all other contexts besides sorting with <codeph>ORDER 
BY</codeph>, comparing a <codeph>NULL</codeph>
+          to anything else returns <codeph>NULL</codeph>, making the 
comparison meaningless. For example,
+          <codeph>10 &gt; NULL</codeph> produces <codeph>NULL</codeph>, 
<codeph>10 &lt; NULL</codeph> also produces
+          <codeph>NULL</codeph>, <codeph>5 BETWEEN 1 AND NULL</codeph> 
produces <codeph>NULL</codeph>, and so on.
+        </li>
+      </ul>
+
+      <p>
+        Several built-in functions serve as shorthand for evaluating 
expressions and returning
+        <codeph>NULL</codeph>, 0, or some other substitution value depending 
on the expression result:
+        <codeph>ifnull()</codeph>, <codeph>isnull()</codeph>, 
<codeph>nvl()</codeph>, <codeph>nullif()</codeph>,
+        <codeph>nullifzero()</codeph>, and <codeph>zeroifnull()</codeph>. See
+        <xref href="impala_conditional_functions.xml#conditional_functions"/> 
for details.
+      </p>
+    </conbody>
+  </concept>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_live_progress.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_live_progress.xml 
b/docs/topics/impala_live_progress.xml
new file mode 100644
index 0000000..f58cdcb
--- /dev/null
+++ b/docs/topics/impala_live_progress.xml
@@ -0,0 +1,81 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="2.3.0" id="live_progress">
+
+  <title>LIVE_PROGRESS Query Option</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Performance"/>
+      <data name="Category" value="Reports"/>
+      <data name="Category" value="impala-shell"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">LIVE_PROGRESS query option</indexterm>
+      For queries submitted through the <cmdname>impala-shell</cmdname> 
command,
+      displays an interactive progress bar showing roughly what percentage of
+      processing has been completed. When the query finishes, the progress bar 
is erased
+      from the <cmdname>impala-shell</cmdname> console output.
+    </p>
+
+    <p>
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/type_boolean"/>
+    <p conref="../shared/impala_common.xml#common/default_false_0"/>
+
+    <p conref="../shared/impala_common.xml#common/command_line_blurb"/>
+    <p>
+      You can enable this query option within <cmdname>impala-shell</cmdname>
+      by starting the shell with the <codeph>--live_progress</codeph>
+      command-line option.
+      You can still turn this setting off and on again within the shell 
through the
+      <codeph>SET</codeph> command.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+    <p conref="../shared/impala_common.xml#common/live_reporting_details"/>
+    <p>
+      For a more detailed way of tracking the progress of an interactive query 
through
+      all phases of processing, see <xref 
href="impala_live_summary.xml#live_summary"/>.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+    <p>
+      Because the percentage complete figure is calculated using the number of
+      issued and completed <q>scan ranges</q>, which occur while reading the 
table
+      data, the progress bar might reach 100% before the query is entirely 
finished.
+      For example, the query might do work to perform aggregations after all 
the
+      table data has been read. If many of your queries fall into this 
category,
+      consider using the <codeph>LIVE_SUMMARY</codeph> option instead for
+      more granular progress reporting.
+    </p>
+    <p 
conref="../shared/impala_common.xml#common/impala_shell_progress_reports_compute_stats_caveat"/>
+    <p 
conref="../shared/impala_common.xml#common/impala_shell_progress_reports_shell_only_caveat"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+<codeblock><![CDATA[[localhost:21000] > set live_progress=true;
+LIVE_PROGRESS set to true
+[localhost:21000] > select count(*) from customer;
++----------+
+| count(*) |
++----------+
+| 150000   |
++----------+
+[localhost:21000] > select count(*) from customer t1 cross join customer t2;
+[##################################################                            
                      ] 50%
+[####################################################################################################]
 100%
+
+]]>
+</codeblock>
+
+    <p 
conref="../shared/impala_common.xml#common/live_progress_live_summary_asciinema"/>
+
+  </conbody>
+</concept>

[10/22] incubator-impala git commit: First try at porting over the source files necessary for the Impala SQL Reference.

Reply via email to