http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_decimal.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_decimal.xml b/docs/topics/impala_decimal.xml new file mode 100644 index 0000000..c0c98d9 --- /dev/null +++ b/docs/topics/impala_decimal.xml @@ -0,0 +1,836 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="1.4.0" id="decimal"> + + <title>DECIMAL Data Type (CDH 5.1 or higher only)</title> + <titlealts><navtitle>DECIMAL (CDH 5.1 or higher only)</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Schemas"/> + </metadata> + </prolog> + + <conbody> + + <p> + A numeric data type with fixed scale and precision, used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER + TABLE</codeph> statements. Suitable for financial and other arithmetic calculations where the imprecise + representation and rounding behavior of <codeph>FLOAT</codeph> and <codeph>DOUBLE</codeph> make those types + impractical. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + + <p> + In the column definition of a <codeph>CREATE TABLE</codeph> statement: + </p> + +<codeblock><varname>column_name</varname> DECIMAL[(<varname>precision</varname>[,<varname>scale</varname>])]</codeblock> + + <p> + <codeph>DECIMAL</codeph> with no precision or scale values is equivalent to <codeph>DECIMAL(9,0)</codeph>. + </p> + + <p> + <b>Precision and Scale:</b> + </p> + + <p> + <varname>precision</varname> represents the total number of digits that can be represented by the column, + regardless of the location of the decimal point. This value must be between 1 and 38. For example, + representing integer values up to 9999, and floating-point values up to 99.99, both require a precision of 4. + You can also represent corresponding negative values, without any change in the precision. For example, the + range -9999 to 9999 still only requires a precision of 4. + </p> + + <p> + <varname>scale</varname> represents the number of fractional digits. This value must be less than or equal to + <varname>precision</varname>. A scale of 0 produces integral values, with no fractional part. If precision + and scale are equal, all the digits come after the decimal point, making all the values between 0 and + 0.999... or 0 and -0.999... + </p> + + <p> + When <varname>precision</varname> and <varname>scale</varname> are omitted, a <codeph>DECIMAL</codeph> value + is treated as <codeph>DECIMAL(9,0)</codeph>, that is, an integer value ranging from + <codeph>-999,999,999</codeph> to <codeph>999,999,999</codeph>. This is the largest <codeph>DECIMAL</codeph> + value that can still be represented in 4 bytes. If precision is specified but scale is omitted, Impala uses a + value of zero for the scale. + </p> + + <p> + Both <varname>precision</varname> and <varname>scale</varname> must be specified as integer literals, not any + other kind of constant expressions. + </p> + + <p> + To check the precision or scale for arbitrary values, you can call the + <xref href="impala_math_functions.xml#math_functions"><codeph>precision()</codeph> and + <codeph>scale()</codeph> built-in functions</xref>. 
For example, you might use these values to figure out how + many characters are required for various fields in a report, or to understand the rounding characteristics of + a formula as applied to a particular <codeph>DECIMAL</codeph> column. + </p> + + <p> + <b>Range:</b> + </p> + + <p> + The maximum precision value is 38. Thus, the largest integral value is represented by + <codeph>DECIMAL(38,0)</codeph> (999... with 9 repeated 38 times). The most precise fractional value (between + 0 and 1, or 0 and -1) is represented by <codeph>DECIMAL(38,38)</codeph>, with 38 digits to the right of the + decimal point. The value closest to 0 would be .0000...1 (37 zeros and the final 1). The value closest to 1 + would be .999... (9 repeated 38 times). + </p> + + <p> + For a given precision and scale, the range of <codeph>DECIMAL</codeph> values is the same in the positive and + negative directions. For example, <codeph>DECIMAL(4,2)</codeph> can represent from -99.99 to 99.99. This is + different from other integral numeric types where the positive and negative bounds differ slightly. + </p> + + <p> + When you use <codeph>DECIMAL</codeph> values in arithmetic expressions, the precision and scale of the result + value are determined as follows: + </p> + + <ul> + <li> + <p> + For addition and subtraction, the precision and scale are based on the maximum possible result, that is, + if all the digits of the input values were 9s and the absolute values were added together. + </p> +<!-- Seems like buggy output from this first query, so hiding the example for the time being. --> +<codeblock audience="Cloudera">[localhost:21000] > select 50000.5 + 12.444, precision(50000.5 + 12.444), scale(50000.5 + 12.444); ++------------------+-----------------------------+-------------------------+ +| 50000.5 + 12.444 | precision(50000.5 + 12.444) | scale(50000.5 + 12.444) | ++------------------+-----------------------------+-------------------------+ +| 50012.944 | 9 | 3 | ++------------------+-----------------------------+-------------------------+ +[localhost:21000] > select 99999.9 + 99.999, precision(99999.9 + 99.999), scale(99999.9 + 99.999); ++------------------+-----------------------------+-------------------------+ +| 99999.9 + 99.999 | precision(99999.9 + 99.999) | scale(99999.9 + 99.999) | ++------------------+-----------------------------+-------------------------+ +| 100099.899 | 9 | 3 | ++------------------+-----------------------------+-------------------------+ +</codeblock> + </li> + + <li> + <p> + For multiplication, the precision is the sum of the precisions of the input values. The scale is the sum + of the scales of the input values. + </p> + </li> + +<!-- Need to add some specifics to discussion of division. Details here: http://blogs.msdn.com/b/sqlprogrammability/archive/2006/03/29/564110.aspx --> + + <li> + <p> + For division, Impala sets the precision and scale to values large enough to represent the whole and + fractional parts of the result. + </p> + </li> + + <li> + <p> + For <codeph>UNION</codeph>, the scale is the larger of the scales of the input values, and the precision + is increased if necessary to accommodate any additional fractional digits. If the same input value has + the largest precision and the largest scale, the result value has the same precision and scale. If one + value has a larger precision but smaller scale, the scale of the result value is increased. For example, + <codeph>DECIMAL(20,2) UNION DECIMAL(8,6)</codeph> produces a result of type + <codeph>DECIMAL(24,6)</codeph>. 
The extra 4 fractional digits of scale (6-2) are accommodated by
+          extending the precision by the same amount (20+4).
+        </p>
+      </li>
+
+      <li>
+        <p>
+          To double-check, you can always call the <codeph>PRECISION()</codeph> and <codeph>SCALE()</codeph>
+          functions on the results of an arithmetic expression to see the relevant values, or use a <codeph>CREATE
+          TABLE AS SELECT</codeph> statement to define a column based on the return type of the expression.
+        </p>
+      </li>
+    </ul>
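+
+    <p>
+      As the last rule suggests, you can verify how Impala assigns a precision and scale to any
+      expression by calling the <codeph>PRECISION()</codeph> and <codeph>SCALE()</codeph> functions on it.
+      The following is a minimal sketch of such a check; the comments assume that Impala interprets
+      the two literals as <codeph>DECIMAL(6,1)</codeph> and <codeph>DECIMAL(5,3)</codeph>:
+    </p>
+
+<codeblock>-- For multiplication, the result precision and scale are the sums of the
+-- input precisions and scales: expect precision 6+5 = 11 and scale 1+3 = 4.
+select precision(50000.5 * 12.444), scale(50000.5 * 12.444);
+</codeblock>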
+
+    <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
+
+    <ul>
+      <li>
+        The <codeph>DECIMAL</codeph> type is supported only in CDH 5.1.0 and higher.
+<!--
+        Although Impala-created tables containing <codeph>DECIMAL</codeph> columns are
+        readable in CDH 5.1, <codeph>DECIMAL</codeph> data is not interoperable with
+        other Hadoop components in CDH 4, and some Impala operations such as
+        <codeph>COMPUTE STATS</codeph> are not possible on such tables in CDH 4.
+        If you create a Parquet table with a <codeph>DECIMAL</codeph>
+        column under CDH 4, Impala issues a warning because the data files might not be readable from other CDH 4 components.
+-->
+      </li>
+
+<!--
+      <li>
+        The <codeph>DECIMAL</codeph> data type is a relatively new addition to the
+        Parquet file format. To read Impala-created Parquet files containing
+        <codeph>DECIMAL</codeph> columns from another Hadoop component such as
+        MapReduce, Pig, or Hive, use CDH 5.1 or higher, or the equivalent levels of the relevant components and Parquet
+        JARs from CDH 5.1.
+        If you create a Parquet table with a <codeph>DECIMAL</codeph>
+        column under CDH 4, Impala issues a warning because the data files might not be readable from other CDH 4 components.
+      </li>
+
+      <li>
+        In particular, Impala-created tables with <codeph>DECIMAL</codeph> columns are
+        not readable by Hive under CDH 4.
+      </li>
+-->
+
+      <li>
+        Use the <codeph>DECIMAL</codeph> data type in Impala for applications where you used the
+        <codeph>NUMBER</codeph> data type in Oracle. The Impala <codeph>DECIMAL</codeph> type does not support the
+        Oracle idioms of <codeph>*</codeph> for scale or negative values for precision.
+      </li>
+    </ul>
+
+    <p>
+      <b>Conversions and casting:</b>
+    </p>
+
+    <p>
+      <ph conref="../shared/impala_common.xml#common/cast_int_to_timestamp"/>
+    </p>
+
+    <p>
+      Impala automatically converts between <codeph>DECIMAL</codeph> and other numeric types where possible. A
+      <codeph>DECIMAL</codeph> with zero scale is converted to or from the smallest appropriate integral type. A
+      <codeph>DECIMAL</codeph> with a fractional part is automatically converted to or from the smallest
+      appropriate floating-point type. If the destination type does not have sufficient precision or scale to hold
+      all possible values of the source type, Impala raises an error and does not convert the value.
+    </p>
+
+    <p>
+      For example, these statements show how expressions of <codeph>DECIMAL</codeph> and other types are reconciled
+      to the same type in the context of <codeph>UNION</codeph> queries and <codeph>INSERT</codeph> statements:
+    </p>
+
+<codeblock>[localhost:21000] > select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
++----------------+
+| x              |
++----------------+
+| 1.5000         |
+| 1.0000         |
++----------------+
+[localhost:21000] > create table int_vs_decimal as select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 2 row(s) |
++-------------------+
+[localhost:21000] > desc int_vs_decimal;
++------+---------------+---------+
+| name | type          | comment |
++------+---------------+---------+
+| x    | decimal(14,4) |         |
++------+---------------+---------+
+</codeblock>
+
+    <p>
+      To avoid potential conversion errors, you can use <codeph>CAST()</codeph> to convert <codeph>DECIMAL</codeph>
+      values to <codeph>FLOAT</codeph>, <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, <codeph>INT</codeph>,
+      <codeph>BIGINT</codeph>, <codeph>STRING</codeph>, <codeph>TIMESTAMP</codeph>, or <codeph>BOOLEAN</codeph>.
+      You can use exponential notation in <codeph>DECIMAL</codeph> literals or when casting from
+      <codeph>STRING</codeph>, for example <codeph>1.0e6</codeph> to represent one million.
+    </p>
+
+    <p>
+      If you cast a value with more fractional digits than the scale of the destination type, any extra fractional
+      digits are truncated (not rounded). Casting a value to a target type with insufficient precision produces a
+      result of <codeph>NULL</codeph> and displays a runtime warning.
+    </p>
+
+<codeblock>[localhost:21000] > select cast(1.239 as decimal(3,2));
++-----------------------------+
+| cast(1.239 as decimal(3,2)) |
++-----------------------------+
+| 1.23                        |
++-----------------------------+
+[localhost:21000] > select cast(1234 as decimal(3));
++----------------------------+
+| cast(1234 as decimal(3,0)) |
++----------------------------+
+| NULL                       |
++----------------------------+
+WARNINGS: Expression overflowed, returning NULL
+</codeblock>
+
+    <p>
+      When you specify integer literals, for example in <codeph>INSERT ... VALUES</codeph> statements or arithmetic
+      expressions, those numbers are interpreted as the smallest applicable integer type. You must use
+      <codeph>CAST()</codeph> calls for some combinations of integer literals and <codeph>DECIMAL</codeph>
+      precision. For example, <codeph>INT</codeph> has a maximum value that is 10 digits long,
+      <codeph>TINYINT</codeph> has a maximum value that is 3 digits long, and so on. If you specify a value such as
+      123456 to go into a <codeph>DECIMAL</codeph> column, Impala checks if the column has enough precision to
+      represent the largest value of that integer type, and raises an error if not. Therefore, use an expression
+      like <codeph>CAST(123456 AS DECIMAL(9,0))</codeph> for <codeph>DECIMAL</codeph> columns with precision 9 or
+      less, <codeph>CAST(50 AS DECIMAL(2,0))</codeph> for <codeph>DECIMAL</codeph> columns with precision 2 or
+      less, and so on. For <codeph>DECIMAL</codeph> columns with precision 10 or greater, Impala automatically
+      interprets the value as the correct <codeph>DECIMAL</codeph> type; however, because
+      <codeph>DECIMAL(10)</codeph> requires 8 bytes of storage while <codeph>DECIMAL(9)</codeph> requires only 4
+      bytes, only use precision of 10 or higher when actually needed.
+    </p>
+
+<codeblock>[localhost:21000] > create table decimals_9_0 (x decimal);
+[localhost:21000] > insert into decimals_9_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);
+ERROR: AnalysisException: Possible loss of precision for target table 'decimal_testing.decimals_9_0'.
+Expression '1' (type: INT) would need to be cast to DECIMAL(9,0) for column 'x'
+[localhost:21000] > insert into decimals_9_0 values (cast(1 as decimal)), (cast(2 as decimal)), (cast(4 as decimal)), (cast(8 as decimal)), (cast(16 as decimal)), (cast(1024 as decimal)), (cast(32768 as decimal)), (cast(65536 as decimal)), (cast(1000000 as decimal));
+
+[localhost:21000] > create table decimals_10_0 (x decimal(10,0));
+[localhost:21000] > insert into decimals_10_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);
+[localhost:21000] >
+</codeblock>
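+
+    <p>
+      Recall from the earlier casting example that extra fractional digits are truncated rather
+      than rounded. If rounding is what you want, one approach is to apply the
+      <codeph>ROUND()</codeph> function (one of the built-in functions that accepts
+      <codeph>DECIMAL</codeph> arguments) before the cast. A minimal sketch:
+    </p>
+
+<codeblock>-- CAST alone truncates 1.239 to 1.23; rounding first yields 1.24 instead.
+select cast(round(1.239, 2) as decimal(3,2));
+</codeblock>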
+
+    <p>
+      Be aware that in memory and for binary file formats such as Parquet or Avro, <codeph>DECIMAL(10)</codeph> or
+      higher consumes 8 bytes while <codeph>DECIMAL(9)</codeph> (the default for <codeph>DECIMAL</codeph>) or lower
+      consumes 4 bytes. Therefore, to conserve space in large tables, use the smallest-precision
+      <codeph>DECIMAL</codeph> type that is appropriate and <codeph>CAST()</codeph> literal values where necessary,
+      rather than declaring <codeph>DECIMAL</codeph> columns with high precision for convenience.
+    </p>
+
+    <p>
+      To represent a very large or precise <codeph>DECIMAL</codeph> value as a literal, for example one that
+      contains more digits than can be represented by a <codeph>BIGINT</codeph> literal, use a quoted string or a
+      floating-point value for the number, and <codeph>CAST()</codeph> to the desired <codeph>DECIMAL</codeph>
+      type:
+    </p>
+
+<codeblock>insert into decimals_38_5 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000),
+  (cast("999999999999999999999999999999" as decimal(38,5))),
+  (cast(999999999999999999999999999999. as decimal(38,5)));
+</codeblock>
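+
+    <p>
+      A similar string-based technique lets you confirm the upper end of the range described
+      earlier. This hypothetical check builds the largest <codeph>DECIMAL(38,0)</codeph> value,
+      a string of 38 nines, with the <codeph>REPEAT()</codeph> string function:
+    </p>
+
+<codeblock>-- 38 nines is the largest value representable as DECIMAL(38,0).
+select cast(repeat('9', 38) as decimal(38,0));
+</codeblock>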
+
+    <ul>
+      <li>
+        <p>
+          The result of an aggregate function such as <codeph>MAX()</codeph>, <codeph>SUM()</codeph>, or
+          <codeph>AVG()</codeph> on <codeph>DECIMAL</codeph> values is promoted to a precision of 38, with the same
+          scale as the underlying column. Thus, the result can represent the largest possible value at that
+          particular scale.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          <codeph>STRING</codeph> columns, literals, or expressions can be converted to <codeph>DECIMAL</codeph> as
+          long as the overall number of digits and digits to the right of the decimal point fit within the
+          specified precision and scale for the declared <codeph>DECIMAL</codeph> type. By default, a
+          <codeph>DECIMAL</codeph> value with no specified scale or precision can hold a maximum of 9 digits of an
+          integer value. If there are more digits in the string value than are allowed by the
+          <codeph>DECIMAL</codeph> scale and precision, the result is <codeph>NULL</codeph>.
+        </p>
+        <p>
+          The following examples demonstrate how <codeph>STRING</codeph> values with integer and fractional parts
+          are represented when converted to <codeph>DECIMAL</codeph>. If the scale is 0, the number is treated
+          as an integer value with a maximum of <varname>precision</varname> digits. If the scale is greater than
+          0, the precision must be increased to account for the digits both to the left and to the right of the decimal point.
+          As the scale increases, output values are printed with additional trailing zeros after the decimal
+          point if needed. Any trailing zeros after the decimal point in the <codeph>STRING</codeph> value must fit
+          within the number of digits specified by the scale.
+        </p>
+<codeblock>[localhost:21000] > select cast('100' as decimal); -- Small integer value fits within 9 digits of precision.
++-----------------------------+
+| cast('100' as decimal(9,0)) |
++-----------------------------+
+| 100                         |
++-----------------------------+
+[localhost:21000] > select cast('100' as decimal(3,0)); -- Small integer value fits within 3 digits of precision.
++-----------------------------+
+| cast('100' as decimal(3,0)) |
++-----------------------------+
+| 100                         |
++-----------------------------+
+[localhost:21000] > select cast('100' as decimal(2,0)); -- 2 digits of precision is not enough!
++-----------------------------+
+| cast('100' as decimal(2,0)) |
++-----------------------------+
+| NULL                        |
++-----------------------------+
+[localhost:21000] > select cast('100' as decimal(3,1)); -- (3,1) = 2 digits left of the decimal point, 1 to the right. Not enough.
++-----------------------------+
+| cast('100' as decimal(3,1)) |
++-----------------------------+
+| NULL                        |
++-----------------------------+
+[localhost:21000] > select cast('100' as decimal(4,1)); -- 4 digits total, 1 to the right of the decimal point.
++-----------------------------+
+| cast('100' as decimal(4,1)) |
++-----------------------------+
+| 100.0                       |
++-----------------------------+
+[localhost:21000] > select cast('98.6' as decimal(3,1)); -- (3,1) can hold a 3 digit number with 1 fractional digit.
++------------------------------+
+| cast('98.6' as decimal(3,1)) |
++------------------------------+
+| 98.6                         |
++------------------------------+
+[localhost:21000] > select cast('98.6' as decimal(15,1)); -- Larger precision allows bigger numbers but still only 1 fractional digit.
++-------------------------------+
+| cast('98.6' as decimal(15,1)) |
++-------------------------------+
+| 98.6                          |
++-------------------------------+
+[localhost:21000] > select cast('98.6' as decimal(15,5)); -- Larger scale allows more fractional digits, outputs trailing zeros.
++-------------------------------+
+| cast('98.6' as decimal(15,5)) |
++-------------------------------+
+| 98.60000                      |
++-------------------------------+
+[localhost:21000] > select cast('98.60000' as decimal(15,1)); -- Trailing zeros in the string must fit within 'scale' digits (1 in this case).
++-----------------------------------+
+| cast('98.60000' as decimal(15,1)) |
++-----------------------------------+
+| NULL                              |
++-----------------------------------+
+</codeblock>
+      </li>
+
+      <li>
+        Most built-in arithmetic functions such as <codeph>SIN()</codeph> and <codeph>COS()</codeph> continue to
+        accept only <codeph>DOUBLE</codeph> values because they are so commonly used in scientific contexts for
+        calculations of IEEE 754-compliant values.
The built-in functions that accept and return
+        <codeph>DECIMAL</codeph> are:
+<!-- List from Skye: positive, negative, least, greatest, fnv_hash, if, nullif, zeroifnull, isnull, coalesce -->
+<!-- Nong had already told me about abs, ceil, floor, round, truncate -->
+        <ul>
+          <li>
+            <codeph>ABS()</codeph>
+          </li>
+
+          <li>
+            <codeph>CEIL()</codeph>
+          </li>
+
+          <li>
+            <codeph>COALESCE()</codeph>
+          </li>
+
+          <li>
+            <codeph>FLOOR()</codeph>
+          </li>
+
+          <li>
+            <codeph>FNV_HASH()</codeph>
+          </li>
+
+          <li>
+            <codeph>GREATEST()</codeph>
+          </li>
+
+          <li>
+            <codeph>IF()</codeph>
+          </li>
+
+          <li>
+            <codeph>ISNULL()</codeph>
+          </li>
+
+          <li>
+            <codeph>LEAST()</codeph>
+          </li>
+
+          <li>
+            <codeph>NEGATIVE()</codeph>
+          </li>
+
+          <li>
+            <codeph>NULLIF()</codeph>
+          </li>
+
+          <li>
+            <codeph>POSITIVE()</codeph>
+          </li>
+
+          <li>
+            <codeph>PRECISION()</codeph>
+          </li>
+
+          <li>
+            <codeph>ROUND()</codeph>
+          </li>
+
+          <li>
+            <codeph>SCALE()</codeph>
+          </li>
+
+          <li>
+            <codeph>TRUNCATE()</codeph>
+          </li>
+
+          <li>
+            <codeph>ZEROIFNULL()</codeph>
+          </li>
+        </ul>
+        See <xref href="impala_functions.xml#builtins"/> for details.
+      </li>
+
+      <li>
+        <p>
+          <codeph>BIGINT</codeph>, <codeph>INT</codeph>, <codeph>SMALLINT</codeph>, and <codeph>TINYINT</codeph>
+          values can all be cast to <codeph>DECIMAL</codeph>. The number of digits to the left of the decimal point
+          in the <codeph>DECIMAL</codeph> type must be sufficient to hold the largest value of the corresponding
+          integer type. Note that integer literals are treated as the smallest appropriate integer type, meaning
+          there is sometimes a range of values that require one more digit of <codeph>DECIMAL</codeph> precision than
+          you might expect. For integer values, the scale of the <codeph>DECIMAL</codeph> type can be zero; if
+          the scale is greater than zero, remember to increase the precision by an equivalent amount to hold
+          the required number of digits to the left of the decimal point.
+        </p>
+        <p>
+          The following examples show how different integer types are converted to <codeph>DECIMAL</codeph>.
+        </p>
+<!-- According to Nong, it's a bug that so many integer digits can be converted to a DECIMAL
+     value with small (s,p) spec. So expect to re-do this example.
--> +<codeblock>[localhost:21000] > select cast(1 as decimal(1,0)); ++-------------------------+ +| cast(1 as decimal(1,0)) | ++-------------------------+ +| 1 | ++-------------------------+ +[localhost:21000] > select cast(9 as decimal(1,0)); ++-------------------------+ +| cast(9 as decimal(1,0)) | ++-------------------------+ +| 9 | ++-------------------------+ +[localhost:21000] > select cast(10 as decimal(1,0)); ++--------------------------+ +| cast(10 as decimal(1,0)) | ++--------------------------+ +| 10 | ++--------------------------+ +[localhost:21000] > select cast(10 as decimal(1,1)); ++--------------------------+ +| cast(10 as decimal(1,1)) | ++--------------------------+ +| 10.0 | ++--------------------------+ +[localhost:21000] > select cast(100 as decimal(1,1)); ++---------------------------+ +| cast(100 as decimal(1,1)) | ++---------------------------+ +| 100.0 | ++---------------------------+ +[localhost:21000] > select cast(1000 as decimal(1,1)); ++----------------------------+ +| cast(1000 as decimal(1,1)) | ++----------------------------+ +| 1000.0 | ++----------------------------+ +</codeblock> + </li> + + <li> + <p> + When a <codeph>DECIMAL</codeph> value is converted to any of the integer types, any fractional part is + truncated (that is, rounded towards zero): + </p> +<codeblock>[localhost:21000] > create table num_dec_days (x decimal(4,1)); +[localhost:21000] > insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1))); +[localhost:21000] > insert into num_dec_days values (cast(0.1 as decimal(4,1))), (cast(.9 as decimal(4,1))), (cast(9.1 as decimal(4,1))), (cast(9.9 as decimal(4,1))); +[localhost:21000] > select cast(x as int) from num_dec_days; ++----------------+ +| cast(x as int) | ++----------------+ +| 1 | +| 2 | +| 4 | +| 0 | +| 0 | +| 9 | +| 9 | ++----------------+ +</codeblock> + </li> + + <li> + <p> + You cannot directly cast <codeph>TIMESTAMP</codeph> or <codeph>BOOLEAN</codeph> values to or from + <codeph>DECIMAL</codeph> values. You can turn a <codeph>DECIMAL</codeph> value into a time-related + representation using a two-step process, by converting it to an integer value and then using that result + in a call to a date and time function such as <codeph>from_unixtime()</codeph>. + </p> +<codeblock>[localhost:21000] > select from_unixtime(cast(cast(1000.0 as decimal) as bigint)); ++-------------------------------------------------------------+ +| from_unixtime(cast(cast(1000.0 as decimal(9,0)) as bigint)) | ++-------------------------------------------------------------+ +| 1970-01-01 00:16:40 | ++-------------------------------------------------------------+ +[localhost:21000] > select now() + interval cast(x as int) days from num_dec_days; -- x is a DECIMAL column. + +[localhost:21000] > create table num_dec_days (x decimal(4,1)); +[localhost:21000] > insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1))); +[localhost:21000] > select now() + interval cast(x as int) days from num_dec_days; -- The 4.5 value is truncated to 4 and becomes '4 days'. 
++--------------------------------------+
+| now() + interval cast(x as int) days |
++--------------------------------------+
+| 2014-05-13 23:11:55.163284000        |
+| 2014-05-14 23:11:55.163284000        |
+| 2014-05-16 23:11:55.163284000        |
++--------------------------------------+
+</codeblock>
+      </li>
+
+      <li>
+        <p>
+          Because values in <codeph>INSERT</codeph> statements are checked rigorously for type compatibility, be
+          prepared to use <codeph>CAST()</codeph> function calls around literals, column references, or other
+          expressions that you are inserting into a <codeph>DECIMAL</codeph> column.
+        </p>
+      </li>
+    </ul>
+
+    <p conref="../shared/impala_common.xml#common/null_bad_numeric_cast"/>
+
+    <p>
+      <b>DECIMAL differences from integer and floating-point types:</b>
+    </p>
+
+    <p>
+      With the <codeph>DECIMAL</codeph> type, you are concerned with the number of overall digits of a number
+      rather than powers of 2 (as in <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, and so on). Therefore,
+      the limits with integral values of <codeph>DECIMAL</codeph> types fall around 99, 999, 9999, and so on rather
+      than 32767, 65535, 2<sup>32</sup>-1, and so on. For fractional values, you do not need to account for imprecise representation of the
+      fractional part according to the IEEE-754 standard (as in <codeph>FLOAT</codeph> and
+      <codeph>DOUBLE</codeph>). Therefore, when you insert a fractional value into a <codeph>DECIMAL</codeph>
+      column, you can compare it, sum it, query it, <codeph>GROUP BY</codeph> it, and so on, and get back the
+      original values rather than some <q>close but not identical</q> value.
+    </p>
+
+    <p>
+      <codeph>FLOAT</codeph> and <codeph>DOUBLE</codeph> can cause problems or unexpected behavior due to their
+      inability to precisely represent certain fractional values, for example dollar and cents values for currency. You might
+      find output values slightly different from the ones you inserted, equality tests that do not match precisely, or
+      unexpected values for <codeph>GROUP BY</codeph> columns. <codeph>DECIMAL</codeph> can help reduce unexpected
+      behavior and rounding errors, at the expense of some performance overhead for assignments and comparisons.
+    </p>
+
+    <p>
+      <b>Literals and expressions:</b>
+      <ul>
+        <li>
+          <p>
+            When you use an integer literal such as <codeph>1</codeph> or <codeph>999</codeph> in a SQL statement,
+            depending on the context, Impala treats it as either the smallest appropriate
+            <codeph>DECIMAL</codeph> type, or the smallest integer type (<codeph>TINYINT</codeph>,
+            <codeph>SMALLINT</codeph>, <codeph>INT</codeph>, or <codeph>BIGINT</codeph>). To minimize memory usage,
+            Impala prefers to treat the literal as the smallest appropriate integer type.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            When you use a floating-point literal such as <codeph>1.1</codeph> or <codeph>999.44</codeph> in a SQL
+            statement, depending on the context, Impala treats it as either the smallest appropriate
+            <codeph>DECIMAL</codeph> type, or the smallest floating-point type (<codeph>FLOAT</codeph> or
+            <codeph>DOUBLE</codeph>). To avoid loss of accuracy, Impala prefers to treat the literal as a
+            <codeph>DECIMAL</codeph>.
+          </p>
+        </li>
+      </ul>
+    </p>
+
+    <p>
+      <b>Storage considerations:</b>
+    </p>
+
+    <ul>
+      <li>
+        Only the precision determines the storage size for <codeph>DECIMAL</codeph> values; the scale setting has
+        no effect on the storage size.
+      </li>
+
+      <li>
+        Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these text-based file formats,
+        leading zeros are not stored, but trailing zeros are stored. In these tables, each <codeph>DECIMAL</codeph>
+        value takes up as many bytes as there are digits in the value, plus an extra byte if the decimal point is
+        present and an extra byte for negative values. Once the values are loaded into memory, they are represented
+        in 4, 8, or 16 bytes as described in the following list items. The on-disk representation varies depending
+        on the file format of the table.
+      </li>
+
+<!-- Next couple of points can be conref'ed with identical list bullets farther down under File Format Considerations. -->
+
+      <li>
+        Parquet and Avro tables use binary formats. In these tables, Impala stores each value in as few bytes as
+        possible
+<!-- 4, 8, or 16 bytes -->
+        depending on the precision specified for the <codeph>DECIMAL</codeph> column.
+        <ul>
+          <li>
+            In memory, <codeph>DECIMAL</codeph> values with precision of 9 or less are stored in 4 bytes.
+          </li>
+
+          <li>
+            In memory, <codeph>DECIMAL</codeph> values with precision of 10 through 18 are stored in 8 bytes.
+          </li>
+
+          <li>
+            In memory, <codeph>DECIMAL</codeph> values with precision greater than 18 are stored in 16 bytes.
+          </li>
+        </ul>
+      </li>
+    </ul>
+
+    <p conref="../shared/impala_common.xml#common/file_format_blurb"/>
+
+    <ul>
+      <li>
+        The <codeph>DECIMAL</codeph> data type can be stored in any of the file formats supported by Impala, as
+        described in <xref href="impala_file_formats.xml#file_formats"/>. Impala only writes to tables that use the
+        Parquet and text formats, so those formats are the focus for file format compatibility.
+      </li>
+
+      <li>
+        On CDH 5 only, Impala can query Avro, RCFile, and SequenceFile tables containing <codeph>DECIMAL</codeph>
+        columns created by other Hadoop components.
+      </li>
+
+      <li>
+        You can use <codeph>DECIMAL</codeph> columns in Impala tables that are mapped to HBase tables. Impala can
+        query and insert into such tables.
+      </li>
+
+      <li>
+        Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these tables, each
+        <codeph>DECIMAL</codeph> value takes up as many bytes as there are digits in the value, plus an extra byte
+        if the decimal point is present. The binary format of Parquet or Avro files offers more compact storage for
+        <codeph>DECIMAL</codeph> columns.
+      </li>
+
+      <li>
+        Parquet and Avro tables use binary formats. In these tables, Impala stores each value in 4, 8, or 16 bytes
+        depending on the precision specified for the <codeph>DECIMAL</codeph> column.
+      </li>
+
+      <li>
+        Parquet files containing <codeph>DECIMAL</codeph> columns are not expected to be readable under CDH 4. See
+        the <b>Compatibility</b> section for details.
+      </li>
+    </ul>
+
+    <p>
+      <b>UDF considerations:</b> When writing a C++ UDF, use the <codeph>DecimalVal</codeph> data type defined in
+      <filepath>/usr/include/impala_udf/udf.h</filepath>.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/partitioning_blurb"/>
+
+    <p>
+      You can use a <codeph>DECIMAL</codeph> column as a partition key. Doing so provides a better match between
+      the partition key values and the HDFS directory names than using a <codeph>DOUBLE</codeph> or
+      <codeph>FLOAT</codeph> partitioning column.
+    </p>
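+
+    <p>
+      The following minimal sketch illustrates the idea with a hypothetical
+      <codeph>SALES</codeph> table; the table and column names are invented for this example:
+    </p>
+
+<codeblock>-- Each distinct value of the DECIMAL partition key maps to a cleanly
+-- named HDFS directory such as .../price=9.99, with no floating-point
+-- rounding in the directory name.
+create table sales (id bigint) partitioned by (price decimal(5,2));
+insert into sales partition (price=9.99) values (1);
+</codeblock>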
+
+    <p conref="../shared/impala_common.xml#common/schema_evolution_blurb"/>
+
+    <ul>
+      <li>
+        For text-based formats (text, RCFile, and SequenceFile tables), you can issue an <codeph>ALTER TABLE ...
+ REPLACE COLUMNS</codeph> statement to change the precision and scale of an existing + <codeph>DECIMAL</codeph> column. As long as the values in the column fit within the new precision and + scale, they are returned correctly by a query. Any values that do not fit within the new precision and + scale are returned as <codeph>NULL</codeph>, and Impala reports the conversion error. Leading zeros do not + count against the precision value, but trailing zeros after the decimal point do. +<codeblock>[localhost:21000] > create table text_decimals (x string); +[localhost:21000] > insert into text_decimals values ("1"), ("2"), ("99.99"), ("1.234"), ("000001"), ("1.000000000"); +[localhost:21000] > select * from text_decimals; ++-------------+ +| x | ++-------------+ +| 1 | +| 2 | +| 99.99 | +| 1.234 | +| 000001 | +| 1.000000000 | ++-------------+ +[localhost:21000] > alter table text_decimals replace columns (x decimal(4,2)); +[localhost:21000] > select * from text_decimals; ++-------+ +| x | ++-------+ +| 1.00 | +| 2.00 | +| 99.99 | +| NULL | +| 1.00 | +| NULL | ++-------+ +ERRORS: +Backend 0:Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.234) +file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/634d4bd3aa0 +e8420-b4b13bab7f1be787_56794587_data.0 +record: 1.234 +Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.000000000) +file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/cd40dc68e20 +c565a-cc4bd86c724c96ba_311873428_data.0 +record: 1.000000000 +</codeblock> + </li> + + <li> + For binary formats (Parquet and Avro tables), although an <codeph>ALTER TABLE ... REPLACE COLUMNS</codeph> + statement that changes the precision or scale of a <codeph>DECIMAL</codeph> column succeeds, any subsequent + attempt to query the changed column results in a fatal error. (The other columns can still be queried + successfully.) This is because the metadata about the columns is stored in the data files themselves, and + <codeph>ALTER TABLE</codeph> does not actually make any updates to the data files. If the metadata in the + data files disagrees with the metadata in the metastore database, Impala cancels the query. 
+ </li> + </ul> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>CREATE TABLE t1 (x DECIMAL, y DECIMAL(5,2), z DECIMAL(25,0)); +INSERT INTO t1 VALUES (5, 99.44, 123456), (300, 6.7, 999999999); +SELECT x+y, ROUND(y,1), z/98.6 FROM t1; +SELECT CAST(1000.5 AS DECIMAL); +</codeblock> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <p conref="../shared/impala_common.xml#common/decimal_no_stats"/> + +<!-- <p conref="/Content/impala_common_xi44078.xml#common/partitioning_good"/> --> + + <p conref="../shared/impala_common.xml#common/hbase_ok"/> + + <p conref="../shared/impala_common.xml#common/parquet_ok"/> + + <p conref="../shared/impala_common.xml#common/text_bulky"/> + +<!-- <p conref="/Content/impala_common_xi44078.xml#common/compatibility_blurb"/> --> + +<!-- <p conref="/Content/impala_common_xi44078.xml#common/internals_blurb"/> --> + +<!-- <p conref="/Content/impala_common_xi44078.xml#common/added_in_20"/> --> + + <p conref="../shared/impala_common.xml#common/column_stats_constant"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_literals.xml#numeric_literals"/>, <xref href="impala_tinyint.xml#tinyint"/>, + <xref href="impala_smallint.xml#smallint"/>, <xref href="impala_int.xml#int"/>, + <xref href="impala_bigint.xml#bigint"/>, <xref href="impala_decimal.xml#decimal"/>, + <xref href="impala_math_functions.xml#math_functions"/> (especially <codeph>PRECISION()</codeph> and + <codeph>SCALE()</codeph>) + </p> + </conbody> +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_default_order_by_limit.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_default_order_by_limit.xml b/docs/topics/impala_default_order_by_limit.xml
new file mode 100644
index 0000000..def0335
--- /dev/null
+++ b/docs/topics/impala_default_order_by_limit.xml
@@ -0,0 +1,34 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="obwl" id="default_order_by_limit">
+
+  <title>DEFAULT_ORDER_BY_LIMIT Query Option</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p conref="../shared/impala_common.xml#common/obwl_query_options"/>
+
+    <p rev="1.4.0">
+      Prior to Impala 1.4.0, Impala queries that used the <codeph><xref href="impala_order_by.xml#order_by">ORDER
+      BY</xref></codeph> clause had to also include a
+      <codeph><xref href="impala_limit.xml#limit">LIMIT</xref></codeph> clause, to avoid accidentally producing
+      huge result sets that must be sorted. Sorting a huge result set is a memory-intensive operation. In Impala
+      1.4.0 and higher, Impala uses a temporary disk work area to perform the sort if that operation would
+      otherwise exceed the Impala memory limit on a particular host.
+    </p>
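+
+    <p>
+      A minimal sketch of how the option was typically used in <cmdname>impala-shell</cmdname>
+      before Impala 1.4.0; the table and column names here are hypothetical:
+    </p>
+
+<codeblock>-- Supply an implicit LIMIT for ORDER BY queries that do not specify one.
+set default_order_by_limit=1000;
+-- Treated as if the query ended with LIMIT 1000.
+select c1 from t1 order by c1;
+</codeblock>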
+
+    <p>
+      <b>Type:</b> numeric
+    </p>
+
+    <p>
+      <b>Default:</b> -1 (no default limit)
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_delete.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_delete.xml b/docs/topics/impala_delete.xml
new file mode 100644
index 0000000..fcac5e4
--- /dev/null
+++ b/docs/topics/impala_delete.xml
@@ -0,0 +1,64 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="delete">
+
+  <title>DELETE Statement (CDH 5.5 and higher only)</title>
+  <titlealts><navtitle>DELETE</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Kudu"/>
+      <data name="Category" value="ETL"/>
+      <data name="Category" value="Ingest"/>
+      <data name="Category" value="DML"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">DELETE statement</indexterm>
+      Deletes one or more rows from a Kudu table.
+      Although deleting a single row or a range of rows would be inefficient for tables using HDFS
+      data files, Kudu is able to perform this operation efficiently. Therefore, this statement
+      only works for Impala tables that use the Kudu storage engine.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>
+</codeblock>
+
+    <p rev="kudu" audience="impala_next">
+      Normally, a <codeph>DELETE</codeph> operation for a Kudu table fails if
+      some partition key columns are not found, because they were deleted or changed
+      by a concurrent <codeph>UPDATE</codeph> or <codeph>DELETE</codeph> operation.
+      Specify <codeph>DELETE IGNORE <varname>rest_of_statement</varname></codeph> to
+      make the <codeph>DELETE</codeph> continue in this case. The rows whose partition
+      key values cannot be found are not removed.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/dml_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>
+
+    <note conref="../shared/impala_common.xml#common/compute_stats_next"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+<codeblock>
+
+</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_kudu.xml#impala_kudu"/>
+    </p>
+
+  </conbody>
+
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_describe.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_describe.xml b/docs/topics/impala_describe.xml
new file mode 100644
index 0000000..ffdb505
--- /dev/null
+++ b/docs/topics/impala_describe.xml
@@ -0,0 +1,561 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="describe">
+
+  <title id="desc">DESCRIBE Statement</title>
+  <titlealts><navtitle>DESCRIBE</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Data Types"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Tables"/>
+      <data name="Category" value="Reports"/>
+      <data name="Category" value="Schemas"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">DESCRIBE statement</indexterm>
+      The <codeph>DESCRIBE</codeph> statement displays metadata about a table, such as the column names and their
+      data types. Its syntax is:
+    </p>
+
+<codeblock rev="2.3.0">DESCRIBE [FORMATTED] [<varname>db_name</varname>.]<varname>table_name</varname>[.<varname>complex_col_name</varname> ...]</codeblock>
+
+    <p>
+      You can use the abbreviation <codeph>DESC</codeph> for the <codeph>DESCRIBE</codeph> statement.
+    </p>
+
+    <p rev="1.1">
+      The <codeph>DESCRIBE FORMATTED</codeph> variation displays additional information, in a format familiar to
+      users of Apache Hive. The extra information includes low-level details such as whether the table is internal
+      or external, when it was created, the file format, the location of the data in HDFS, whether the object is a
+      table or a view, and (for views) the text of the query from the view definition.
+    </p>
+
+    <note>
+      The <codeph>Compressed</codeph> field is not a reliable indicator of whether the table contains compressed
+      data. It typically shows <codeph>No</codeph>, because the compression settings only apply during the
+      session that loads data and are not stored persistently with the table metadata.
+    </note>
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+    <p rev="2.3.0">
+      For the <codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph> types available in
+      CDH 5.5 / Impala 2.3 and higher, the <codeph>DESCRIBE</codeph> output is formatted to avoid
+      excessively long lines for multiple fields within a <codeph>STRUCT</codeph>, or a nested sequence of
+      complex types.
+ </p> + + <p conref="../shared/impala_common.xml#common/complex_types_describe"/> + + <p rev="2.3.0"> + For example, here is the <codeph>DESCRIBE</codeph> output for a table containing a single top-level column + of each complex type: + </p> + +<codeblock rev="2.3.0"><![CDATA[create table t1 (x int, a array<int>, s struct<f1: string, f2: bigint>, m map<string,int>) stored as parquet; + +describe t1; ++------+-----------------+---------+ +| name | type | comment | ++------+-----------------+---------+ +| x | int | | +| a | array<int> | | +| s | struct< | | +| | f1:string, | | +| | f2:bigint | | +| | > | | +| m | map<string,int> | | ++------+-----------------+---------+ +]]> +</codeblock> + + <p rev="2.3.0"> + Here are examples showing how to <q>drill down</q> into the layouts of complex types, including + using multi-part names to examine the definitions of nested types. + The <codeph>< ></codeph> delimiters identify the columns with complex types; + these are the columns where you can descend another level to see the parts that make up + the complex type. + This technique helps you to understand the multi-part names you use as table references in queries + involving complex types, and the corresponding column names you refer to in the <codeph>SELECT</codeph> list. + These tables are from the <q>nested TPC-H</q> schema, shown in detail in + <xref href="impala_complex_types.xml#complex_sample_schema"/>. + </p> + + <p> + The <codeph>REGION</codeph> table contains an <codeph>ARRAY</codeph> of <codeph>STRUCT</codeph> + elements: + </p> + + <ul> + <li> + <p> + The first <codeph>DESCRIBE</codeph> specifies the table name, to display the definition + of each top-level column. + </p> + </li> + <li> + <p> + The second <codeph>DESCRIBE</codeph> specifies the name of a complex + column, <codeph>REGION.R_NATIONS</codeph>, showing that when you include the name of an <codeph>ARRAY</codeph> + column in a <codeph>FROM</codeph> clause, that table reference acts like a two-column table with + columns <codeph>ITEM</codeph> and <codeph>POS</codeph>. + </p> + </li> + <li> + <p> + The final <codeph>DESCRIBE</codeph> specifies the fully qualified name of the <codeph>ITEM</codeph> field, + to display the layout of its underlying <codeph>STRUCT</codeph> type in table format, with the fields + mapped to column names. + </p> + </li> + </ul> + +<codeblock rev="2.3.0"><![CDATA[ +-- #1: The overall layout of the entire table. +describe region; ++-------------+-------------------------+---------+ +| name | type | comment | ++-------------+-------------------------+---------+ +| r_regionkey | smallint | | +| r_name | string | | +| r_comment | string | | +| r_nations | array<struct< | | +| | n_nationkey:smallint, | | +| | n_name:string, | | +| | n_comment:string | | +| | >> | | ++-------------+-------------------------+---------+ + +-- #2: The ARRAY column within the table. +describe region.r_nations; ++------+-------------------------+---------+ +| name | type | comment | ++------+-------------------------+---------+ +| item | struct< | | +| | n_nationkey:smallint, | | +| | n_name:string, | | +| | n_comment:string | | +| | > | | +| pos | bigint | | ++------+-------------------------+---------+ + +-- #3: The STRUCT that makes up each ARRAY element. +-- The fields of the STRUCT act like columns of a table. 
+describe region.r_nations.item;
++-------------+----------+---------+
+| name        | type     | comment |
++-------------+----------+---------+
+| n_nationkey | smallint |         |
+| n_name      | string   |         |
+| n_comment   | string   |         |
++-------------+----------+---------+
+]]>
+</codeblock>
+
+    <p>
+      The <codeph>CUSTOMER</codeph> table contains an <codeph>ARRAY</codeph> of <codeph>STRUCT</codeph>
+      elements, where one field in the <codeph>STRUCT</codeph> is another <codeph>ARRAY</codeph> of
+      <codeph>STRUCT</codeph> elements:
+    </p>
+    <ul>
+      <li>
+        <p>
+          Again, the initial <codeph>DESCRIBE</codeph> specifies only the table name.
+        </p>
+      </li>
+      <li>
+        <p>
+          The second <codeph>DESCRIBE</codeph> specifies the qualified name of the complex
+          column, <codeph>CUSTOMER.C_ORDERS</codeph>, showing how an <codeph>ARRAY</codeph>
+          is represented as a two-column table with columns <codeph>ITEM</codeph> and <codeph>POS</codeph>.
+        </p>
+      </li>
+      <li>
+        <p>
+          The third <codeph>DESCRIBE</codeph> specifies the qualified name of the <codeph>ITEM</codeph>
+          of the <codeph>ARRAY</codeph> column, to see the structure of the nested <codeph>ARRAY</codeph>.
+          Again, it has two parts, <codeph>ITEM</codeph> and <codeph>POS</codeph>. Because the
+          <codeph>ARRAY</codeph> contains a <codeph>STRUCT</codeph>, the layout of the <codeph>STRUCT</codeph>
+          is shown.
+        </p>
+      </li>
+      <li>
+        <p>
+          The fourth and fifth <codeph>DESCRIBE</codeph> statements drill down into a <codeph>STRUCT</codeph> field that
+          is itself a complex type, an <codeph>ARRAY</codeph> of <codeph>STRUCT</codeph>.
+          The <codeph>ITEM</codeph> portion of the qualified name is only required when the <codeph>ARRAY</codeph>
+          elements are anonymous. The fields of the <codeph>STRUCT</codeph> give names to any other complex types
+          nested inside the <codeph>STRUCT</codeph>. Therefore, the <codeph>DESCRIBE</codeph> parameters
+          <codeph>CUSTOMER.C_ORDERS.ITEM.O_LINEITEMS</codeph> and <codeph>CUSTOMER.C_ORDERS.O_LINEITEMS</codeph>
+          are equivalent. (For brevity, Cloudera recommends leaving out the <codeph>ITEM</codeph> portion of
+          a qualified name when it is not required.)
+        </p>
+      </li>
+      <li>
+        <p>
+          The final <codeph>DESCRIBE</codeph> shows the layout of the deeply nested <codeph>STRUCT</codeph> type.
+          Because there are no more complex types nested inside this <codeph>STRUCT</codeph>, this is as far
+          as you can drill down into the layout for this table.
+        </p>
+      </li>
+    </ul>
+
+<codeblock rev="2.3.0"><![CDATA[-- #1: The overall layout of the entire table.
+describe customer;
++--------------+------------------------------------+
+| name         | type                               |
++--------------+------------------------------------+
+| c_custkey    | bigint                             |
+... more scalar columns ...
+| c_orders     | array<struct<                      |
+|              |   o_orderkey:bigint,               |
+|              |   o_orderstatus:string,            |
+|              |   o_totalprice:decimal(12,2),      |
+|              |   o_orderdate:string,              |
+|              |   o_orderpriority:string,          |
+|              |   o_clerk:string,                  |
+|              |   o_shippriority:int,              |
+|              |   o_comment:string,                |
+|              |   o_lineitems:array<struct<        |
+|              |     l_partkey:bigint,              |
+|              |     l_suppkey:bigint,              |
+|              |     l_linenumber:int,              |
+|              |     l_quantity:decimal(12,2),      |
+|              |     l_extendedprice:decimal(12,2), |
+|              |     l_discount:decimal(12,2),      |
+|              |     l_tax:decimal(12,2),           |
+|              |     l_returnflag:string,           |
+|              |     l_linestatus:string,           |
+|              |     l_shipdate:string,             |
+|              |     l_commitdate:string,           |
+|              |     l_receiptdate:string,          |
+|              |     l_shipinstruct:string,         |
+|              |     l_shipmode:string,             |
+|              |     l_comment:string               |
+|              |   >>                               |
+|              | >>                                 |
++--------------+------------------------------------+
+
+-- #2: The ARRAY column within the table.
+describe customer.c_orders; ++------+------------------------------------+ +| name | type | ++------+------------------------------------+ +| item | struct< | +| | o_orderkey:bigint, | +| | o_orderstatus:string, | +... more struct fields ... +| | o_lineitems:array<struct< | +| | l_partkey:bigint, | +| | l_suppkey:bigint, | +... more nested struct fields ... +| | l_comment:string | +| | >> | +| | > | +| pos | bigint | ++------+------------------------------------+ + +-- #3: The STRUCT that makes up each ARRAY element. +-- The fields of the STRUCT act like columns of a table. +describe customer.c_orders.item; ++-----------------+----------------------------------+ +| name | type | ++-----------------+----------------------------------+ +| o_orderkey | bigint | +| o_orderstatus | string | +| o_totalprice | decimal(12,2) | +| o_orderdate | string | +| o_orderpriority | string | +| o_clerk | string | +| o_shippriority | int | +| o_comment | string | +| o_lineitems | array<struct< | +| | l_partkey:bigint, | +| | l_suppkey:bigint, | +... more struct fields ... +| | l_comment:string | +| | >> | ++-----------------+----------------------------------+ + +-- #4: The ARRAY nested inside the STRUCT elements of the first ARRAY. +describe customer.c_orders.item.o_lineitems; ++------+----------------------------------+ +| name | type | ++------+----------------------------------+ +| item | struct< | +| | l_partkey:bigint, | +| | l_suppkey:bigint, | +... more struct fields ... +| | l_comment:string | +| | > | +| pos | bigint | ++------+----------------------------------+ + +-- #5: Shorter form of the previous DESCRIBE. Omits the .ITEM portion of the name +-- because O_LINEITEMS and other field names provide a way to refer to things +-- inside the ARRAY element. +describe customer.c_orders.o_lineitems; ++------+----------------------------------+ +| name | type | ++------+----------------------------------+ +| item | struct< | +| | l_partkey:bigint, | +| | l_suppkey:bigint, | +... more struct fields ... +| | l_comment:string | +| | > | +| pos | bigint | ++------+----------------------------------+ + +-- #6: The STRUCT representing ARRAY elements nested inside +-- another ARRAY of STRUCTs. The lack of any complex types +-- in this output means this is as far as DESCRIBE can +-- descend into the table layout. +describe customer.c_orders.o_lineitems.item; ++-----------------+---------------+ +| name | type | ++-----------------+---------------+ +| l_partkey | bigint | +| l_suppkey | bigint | +... more scalar columns ... +| l_comment | string | ++-----------------+---------------+ +]]> +</codeblock> + +<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + +<p> + After the <cmdname>impalad</cmdname> daemons are restarted, the first query against a table can take longer + than subsequent queries, because the metadata for the table is loaded before the query is processed. This + one-time delay for each table can cause misleading results in benchmark tests or cause unnecessary concern. + To <q>warm up</q> the Impala metadata cache, you can issue a <codeph>DESCRIBE</codeph> statement in advance + for each table you intend to access later. +</p> + +<p> + When you are dealing with data files stored in HDFS, sometimes it is important to know details such as the + path of the data files for an Impala table, and the host name for the namenode. You can get this information + from the <codeph>DESCRIBE FORMATTED</codeph> output. 
You specify HDFS URIs or path specifications with + statements such as <codeph>LOAD DATA</codeph> and the <codeph>LOCATION</codeph> clause of <codeph>CREATE + TABLE</codeph> or <codeph>ALTER TABLE</codeph>. You might also use HDFS URIs or paths with Linux commands + such as <cmdname>hadoop</cmdname> and <cmdname>hdfs</cmdname> to copy, rename, and so on, data files in HDFS. +</p> + +<p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/> + +<p rev="1.2.1"> + Each table can also have associated table statistics and column statistics. To see these categories of + information, use the <codeph>SHOW TABLE STATS <varname>table_name</varname></codeph> and <codeph>SHOW COLUMN + STATS <varname>table_name</varname></codeph> statements. +<!-- +For example, the table statistics can often show you the number +and total size of the files in the table, even if you have not +run <codeph>COMPUTE STATS</codeph>. +--> + See <xref href="impala_show.xml#show"/> for details. +</p> + +<p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + +<p rev="2.3.0"> + Because the column definitions for complex types can become long, particularly when such types are nested, + the <codeph>DESCRIBE</codeph> statement uses special formatting for complex type columns to make the output readable. +</p> + +<note conref="../shared/impala_common.xml#common/compute_stats_next"/> + +<p conref="../shared/impala_common.xml#common/example_blurb"/> + +<p> + The following example shows the results of both a standard <codeph>DESCRIBE</codeph> and <codeph>DESCRIBE + FORMATTED</codeph> for different kinds of schema objects: +</p> + +<ul> + <li> + <codeph>DESCRIBE</codeph> for a table or a view returns the name, type, and comment for each of the + columns. For a view, if the column value is computed by an expression, the column name is automatically + generated as <codeph>_c0</codeph>, <codeph>_c1</codeph>, and so on depending on the ordinal number of the + column. + </li> + + <li> + A table created with no special format or storage clauses is designated as a <codeph>MANAGED_TABLE</codeph> + (an <q>internal table</q> in Impala terminology). Its data files are stored in an HDFS directory under the + default Hive data directory. By default, it uses Text data format. + </li> + + <li> + A view is designated as <codeph>VIRTUAL_VIEW</codeph> in <codeph>DESCRIBE FORMATTED</codeph> output. Some + of its properties are <codeph>NULL</codeph> or blank because they are inherited from the base table. The + text of the query that defines the view is part of the <codeph>DESCRIBE FORMATTED</codeph> output. + </li> + + <li> + A table with additional clauses in the <codeph>CREATE TABLE</codeph> statement has differences in + <codeph>DESCRIBE FORMATTED</codeph> output. The output for <codeph>T2</codeph> includes the + <codeph>EXTERNAL_TABLE</codeph> keyword because of the <codeph>CREATE EXTERNAL TABLE</codeph> syntax, and + different <codeph>InputFormat</codeph> and <codeph>OutputFormat</codeph> fields to reflect the Parquet file + format. + </li> + </ul> + +<codeblock>[localhost:21000] > create table t1 (x int, y int, s string); +Query: create table t1 (x int, y int, s string) +[localhost:21000] > describe t1; +Query: describe t1 +Query finished, fetching results ... 
++------+--------+---------+ +| name | type | comment | ++------+--------+---------+ +| x | int | | +| y | int | | +| s | string | | ++------+--------+---------+ +Returned 3 row(s) in 0.13s +[localhost:21000] > describe formatted t1; +Query: describe formatted t1 +Query finished, fetching results ... ++------------------------------+--------------------------------------------------------------------+----------------------+ +| name | type | comment | ++------------------------------+--------------------------------------------------------------------+----------------------+ +| # col_name | data_type | comment | +| | NULL | NULL | +| x | int | None | +| y | int | None | +| s | string | None | +| | NULL | NULL | +| # Detailed Table Information | NULL | NULL | +| Database: | describe_formatted | NULL | +| Owner: | cloudera | NULL | +| CreateTime: | Mon Jul 22 17:03:16 EDT 2013 | NULL | +| LastAccessTime: | UNKNOWN | NULL | +| Protect Mode: | None | NULL | +| Retention: | 0 | NULL | +| Location: | hdfs://127.0.0.1:8020/user/hive/warehouse/describe_formatted.db/t1 | NULL | +| Table Type: | MANAGED_TABLE | NULL | +| Table Parameters: | NULL | NULL | +| | transient_lastDdlTime | 1374526996 | +| | NULL | NULL | +| # Storage Information | NULL | NULL | +| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL | +| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL | +| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL | +| Compressed: | No | NULL | +| Num Buckets: | 0 | NULL | +| Bucket Columns: | [] | NULL | +| Sort Columns: | [] | NULL | ++------------------------------+--------------------------------------------------------------------+----------------------+ +Returned 26 row(s) in 0.03s +[localhost:21000] > create view v1 as select x, upper(s) from t1; +Query: create view v1 as select x, upper(s) from t1 +[localhost:21000] > describe v1; +Query: describe v1 +Query finished, fetching results ... ++------+--------+---------+ +| name | type | comment | ++------+--------+---------+ +| x | int | | +| _c1 | string | | ++------+--------+---------+ +Returned 2 row(s) in 0.10s +[localhost:21000] > describe formatted v1; +Query: describe formatted v1 +Query finished, fetching results ... 
++------------------------------+------------------------------+----------------------+ +| name | type | comment | ++------------------------------+------------------------------+----------------------+ +| # col_name | data_type | comment | +| | NULL | NULL | +| x | int | None | +| _c1 | string | None | +| | NULL | NULL | +| # Detailed Table Information | NULL | NULL | +| Database: | describe_formatted | NULL | +| Owner: | cloudera | NULL | +| CreateTime: | Mon Jul 22 16:56:38 EDT 2013 | NULL | +| LastAccessTime: | UNKNOWN | NULL | +| Protect Mode: | None | NULL | +| Retention: | 0 | NULL | +| Table Type: | VIRTUAL_VIEW | NULL | +| Table Parameters: | NULL | NULL | +| | transient_lastDdlTime | 1374526598 | +| | NULL | NULL | +| # Storage Information | NULL | NULL | +| SerDe Library: | null | NULL | +| InputFormat: | null | NULL | +| OutputFormat: | null | NULL | +| Compressed: | No | NULL | +| Num Buckets: | 0 | NULL | +| Bucket Columns: | [] | NULL | +| Sort Columns: | [] | NULL | +| | NULL | NULL | +| # View Information | NULL | NULL | +| View Original Text: | SELECT x, upper(s) FROM t1 | NULL | +| View Expanded Text: | SELECT x, upper(s) FROM t1 | NULL | ++------------------------------+------------------------------+----------------------+ +Returned 28 row(s) in 0.03s +[localhost:21000] > create external table t2 (x int, y int, s string) stored as parquet location '/user/cloudera/sample_data'; +[localhost:21000] > describe formatted t2; +Query: describe formatted t2 +Query finished, fetching results ... ++------------------------------+----------------------------------------------------+----------------------+ +| name | type | comment | ++------------------------------+----------------------------------------------------+----------------------+ +| # col_name | data_type | comment | +| | NULL | NULL | +| x | int | None | +| y | int | None | +| s | string | None | +| | NULL | NULL | +| # Detailed Table Information | NULL | NULL | +| Database: | describe_formatted | NULL | +| Owner: | cloudera | NULL | +| CreateTime: | Mon Jul 22 17:01:47 EDT 2013 | NULL | +| LastAccessTime: | UNKNOWN | NULL | +| Protect Mode: | None | NULL | +| Retention: | 0 | NULL | +| Location: | hdfs://127.0.0.1:8020/user/cloudera/sample_data | NULL | +| Table Type: | EXTERNAL_TABLE | NULL | +| Table Parameters: | NULL | NULL | +| | EXTERNAL | TRUE | +| | transient_lastDdlTime | 1374526907 | +| | NULL | NULL | +| # Storage Information | NULL | NULL | +| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL | +| InputFormat: | com.cloudera.impala.hive.serde.ParquetInputFormat | NULL | +| OutputFormat: | com.cloudera.impala.hive.serde.ParquetOutputFormat | NULL | +| Compressed: | No | NULL | +| Num Buckets: | 0 | NULL | +| Bucket Columns: | [] | NULL | +| Sort Columns: | [] | NULL | ++------------------------------+----------------------------------------------------+----------------------+ +Returned 27 row(s) in 0.17s</codeblock> + + <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/> + + <p conref="../shared/impala_common.xml#common/permissions_blurb"/> + <p rev="CDH-19187"> + The user ID that the <cmdname>impalad</cmdname> daemon runs under, + typically the <codeph>impala</codeph> user, must have read and execute + permissions for all directories that are part of the table. + (A table could span multiple different HDFS directories if it is partitioned. 
+      The directories could be widely scattered because a partition can reside
+      in an arbitrary HDFS directory based on its <codeph>LOCATION</codeph> attribute.)
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_tables.xml#tables"/>, <xref href="impala_create_table.xml#create_table"/>,
+      <xref href="impala_show.xml#show_tables"/>, <xref href="impala_show.xml#show_create_table"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_disable_codegen.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_disable_codegen.xml b/docs/topics/impala_disable_codegen.xml
new file mode 100644
index 0000000..844d49d
--- /dev/null
+++ b/docs/topics/impala_disable_codegen.xml
@@ -0,0 +1,36 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="disable_codegen">
+
+  <title>DISABLE_CODEGEN Query Option</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Troubleshooting"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">DISABLE_CODEGEN query option</indexterm>
+      This is a debug option, intended for diagnosing and working around issues that cause crashes. If a query
+      fails with an <q>illegal instruction</q> or other hardware-specific message, try setting
+      <codeph>DISABLE_CODEGEN=true</codeph> and running the query again. If the query succeeds only when the
+      <codeph>DISABLE_CODEGEN</codeph> option is turned on, submit the problem to Cloudera support and include that
+      detail in the problem report. Do not otherwise run with this setting turned on, because it results in lower
+      overall performance.
+    </p>
+
+    <p>
+      As an exception, because the code generation phase adds a small amount of overhead for each query, you might
+      turn on the <codeph>DISABLE_CODEGEN</codeph> option to achieve maximum throughput when running many
+      short-lived queries against small tables.
+    </p>
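+
+    <p>
+      For example, you might toggle the option in <cmdname>impala-shell</cmdname> while reproducing a problem
+      query. A minimal sketch; the table name here is hypothetical:
+    </p>
+
+<codeblock>set disable_codegen=true;
+-- Re-run the query that previously crashed. If it now succeeds, include
+-- the fact that DISABLE_CODEGEN made the difference in the problem report.
+select count(*) from suspect_table;
+-- Turn code generation back on for normal operation.
+set disable_codegen=false;</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/type_boolean"/>
+    <p conref="../shared/impala_common.xml#common/default_false_0"/>
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_disable_unsafe_spills.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_disable_unsafe_spills.xml b/docs/topics/impala_disable_unsafe_spills.xml
new file mode 100644
index 0000000..f251d65
--- /dev/null
+++ b/docs/topics/impala_disable_unsafe_spills.xml
@@ -0,0 +1,48 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="2.0.0" id="disable_unsafe_spills">
+
+  <title>DISABLE_UNSAFE_SPILLS Query Option</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Scalability"/>
+      <data name="Category" value="Memory"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">DISABLE_UNSAFE_SPILLS query option</indexterm>
+      Enable this option if you prefer to have queries fail when they exceed the Impala memory limit, rather than
+      write temporary data to disk.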
+    </p>
+
+    <p>
+      Queries that <q>spill</q> to disk typically complete successfully, where in earlier Impala releases they
+      would have failed. However, queries with exorbitant memory requirements (due to missing statistics or
+      inefficient join clauses) could become so slow as a result that you would rather have them cancelled
+      automatically, and instead reduce the memory usage through standard Impala tuning techniques.
+    </p>
+
+    <p>
+      This option prevents only <q>unsafe</q> spill operations, that is, spills by queries where one or more
+      tables are missing statistics or where the query does not include a hint to set the most efficient mechanism
+      for a join or <codeph>INSERT ... SELECT</codeph> into a partitioned table. Such queries are the most likely
+      to produce suboptimal execution plans that could cause unnecessary spilling. Therefore, leaving this option
+      enabled is a good way to find tables on which to run the <codeph>COMPUTE STATS</codeph> statement.
+    </p>
+
+    <p>
+      See <xref href="impala_scalability.xml#spill_to_disk"/> for information about the <q>spill to disk</q>
+      feature for queries processing large result sets with joins, <codeph>ORDER BY</codeph>, <codeph>GROUP
+      BY</codeph>, <codeph>DISTINCT</codeph>, aggregation functions, or analytic functions.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/type_boolean"/>
+    <p conref="../shared/impala_common.xml#common/default_false_0"/>
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_distinct.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_distinct.xml b/docs/topics/impala_distinct.xml
new file mode 100644
index 0000000..d49e400
--- /dev/null
+++ b/docs/topics/impala_distinct.xml
@@ -0,0 +1,59 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="distinct">
+
+  <title>DISTINCT Operator</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Aggregate Functions"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">DISTINCT operator</indexterm>
+      The <codeph>DISTINCT</codeph> operator in a <codeph>SELECT</codeph> statement filters the result set to
+      remove duplicates:
+    </p>
+
+<codeblock>-- Returns the unique values from one column.
+-- NULL is included in the set of values if any rows have a NULL in this column.
+select distinct c_birth_country from customer;
+-- Returns the unique combinations of values from multiple columns.
+select distinct c_salutation, c_last_name from customer;</codeblock>
+
+    <p>
+      You can use <codeph>DISTINCT</codeph> in combination with an aggregation function, typically
+      <codeph>COUNT()</codeph>, to find how many different values a column contains:
+    </p>
+
+<codeblock>-- Counts the unique values from one column.
+-- NULL is not included as a distinct value in the count.
+select count(distinct c_birth_country) from customer;
+-- Counts the unique combinations of values from multiple columns.
+select count(distinct c_salutation, c_last_name) from customer;</codeblock>
+
+    <p>
+      One construct that Impala SQL does <i>not</i> support is using <codeph>DISTINCT</codeph> in more than one
+      aggregation function in the same query.
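+      For example, you could not have a single query with both
+      <codeph>COUNT(DISTINCT c_first_name)</codeph> and <codeph>COUNT(DISTINCT c_last_name)</codeph> in the
+      <codeph>SELECT</codeph> list. As a workaround, you can estimate each count with the <codeph>NDV()</codeph>
+      function, which is not subject to this restriction, or compute each exact distinct count in its own subquery
+      and join the single-row results. The following sketch shows both techniques, reusing the
+      <codeph>customer</codeph> table from the earlier examples:
+    </p>
+
+<codeblock>-- Estimated distinct counts; NDV() can appear any number of times in one query.
+select ndv(c_first_name), ndv(c_last_name) from customer;
+-- Exact distinct counts; each subquery produces a single row,
+-- so joining the results is inexpensive.
+select f.cnt, l.cnt from
+  (select count(distinct c_first_name) as cnt from customer) f
+  cross join
+  (select count(distinct c_last_name) as cnt from customer) l;</codeblock>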
+    <p conref="../shared/impala_common.xml#common/zero_length_strings"/>
+
+    <note conref="../shared/impala_common.xml#common/multiple_count_distinct"/>
+
+    <note>
+      <p>
+        In contrast with some database systems that always return <codeph>DISTINCT</codeph> values in sorted order,
+        Impala does not do any ordering of <codeph>DISTINCT</codeph> values. Always include an <codeph>ORDER
+        BY</codeph> clause if you need the values in alphabetical or numeric sorted order.
+      </p>
+    </note>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_dml.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_dml.xml b/docs/topics/impala_dml.xml
new file mode 100644
index 0000000..66d4022
--- /dev/null
+++ b/docs/topics/impala_dml.xml
@@ -0,0 +1,85 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="dml">
+
+  <title>DML Statements</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="DML"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Tables"/>
+      <data name="Category" value="ETL"/>
+      <data name="Category" value="Ingest"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      DML refers to <q>Data Manipulation Language</q>, a subset of SQL statements that modify the data stored in
+      tables. Because Impala focuses on query performance and leverages the append-only nature of HDFS storage,
+      it currently supports only a small set of DML statements:
+    </p>
+
+    <ul>
+      <li audience="impala_next">
+        <xref href="impala_delete.xml#delete"/>; works for Kudu tables only
+      </li>
+
+      <li>
+        <xref href="impala_insert.xml#insert"/>
+      </li>
+
+      <li>
+        <xref href="impala_load_data.xml#load_data"/>
+      </li>
+
+      <li audience="impala_next">
+        <xref href="impala_update.xml#update"/>; works for Kudu tables only
+      </li>
+    </ul>
+
+    <p>
+      <codeph>INSERT</codeph> in Impala is primarily optimized for inserting large volumes of data in a single
+      statement, to make effective use of the multi-megabyte HDFS blocks. This is the primary way in Impala to
+      create new data files. If you intend to insert one or a few rows at a time, such as with the <codeph>INSERT ...
+      VALUES</codeph> syntax, that technique is much more efficient for Impala tables stored in HBase than for
+      HDFS-backed tables. See <xref href="impala_hbase.xml#impala_hbase"/> for details.
+    </p>
+
+    <p>
+      <codeph>LOAD DATA</codeph> moves existing data files into the directory for an Impala table, making them
+      immediately available for Impala queries. This is one way in Impala to work with data files produced by other
+      Hadoop components. (<codeph>CREATE EXTERNAL TABLE</codeph> is the other alternative; with external tables,
+      you can query existing data files, while the files remain in their original location.)
+    </p>
+
+    <p>
+      To simulate the effects of an <codeph>UPDATE</codeph> or <codeph>DELETE</codeph> statement in other database
+      systems, typically you use <codeph>INSERT</codeph> or <codeph>CREATE TABLE AS SELECT</codeph> to copy data
+      from one table to another, filtering out or changing the appropriate rows during the copy operation, as shown
+      in the sketch below.
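+    </p>
+
+    <p>
+      A minimal sketch of both techniques; the table and column names are hypothetical:
+    </p>
+
+<codeblock>-- Simulate DELETE: copy only the rows that should survive.
+create table active_customers as
+  select * from customers where status != 'CANCELLED';
+
+-- Simulate UPDATE: change values through expressions during the copy.
+-- (Assumes customers_adjusted was created earlier with a matching schema.)
+insert overwrite table customers_adjusted
+  select id, name, balance * 1.05 from customers;</codeblock>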
+    <p>
+      Although Impala currently does not have an <codeph>UPDATE</codeph> statement, you can achieve a similar
+      result by using Impala tables stored in HBase. When you insert a row into an HBase table and the table
+      already contains a row with the same value for the key column, the older row is hidden, effectively the same
+      as a single-row <codeph>UPDATE</codeph>.
+    </p>
+
+    <p rev="2.2.0">
+      Currently, Impala cannot perform DML operations for tables or partitions stored in the Amazon S3 filesystem.
+      See <xref href="impala_s3.xml#s3"/> for details.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      The other major classifications of SQL statements are data definition language (see
+      <xref href="impala_ddl.xml#ddl"/>) and queries (see <xref href="impala_select.xml#select"/>).
+    </p>
+  </conbody>
+</concept>