[04/22] incubator-impala git commit: First try at porting over the source files necessary for the Impala SQL Reference.

jrussell Tue, 26 Jul 2016 16:05:48 -0700

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_string_functions.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_string_functions.xml 
b/docs/topics/impala_string_functions.xml
new file mode 100644
index 0000000..a051ed5
--- /dev/null
+++ b/docs/topics/impala_string_functions.xml
@@ -0,0 +1,719 @@
+<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD 
DITA Concept//EN" "concept.dtd">
+<concept id="string_functions">
+
+  <title>Impala String Functions</title>
+  <titlealts><navtitle>String Functions</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Functions"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Querying"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p rev="2.0.0">
+      String functions are classified as those primarily accepting or 
returning <codeph>STRING</codeph>,
+      <codeph>VARCHAR</codeph>, or <codeph>CHAR</codeph> data types, for 
example to measure the length of a string
+      or concatenate two strings together.
+      <ul>
+        <li>
+          All the functions that accept <codeph>STRING</codeph> arguments also 
accept the <codeph>VARCHAR</codeph>
+          and <codeph>CHAR</codeph> types introduced in Impala 2.0.
+        </li>
+
+        <li>
+          Whenever <codeph>VARCHAR</codeph> or <codeph>CHAR</codeph> values 
are passed to a function that returns a
+          string value, the return type is normalized to 
<codeph>STRING</codeph>. For example, a call to
+          <codeph>concat()</codeph> with a mix of <codeph>STRING</codeph>, 
<codeph>VARCHAR</codeph>, and
+          <codeph>CHAR</codeph> arguments produces a <codeph>STRING</codeph> 
result.
+        </li>
+      </ul>
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      The string functions operate mainly on these data types: <xref 
href="impala_string.xml#string"/>,
+      <xref href="impala_varchar.xml#varchar"/>, and <xref 
href="impala_char.xml#char"/>.
+    </p>
+
+    <p>
+      <b>Function reference:</b>
+    </p>
+
+    <p>
+      Impala supports the following string functions:
+    </p>
+
+    <dl>
+      <dlentry id="ascii">
+
+        <dt>
+          <codeph>ascii(string str)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">ascii() function</indexterm>
+          <b>Purpose:</b> Returns the numeric ASCII code of the first 
character of the argument.
+          <p>
+            <b>Return type:</b> <codeph>int</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry rev="2.3.0" id="btrim">
+
+        <dt>
+          <codeph>btrim(string a)</codeph>,
+          <codeph>btrim(string a, string chars_to_trim)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">btrim() function</indexterm>
+          <b>Purpose:</b> Removes all instances of one or more characters
+          from the start and end of a <codeph>STRING</codeph> value.
+          By default, removes only spaces.
+          If a non-<codeph>NULL</codeph> optional second argument is 
specified, the function removes all
+          occurrences of characters in that second argument from the beginning 
and
+          end of the string.
+          <p><b>Return type:</b> <codeph>string</codeph></p>
+          <p conref="../shared/impala_common.xml#common/added_in_230"/>
+          <p conref="../shared/impala_common.xml#common/example_blurb"/>
+          <p>
+            The following examples show the default <codeph>btrim()</codeph> 
behavior,
+            and what changes when you specify the optional second argument.
+            All the examples bracket the output value with <codeph>[ ]</codeph>
+            so that you can see any leading or trailing spaces in the 
<codeph>btrim()</codeph> result.
+            By default, the function removes and number of both leading and 
trailing spaces.
+            When the second argument is specified, any number of occurrences 
of any
+            character in the second argument are removed from the start and 
end of the
+            input string; in this case, spaces are not removed (unless they 
are part of the second
+            argument) and any instances of the characters are not removed if 
they do not come
+            right at the beginning or end of the string.
+          </p>
+<codeblock>-- Remove multiple spaces before and one space after.
+select concat('[',btrim('    hello '),']');
++---------------------------------------+
+| concat('[', btrim('    hello '), ']') |
++---------------------------------------+
+| [hello]                               |
++---------------------------------------+
+
+-- Remove any instances of x or y or z at beginning or end. Leave spaces alone.
+select concat('[',btrim('xy    hello zyzzxx','xyz'),']');
++------------------------------------------------------+
+| concat('[', btrim('xy    hello zyzzxx', 'xyz'), ']') |
++------------------------------------------------------+
+| [    hello ]                                         |
++------------------------------------------------------+
+
+-- Remove any instances of x or y or z at beginning or end.
+-- Leave x, y, z alone in the middle of the string.
+select concat('[',btrim('xyhelxyzlozyzzxx','xyz'),']');
++----------------------------------------------------+
+| concat('[', btrim('xyhelxyzlozyzzxx', 'xyz'), ']') |
++----------------------------------------------------+
+| [helxyzlo]                                         |
++----------------------------------------------------+
+</codeblock>
+        </dd>
+
+      </dlentry>
+
+      <dlentry rev="1.3.0" id="char_length">
+
+        <dt>
+          <codeph>char_length(string a), <ph rev="1.3.0" 
id="character_length">character_length(string a)</ph></codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">char_length() function</indexterm>
+          <indexterm audience="Cloudera">character_length() 
function</indexterm>
+          <b>Purpose:</b> Returns the length in characters of the argument 
string. Aliases for the
+          <codeph>length()</codeph> function.
+          <p>
+            <b>Return type:</b> <codeph>int</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry rev="2.3.0" id="chr">
+
+        <dt>
+          <codeph>chr(int character_code)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">chr() function</indexterm>
+          <b>Purpose:</b> Returns a character specified by a decimal code 
point value.
+          The interpretation and display of the resulting character depends on 
your system locale.
+          Because consistent processing of Impala string values is only 
guaranteed 
+          for values within the ASCII range, only use this function for values
+          corresponding to ASCII characters.
+          In particular, parameter values greater than 255 return an empty 
string.
+          <p><b>Return type:</b> <codeph>string</codeph></p>
+          <p>
+            <b>Usage notes:</b> Can be used as the inverse of the 
<codeph>ascii()</codeph> function, which
+            converts a character to its numeric ASCII code.
+          </p>
+          <p conref="../shared/impala_common.xml#common/added_in_230"/>
+          <p conref="../shared/impala_common.xml#common/example_blurb"/>
+<codeblock>SELECT chr(65);
++---------+
+| chr(65) |
++---------+
+| A       |
++---------+
+
+SELECT chr(97);
++---------+
+| chr(97) |
++---------+
+| a       |
++---------+
+</codeblock>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="concat">
+
+        <dt>
+          <codeph>concat(string a, string b...)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">concat() function</indexterm>
+          <b>Purpose:</b> Returns a single string representing all the 
argument values joined together.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+          <p conref="../shared/impala_common.xml#common/concat_blurb"/>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="concat_ws">
+
+        <dt>
+          <codeph>concat_ws(string sep, string a, string b...)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">concat_ws() function</indexterm>
+          <b>Purpose:</b> Returns a single string representing the second and 
following argument values joined
+          together, delimited by a specified separator.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+          <p conref="../shared/impala_common.xml#common/concat_blurb"/>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="find_in_set">
+
+        <dt>
+          <codeph>find_in_set(string str, string strList)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">find_in_set() function</indexterm>
+          <b>Purpose:</b> Returns the position (starting from 1) of the first 
occurrence of a specified string
+          within a comma-separated string. Returns <codeph>NULL</codeph> if 
either argument is
+          <codeph>NULL</codeph>, 0 if the search string is not found, or 0 if 
the search string contains a comma.
+          <p>
+            <b>Return type:</b> <codeph>int</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry rev="1.2" id="group_concat">
+
+        <dt>
+          <codeph>group_concat(string s [, string sep])</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">group_concat() function</indexterm>
+          <b>Purpose:</b> Returns a single string representing the argument 
value concatenated together for each
+          row of the result set. If the optional separator string is 
specified, the separator is added between each
+          pair of concatenated values.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+          <p conref="../shared/impala_common.xml#common/concat_blurb"/>
+          <p>
+            By default, returns a single string covering the whole result set. 
To include other columns or values
+            in the result set, or to produce multiple concatenated strings for 
subsets of rows, include a
+            <codeph>GROUP BY</codeph> clause in the query.
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry rev="1.2" id="initcap">
+
+        <dt>
+          <codeph>initcap(string str)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">initcap() function</indexterm>
+          <b>Purpose:</b> Returns the input string with the first letter 
capitalized.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="instr">
+
+        <dt>
+          <codeph>instr(string str, string substr)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">instr() function</indexterm>
+          <b>Purpose:</b> Returns the position (starting from 1) of the first 
occurrence of a substring within a
+          longer string.
+          <p>
+            <b>Return type:</b> <codeph>int</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="length">
+
+        <dt>
+          <codeph>length(string a)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">length() function</indexterm>
+          <b>Purpose:</b> Returns the length in characters of the argument 
string.
+          <p>
+            <b>Return type:</b> <codeph>int</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="locate">
+
+        <dt>
+          <codeph>locate(string substr, string str[, int pos])</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">locate() function</indexterm>
+          <b>Purpose:</b> Returns the position (starting from 1) of the first 
occurrence of a substring within a
+          longer string, optionally after a particular position.
+          <p>
+            <b>Return type:</b> <codeph>int</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="lower">
+
+        <dt>
+          <codeph>lower(string a), <ph id="lcase">lcase(string a)</ph> 
</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">lower() function</indexterm>
+          <b>Purpose:</b> Returns the argument string converted to 
all-lowercase.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="lpad">
+
+        <dt>
+          <codeph>lpad(string str, int len, string pad)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">lpad() function</indexterm>
+          <b>Purpose:</b> Returns a string of a specified length, based on the 
first argument string. If the
+          specified string is too short, it is padded on the left with a 
repeating sequence of the characters from
+          the pad string. If the specified string is too long, it is truncated 
on the right.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="ltrim">
+
+        <dt>
+          <codeph>ltrim(string a)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">ltrim() function</indexterm>
+          <b>Purpose:</b> Returns the argument string with any leading spaces 
removed from the left side.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="parse_url">
+
+        <dt>
+          <codeph>parse_url(string urlString, string partToExtract [, string 
keyToExtract])</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">parse_url() function</indexterm>
+          <b>Purpose:</b> Returns the portion of a URL corresponding to a 
specified part. The part argument can be
+          <codeph>'PROTOCOL'</codeph>, <codeph>'HOST'</codeph>, 
<codeph>'PATH'</codeph>, <codeph>'REF'</codeph>,
+          <codeph>'AUTHORITY'</codeph>, <codeph>'FILE'</codeph>, 
<codeph>'USERINFO'</codeph>, or
+          <codeph>'QUERY'</codeph>. Uppercase is required for these literal 
values. When requesting the
+          <codeph>QUERY</codeph> portion of the URL, you can optionally 
specify a key to retrieve just the
+          associated value from the key-value pairs in the query string.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+          <p>
+            <b>Usage notes:</b> This function is important for the traditional 
Hadoop use case of interpreting web
+            logs. For example, if the web traffic data features raw URLs not 
divided into separate table columns,
+            you can count visitors to a particular page by extracting the 
<codeph>'PATH'</codeph> or
+            <codeph>'FILE'</codeph> field, or analyze search terms by 
extracting the corresponding key from the
+            <codeph>'QUERY'</codeph> field.
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="regexp_extract">
+
+        <dt>
+          <codeph>regexp_extract(string subject, string pattern, int 
index)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">regexp_extract() function</indexterm>
+          <b>Purpose:</b> Returns the specified () group from a string based 
on a regular expression pattern. Group
+          0 refers to the entire extracted string, while group 1, 2, and so on 
refers to the first, second, and so
+          on <codeph>(...)</codeph> portion.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+          <p conref="../shared/impala_common.xml#common/regexp_re2"/>
+          <p conref="../shared/impala_common.xml#common/regexp_re2_warning"/>
+          <p conref="../shared/impala_common.xml#common/regexp_escapes"/>
+          <p conref="../shared/impala_common.xml#common/example_blurb"/>
+          <p>
+            This example shows how group 0 matches the full pattern string, 
including the portion outside any
+            <codeph>()</codeph> group:
+          </p>
+<codeblock>[localhost:21000] &gt; select 
regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',0);
++------------------------------------------------------+
+| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 0) |
++------------------------------------------------------+
+| abcdef123ghi456                                      |
++------------------------------------------------------+
+Returned 1 row(s) in 0.11s</codeblock>
+          <p>
+            This example shows how group 1 matches just the contents inside 
the first <codeph>()</codeph> group in
+            the pattern string:
+          </p>
+<codeblock>[localhost:21000] &gt; select 
regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',1);
++------------------------------------------------------+
+| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 1) |
++------------------------------------------------------+
+| 456                                                  |
++------------------------------------------------------+
+Returned 1 row(s) in 0.11s</codeblock>
+          <p rev="2.0.0">
+            Unlike in earlier Impala releases, the regular expression library 
used in Impala 2.0 and later supports
+            the <codeph>.*?</codeph> idiom for non-greedy matches. This 
example shows how a pattern string starting
+            with <codeph>.*?</codeph> matches the shortest possible portion of 
the source string, returning the
+            rightmost set of lowercase letters. A pattern string both starting 
and ending with <codeph>.*?</codeph>
+            finds two potential matches of equal length, and returns the first 
one found (the leftmost set of
+            lowercase letters).
+          </p>
+<codeblock>[localhost:21000] &gt; select 
regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+)',1);
++--------------------------------------------------------+
+| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+)', 1) |
++--------------------------------------------------------+
+| def                                                    |
++--------------------------------------------------------+
+[localhost:21000] &gt; select 
regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+).*?',1);
++-----------------------------------------------------------+
+| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+).*?', 1) |
++-----------------------------------------------------------+
+| bcd                                                       |
++-----------------------------------------------------------+
+</codeblock>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="regexp_replace">
+
+        <dt>
+          <codeph>regexp_replace(string initial, string pattern, string 
replacement)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">regexp_replace() function</indexterm>
+          <b>Purpose:</b> Returns the initial argument with the regular 
expression pattern replaced by the final
+          argument string.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+          <p conref="../shared/impala_common.xml#common/regexp_re2"/>
+          <p conref="../shared/impala_common.xml#common/regexp_re2_warning"/>
+          <p conref="../shared/impala_common.xml#common/regexp_escapes"/>
+          <p conref="../shared/impala_common.xml#common/example_blurb"/>
+          <p>
+            These examples show how you can replace parts of a string matching 
a pattern with replacement text,
+            which can include backreferences to any <codeph>()</codeph> groups 
in the pattern string. The
+            backreference numbers start at 1, and any <codeph>\</codeph> 
characters must be escaped as
+            <codeph>\\</codeph>.
+          </p>
+          <p>
+            Replace a character pattern with new text:
+          </p>
+<codeblock>[localhost:21000] &gt; select 
regexp_replace('aaabbbaaa','b+','xyz');
++------------------------------------------+
+| regexp_replace('aaabbbaaa', 'b+', 'xyz') |
++------------------------------------------+
+| aaaxyzaaa                                |
++------------------------------------------+
+Returned 1 row(s) in 0.11s</codeblock>
+          <p>
+            Replace a character pattern with substitution text that includes 
the original matching text:
+          </p>
+<codeblock>[localhost:21000] &gt; select 
regexp_replace('aaabbbaaa','(b+)','&lt;\\1&gt;');
++----------------------------------------------+
+| regexp_replace('aaabbbaaa', '(b+)', '&lt;\\1&gt;') |
++----------------------------------------------+
+| aaa&lt;bbb&gt;aaa                                  |
++----------------------------------------------+
+Returned 1 row(s) in 0.11s</codeblock>
+          <p>
+            Remove all characters that are not digits:
+          </p>
+<codeblock>[localhost:21000] &gt; select 
regexp_replace('123-456-789','[^[:digit:]]','');
++---------------------------------------------------+
+| regexp_replace('123-456-789', '[^[:digit:]]', '') |
++---------------------------------------------------+
+| 123456789                                         |
++---------------------------------------------------+
+Returned 1 row(s) in 0.12s</codeblock>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="repeat">
+
+        <dt>
+          <codeph>repeat(string str, int n)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">repeat() function</indexterm>
+          <b>Purpose:</b> Returns the argument string repeated a specified 
number of times.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="reverse">
+
+        <dt>
+          <codeph>reverse(string a)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">reverse() function</indexterm>
+          <b>Purpose:</b> Returns the argument string with characters in 
reversed order.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="rpad">
+
+        <dt>
+          <codeph>rpad(string str, int len, string pad)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">rpad() function</indexterm>
+          <b>Purpose:</b> Returns a string of a specified length, based on the 
first argument string. If the
+          specified string is too short, it is padded on the right with a 
repeating sequence of the characters from
+          the pad string. If the specified string is too long, it is truncated 
on the right.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="rtrim">
+
+        <dt>
+          <codeph>rtrim(string a)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">rtrim() function</indexterm>
+          <b>Purpose:</b> Returns the argument string with any trailing spaces 
removed from the right side.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="space">
+
+        <dt>
+          <codeph>space(int n)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">space() function</indexterm>
+          <b>Purpose:</b> Returns a concatenated string of the specified 
number of spaces. Shorthand for
+          <codeph>repeat(' ',<varname>n</varname>)</codeph>.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="strleft">
+
+        <dt>
+          <codeph>strleft(string a, int num_chars)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">strleft() function</indexterm>
+          <b>Purpose:</b> Returns the leftmost characters of the string. 
Shorthand for a call to
+          <codeph>substr()</codeph> with 2 arguments.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="strright">
+
+        <dt>
+          <codeph>strright(string a, int num_chars)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">strright() function</indexterm>
+          <b>Purpose:</b> Returns the rightmost characters of the string. 
Shorthand for a call to
+          <codeph>substr()</codeph> with 2 arguments.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="substr">
+
+        <dt>
+          <codeph>substr(string a, int start [, int len]), <ph 
id="substring">substring(string a, int start [, int
+          len])</ph></codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">substr() function</indexterm>
+          <b>Purpose:</b> Returns the portion of the string starting at a 
specified point, optionally with a
+          specified maximum length. The characters in the string are indexed 
starting at 1.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="translate">
+
+        <dt>
+          <codeph>translate(string input, string from, string to)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">translate() function</indexterm>
+          <b>Purpose:</b> Returns the input string with a set of characters 
replaced by another set of characters.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="trim">
+
+        <dt>
+          <codeph>trim(string a)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">trim() function</indexterm>
+          <b>Purpose:</b> Returns the input string with both leading and 
trailing spaces removed. The same as
+          passing the string through both <codeph>ltrim()</codeph> and 
<codeph>rtrim()</codeph>.
+          <p>
+            <b>Usage notes:</b> Often used during data cleansing operations 
during the ETL cycle, if input values might still have surrounding spaces.
+            For a more general-purpose function that can remove other leading 
and trailing characters besides spaces, see <codeph>btrim()</codeph>.
+          </p>
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="upper">
+
+        <dt>
+          <codeph>upper(string a), <ph id="ucase">ucase(string a)</ph></codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">upper() function</indexterm>
+          <indexterm audience="Cloudera">ucase() function</indexterm>
+          <b>Purpose:</b> Returns the argument string converted to 
all-uppercase.
+          <p>
+            <b>Return type:</b> <codeph>string</codeph>
+          </p>
+        </dd>
+
+      </dlentry>
+    </dl>
+  </conbody>
+</concept>


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_struct.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_struct.xml b/docs/topics/impala_struct.xml
new file mode 100644
index 0000000..1e440fc
--- /dev/null
+++ b/docs/topics/impala_struct.xml
@@ -0,0 +1,406 @@
+<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD 
DITA Concept//EN" "concept.dtd">
+
+  <concept id="struct">
+
+    <title>STRUCT Complex Type (CDH 5.5 or higher only)</title>
+
+    <prolog>
+      <metadata>
+        <data name="Category" value="Impala"/>
+        <data name="Category" value="Impala Data Types"/>
+      </metadata>
+    </prolog>
+
+    <conbody>
+
+      <p>
+        A complex data type, representing multiple fields of a single item.
+        Frequently used as the element type of an <codeph>ARRAY</codeph>
+        or the <codeph>VALUE</codeph> part of a <codeph>MAP</codeph>.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock><varname>column_name</varname> STRUCT &lt; <varname>name</varname> 
: <varname>type</varname> [COMMENT '<varname>comment_string</varname>'], ... 
&gt;
+
+type ::= <varname>primitive_type</varname> | <varname>complex_type</varname>
+</codeblock>
+
+      <p>
+        The names and number of fields within the <codeph>STRUCT</codeph> are 
fixed. Each field can be a different type.
+        A field within a <codeph>STRUCT</codeph> can also be another 
<codeph>STRUCT</codeph>, or an <codeph>ARRAY</codeph>
+        or a <codeph>MAP</codeph>, allowing you to create nested data 
structures with a maximum nesting depth of 100.
+      </p>
+
+      <p>
+        A <codeph>STRUCT</codeph> can be the top-level type for a column, or 
can itself be an item within an <codeph>ARRAY</codeph>
+        or the value part of the key-value pair in a <codeph>MAP</codeph>.
+      </p>
+
+      <p>
+        When a <codeph>STRUCT</codeph> is used as an <codeph>ARRAY</codeph> 
element or a <codeph>MAP</codeph> value,
+        you use a join clause to bring the <codeph>ARRAY</codeph> or 
<codeph>MAP</codeph> elements into the result set, and then refer
+        to 
<codeph><varname>array_name</varname>.ITEM.<varname>field</varname></codeph> or
+        
<codeph><varname>map_name</varname>.VALUE.<varname>field</varname></codeph>.
+        In the case of a <codeph>STRUCT</codeph> directly inside an 
<codeph>ARRAY</codeph> or <codeph>MAP</codeph>,
+        you can omit the <codeph>.ITEM</codeph> and <codeph>.VALUE</codeph> 
pseudocolumns and refer directly to
+        
<codeph><varname>array_name</varname>.<varname>field</varname></codeph> or
+        <codeph><varname>map_name</varname>.<varname>field</varname></codeph>.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+      <p conref="../shared/impala_common.xml#common/complex_types_combo"/>
+
+      <p>
+        A <codeph>STRUCT</codeph> is similar conceptually to a table row: it 
contains a fixed number of named fields,
+        each with a predefined type. To combine two related tables, while 
using complex types to
+        minimize repetition, the typical way to represent that data is as an 
<codeph>ARRAY</codeph> of <codeph>STRUCT</codeph> elements.
+      </p>
+
+      <p>
+        Because a <codeph>STRUCT</codeph> has a fixed number of named fields, 
it typically does not make sense
+        to have a <codeph>STRUCT</codeph> as the type of a table column. In 
such a case, you could just make each field of the
+        <codeph>STRUCT</codeph> into a separate column of the table. The 
<codeph>STRUCT</codeph> type is most
+        useful as an item of an <codeph>ARRAY</codeph> or the value part of 
the key-value pair in a <codeph>MAP</codeph>.
+        A nested type column with a <codeph>STRUCT</codeph> at the lowest 
level lets you associate a variable
+        number of row-like objects with each row of the table.
+      </p>
+
+      <p>
+        The <codeph>STRUCT</codeph> type is straightforward to reference 
within a query. You do not need to
+        include the <codeph>STRUCT</codeph> column in a join clause or give it 
a table alias, as is
+        required for the <codeph>ARRAY</codeph> and <codeph>MAP</codeph> 
types. You refer to the individual
+        fields using dot notation, such as 
<codeph><varname>struct_column_name</varname>.<varname>field_name</varname></codeph>,
+        without any pseudocolumn such as <codeph>ITEM</codeph> or 
<codeph>VALUE</codeph>.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/complex_types_describe"/>
+
+      <p conref="../shared/impala_common.xml#common/internals_blurb"/>
+
+      <p>
+        Within the Parquet data file, the values for each 
<codeph>STRUCT</codeph> field are stored adjacent to each other,
+        so that they can be encoded and compressed using all the Parquet 
techniques for storing sets of similar or
+        repeated values. The adjacency applies even when the 
<codeph>STRUCT</codeph> values are part of an
+        <codeph>ARRAY</codeph> or <codeph>MAP</codeph>. During a query, Impala 
avoids unnecessary I/O by reading only the portions
+        of the Parquet data file containing the requested 
<codeph>STRUCT</codeph> fields.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/added_in_230"/>
+
+      <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+
+      <ul 
conref="../shared/impala_common.xml#common/complex_types_restrictions">
+        <li/>
+      </ul>
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+      <note 
conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/>
+
+      <p>
+        The following example shows a table with various kinds of 
<codeph>STRUCT</codeph> columns,
+        both at the top level and nested within other complex types.
+        Practice the <codeph>CREATE TABLE</codeph> and query notation for 
complex type columns
+        using empty tables, until you can visualize a complex data structure 
and construct corresponding SQL statements reliably.
+      </p>
+
+<codeblock><![CDATA[CREATE TABLE struct_demo
+(
+  id BIGINT,
+  name STRING,
+
+-- A STRUCT as a top-level column. Demonstrates how the table ID column
+-- and the ID field within the STRUCT can coexist without a name conflict.
+  employee_info STRUCT < employer: STRING, id: BIGINT, address: STRING >,
+
+-- A STRUCT as the element type of an ARRAY.
+  places_lived ARRAY < STRUCT <street: STRING, city: STRING, country: STRING 
>>,
+
+-- A STRUCT as the value portion of the key-value pairs in a MAP.
+  memorable_moments MAP < STRING, STRUCT < year: INT, place: STRING, details: 
STRING >>,
+
+-- A STRUCT where one of the fields is another STRUCT.
+  current_address STRUCT < street_address: STRUCT <street_number: INT, 
street_name: STRING, street_type: STRING>, country: STRING, postal_code: STRING 
>
+)
+STORED AS PARQUET;
+]]>
+</codeblock>
+
+      <p>
+        The following example shows how to examine the structure of a table 
containing one or more
+        <codeph>STRUCT</codeph> columns by using the <codeph>DESCRIBE</codeph> 
statement. You can
+        visualize each <codeph>STRUCT</codeph> as its own table, with columns
+        named the same as each field of the <codeph>STRUCT</codeph>.
+        If the <codeph>STRUCT</codeph> is nested inside another complex type, 
such as <codeph>ARRAY</codeph>,
+        you can extend the qualified name passed to <codeph>DESCRIBE</codeph> 
until the output
+        shows just the <codeph>STRUCT</codeph> fields.
+      </p>
+
+<codeblock><![CDATA[DESCRIBE struct_demo;
++-------------------+--------------------------+
+| name              | type                     |
++-------------------+--------------------------+
+| id                | bigint                   |
+| name              | string                   |
+| employee_info     | struct<                  |
+|                   |   employer:string,       |
+|                   |   id:bigint,             |
+|                   |   address:string         |
+|                   | >                        |
+| places_lived      | array<struct<            |
+|                   |   street:string,         |
+|                   |   city:string,           |
+|                   |   country:string         |
+|                   | >>                       |
+| memorable_moments | map<string,struct<       |
+|                   |   year:int,              |
+|                   |   place:string,          |
+|                   |   details:string         |
+|                   | >>                       |
+| current_address   | struct<                  |
+|                   |   street_address:struct< |
+|                   |     street_number:int,   |
+|                   |     street_name:string,  |
+|                   |     street_type:string   |
+|                   |   >,                     |
+|                   |   country:string,        |
+|                   |   postal_code:string     |
+|                   | >                        |
++-------------------+--------------------------+
+
+DESCRIBE struct_demo.employee_info;
++----------+--------+
+| name     | type   |
++----------+--------+
+| employer | string |
+| id       | bigint |
+| address  | string |
++----------+--------+
+
+-- Because PLACES_LIVED is a STRUCT inside an ARRAY, the
+-- initial DESCRIBE shows the structure of the ARRAY.
+DESCRIBE struct_demo.places_lived;
++------+------------------+
+| name | type             |
++------+------------------+
+| item | struct<          |
+|      |   street:string, |
+|      |   city:string,   |
+|      |   country:string |
+|      | >                |
+| pos  | bigint           |
++------+------------------+
+
+-- Ask for the details of the ITEM field of the ARRAY to see
+-- just the layout of the STRUCT.
+DESCRIBE struct_demo.places_lived.item;
++---------+--------+
+| name    | type   |
++---------+--------+
+| street  | string |
+| city    | string |
+| country | string |
++---------+--------+
+
+-- Likewise, MEMORABLE_MOMENTS has a STRUCT inside a MAP,
+-- which requires an extra level of qualified name to see
+-- just the STRUCT part.
+DESCRIBE struct_demo.memorable_moments;
++-------+------------------+
+| name  | type             |
++-------+------------------+
+| key   | string           |
+| value | struct<          |
+|       |   year:int,      |
+|       |   place:string,  |
+|       |   details:string |
+|       | >                |
++-------+------------------+
+
+-- For a MAP, ask to see the VALUE field to see the
+-- corresponding STRUCT fields in a table-like structure.
+DESCRIBE struct_demo.memorable_moments.value;
++---------+--------+
+| name    | type   |
++---------+--------+
+| year    | int    |
+| place   | string |
+| details | string |
++---------+--------+
+
+-- For a STRUCT inside a STRUCT, we can see the fields of the
+-- outer STRUCT...
+DESCRIBE struct_demo.current_address;
++----------------+-----------------------+
+| name           | type                  |
++----------------+-----------------------+
+| street_address | struct<               |
+|                |   street_number:int,  |
+|                |   street_name:string, |
+|                |   street_type:string  |
+|                | >                     |
+| country        | string                |
+| postal_code    | string                |
++----------------+-----------------------+
+
+-- ...and then use a further qualified name to see just the
+-- fields of the inner STRUCT.
+DESCRIBE struct_demo.current_address.street_address;
++---------------+--------+
+| name          | type   |
++---------------+--------+
+| street_number | int    |
+| street_name   | string |
+| street_type   | string |
++---------------+--------+
+]]>
+</codeblock>
+
+      <p>
+        The following example shows how to examine the structure of a table 
containing one or more
+        <codeph>STRUCT</codeph> columns by using the <codeph>DESCRIBE</codeph> 
statement. You can
+        visualize each <codeph>STRUCT</codeph> as its own table, with columns
+        named the same as each field of the <codeph>STRUCT</codeph>.
+        If the <codeph>STRUCT</codeph> is nested inside another complex type, 
such as <codeph>ARRAY</codeph>,
+        you can extend the qualified name passed to <codeph>DESCRIBE</codeph> 
until the output
+        shows just the <codeph>STRUCT</codeph> fields.
+      </p>
+
+<!-- To do: See why the most verbose query form gives an error. -->
+<codeblock><![CDATA[DESCRIBE struct_demo;
++-------------------+--------------------------+---------+
+| name              | type                     | comment |
++-------------------+--------------------------+---------+
+| id                | bigint                   |         |
+| name              | string                   |         |
+| employee_info     | struct<                  |         |
+|                   |   employer:string,       |         |
+|                   |   id:bigint,             |         |
+|                   |   address:string         |         |
+|                   | >                        |         |
+| places_lived      | array<struct<            |         |
+|                   |   street:string,         |         |
+|                   |   city:string,           |         |
+|                   |   country:string         |         |
+|                   | >>                       |         |
+| memorable_moments | map<string,struct<       |         |
+|                   |   year:int,              |         |
+|                   |   place:string,          |         |
+|                   |   details:string         |         |
+|                   | >>                       |         |
+| current_address   | struct<                  |         |
+|                   |   street_address:struct< |         |
+|                   |     street_number:int,   |         |
+|                   |     street_name:string,  |         |
+|                   |     street_type:string   |         |
+|                   |   >,                     |         |
+|                   |   country:string,        |         |
+|                   |   postal_code:string     |         |
+|                   | >                        |         |
++-------------------+--------------------------+---------+
+
+SELECT id, employee_info.id FROM struct_demo;
+
+SELECT id, employee_info.id AS employee_id FROM struct_demo;
+
+SELECT id, employee_info.id AS employee_id, employee_info.employer
+  FROM struct_demo;
+
+SELECT id, name, street, city, country
+  FROM struct_demo, struct_demo.places_lived;
+
+SELECT id, name, struct_demo.places_lived.pos, 
struct_demo.places_lived.street, struct_demo.places_lived.city, 
struct_demo.places_lived.country
+  FROM struct_demo, struct_demo.places_lived;
+ERROR: AnalysisException: Illegal column/field reference 
'struct_demo.places_lived.pos' with intermediate collection 'places_lived' of 
type 'ARRAY<STRUCT<street:STRING,city:STRING,country:STRING>>'
+
+SELECT id, name, pl.pos, pl.street, pl.city, pl.country
+  FROM struct_demo, struct_demo.places_lived AS pl;
+
+SELECT id, name, places_lived.pos, places_lived.street, places_lived.city, 
places_lived.country
+  FROM struct_demo, struct_demo.places_lived;
+
+SELECT id, name, pos, street, city, country
+  FROM struct_demo, struct_demo.places_lived;
+
+SELECT id, name, struct_demo.memorable_moments.key,
+  struct_demo.memorable_moments.value.year,
+  struct_demo.memorable_moments.value.place,
+  struct_demo.memorable_moments.value.details
+FROM struct_demo, struct_demo.memorable_moments
+WHERE struct_demo.memorable_moments.key IN 
('Birthday','Anniversary','Graduation');
+ERROR: AnalysisException: Illegal column/field reference 
'struct_demo.memorable_moments.key' with intermediate collection 
'memorable_moments' of type 
'MAP<STRING,STRUCT<year:INT,place:STRING,details:STRING>>'
+
+SELECT id, name, mm.key, mm.value.year, mm.value.place, mm.value.details
+  FROM struct_demo, struct_demo.memorable_moments AS mm
+WHERE mm.key IN ('Birthday','Anniversary','Graduation');
+
+SELECT id, name, memorable_moments.key, memorable_moments.value.year,
+  memorable_moments.value.place, memorable_moments.value.details
+FROM struct_demo, struct_demo.memorable_moments
+WHERE key IN ('Birthday','Anniversary','Graduation');
+
+SELECT id, name, key, value.year, value.place, value.details
+  FROM struct_demo, struct_demo.memorable_moments
+WHERE key IN ('Birthday','Anniversary','Graduation');
+
+SELECT id, name, key, year, place, details
+  FROM struct_demo, struct_demo.memorable_moments
+WHERE key IN ('Birthday','Anniversary','Graduation');
+
+SELECT id, name,
+  current_address.street_address.street_number,
+  current_address.street_address.street_name,
+  current_address.street_address.street_type,
+  current_address.country,
+  current_address.postal_code
+FROM struct_demo;
+]]>
+</codeblock>
+
+      <p>
+        For example, this table uses a struct that encodes several data values 
for each phone number associated
+        with a person. Each person can have a variable-length array of 
associated phone numbers, and queries can
+        refer to the category field to locate specific home, work, mobile, and 
so on kinds of phone numbers.
+      </p>
+
+<codeblock>CREATE TABLE contact_info_many_structs
+(
+  id BIGINT, name STRING,
+  phone_numbers ARRAY &lt; STRUCT &lt;category:STRING, country_code:STRING, 
area_code:SMALLINT, full_number:STRING, mobile:BOOLEAN, carrier:STRING &gt; &gt;
+) STORED AS PARQUET;
+</codeblock>
+
+      <p>
+        Because structs are naturally suited to composite values where the 
fields have different data types, you might use them
+        to decompose things such as addresses:
+      </p>
+
+<codeblock>CREATE TABLE contact_info_detailed_address
+(
+  id BIGINT, name STRING,
+  address STRUCT &lt; house_number:INT, street:STRING, street_type:STRING, 
apartment:STRING, city:STRING, region:STRING, country:STRING &gt;
+);
+</codeblock>
+
+      <p>
+        In a big data context, splitting out data fields such as the number 
part of the address and the street name
+        could let you do analysis on each field independently. For example, 
which streets have the largest number
+        range of addresses, what are the statistical properties of the street 
names, which areas have a higher
+        proportion of <q>Roads</q>, <q>Courts</q> or <q>Boulevards</q>, and so 
on.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/related_info"/>
+
+      <p>
+        <xref href="impala_complex_types.xml#complex_types"/>,
+        <xref href="impala_array.xml#array"/>,
+        <!-- <xref href="impala_struct.xml#struct"/>, -->
+        <xref href="impala_map.xml#map"/>
+      </p>
+
+    </conbody>
+
+  </concept>
+

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_subqueries.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_subqueries.xml 
b/docs/topics/impala_subqueries.xml
new file mode 100644
index 0000000..ed99f3a
--- /dev/null
+++ b/docs/topics/impala_subqueries.xml
@@ -0,0 +1,318 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="2.0.0" id="subqueries">
+
+  <title>Subqueries in Impala SELECT Statements</title>
+  <titlealts><navtitle>Subqueries</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Querying"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">subqueries</indexterm>
+      A <term>subquery</term> is a query that is nested within another query. 
Subqueries let queries on one table
+      dynamically adapt based on the contents of another table. This technique 
provides great flexibility and
+      expressive power for SQL queries.
+    </p>
+
+    <p>
+      A subquery can return a result set for use in the <codeph>FROM</codeph> 
or <codeph>WITH</codeph> clauses, or
+      with operators such as <codeph>IN</codeph> or <codeph>EXISTS</codeph>.
+    </p>
+
+    <p>
+      A <term>scalar subquery</term> produces a result set with a single row 
containing a single column, typically
+      produced by an aggregation function such as <codeph>MAX()</codeph> or 
<codeph>SUM()</codeph>. This single
+      result value can be substituted in scalar contexts such as arguments to 
comparison operators. If the result
+      set is empty, the value of the scalar subquery is <codeph>NULL</codeph>. 
For example, the following query
+      finds the maximum value of <codeph>T2.Y</codeph> and then substitutes 
that value into the
+      <codeph>WHERE</codeph> clause of the outer block that queries 
<codeph>T1</codeph>:
+    </p>
+
+<codeblock>SELECT x FROM t1 WHERE x &gt; (SELECT MAX(y) FROM t2);
+</codeblock>
+
+    <p>
+      <term>Uncorrelated subqueries</term> do not refer to any tables from the 
outer block of the query. The same
+      value or set of values produced by the subquery is used when evaluating 
each row from the outer query block.
+      In this example, the subquery returns an arbitrary number of values from 
<codeph>T2.Y</codeph>, and each
+      value of <codeph>T1.X</codeph> is tested for membership in that same set 
of values:
+    </p>
+
+<codeblock>SELECT x FROM t1 WHERE x IN (SELECT y FROM t2);
+</codeblock>
+
+    <p>
+      <term>Correlated subqueries</term> compare one or more values from the 
outer query block to values referenced
+      in the <codeph>WHERE</codeph> clause of the subquery. Each row evaluated 
by the outer <codeph>WHERE</codeph>
+      clause can be evaluated using a different set of values. These kinds of 
subqueries are restricted in the
+      kinds of comparisons they can do between columns of the inner and outer 
tables. (See the following
+      <b>Restrictions</b> item.)
+    </p>
+
+    <p>
+      For example, the following query finds all the employees with salaries 
that are higher than average for their
+      department. The subquery potentially computes a different 
<codeph>AVG()</codeph> value for each employee.
+    </p>
+
+<!-- TK: Construct an EMPLOYEES schema to try out examples like these. -->
+
+<codeblock>SELECT employee_name, employee_id FROM employees one WHERE
+  salary &gt; (SELECT avg(salary) FROM employees two WHERE one.dept_id = 
two.dept_id);
+</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+    <p>
+      <b>Subquery in the <codeph>FROM</codeph> clause:</b>
+    </p>
+
+<codeblock>SELECT <varname>select_list</varname> FROM 
<varname>table_ref</varname> [, <varname>table_ref</varname> ...]
+
+<varname>table_ref</varname> ::= <varname>table_name</varname> | 
(<varname>select_statement</varname>)
+</codeblock>
+
+    <p>
+      <b>Subqueries in <codeph>WHERE</codeph> clause:</b>
+    </p>
+
+<codeblock>WHERE <varname>value</varname> 
<varname>comparison_operator</varname> 
(<varname>scalar_select_statement</varname>)
+WHERE <varname>value</varname> [NOT] IN (<varname>select_statement</varname>)
+WHERE [NOT] EXISTS (<varname>correlated_select_statement</varname>)
+WHERE NOT EXISTS (<varname>correlated_select_statement</varname>)
+</codeblock>
+
+    <p>
+      <codeph>comparison_operator</codeph> is a numeric comparison such as 
<codeph>=</codeph>,
+      <codeph>&lt;=</codeph>, <codeph>!=</codeph>, and so on, or a string 
comparison operator such as
+      <codeph>LIKE</codeph> or <codeph>REGEXP</codeph>.
+    </p>
+
+    <p rev="2.0.0">
+      Although you can use non-equality comparison operators such as 
<codeph>&lt;</codeph> or
+      <codeph>&gt;=</codeph>, the subquery must include at least one equality 
comparison between the columns of the
+      inner and outer query blocks.
+    </p>
+
+    <p>
+      All syntax is available for both correlated and uncorrelated queries, 
except that the <codeph>NOT
+      EXISTS</codeph> clause cannot be used with an uncorrelated subquery.
+    </p>
+
+    <p>
+      Impala subqueries can be nested arbitrarily deep.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/sql1999"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      This example illustrates how subqueries can be used in the 
<codeph>FROM</codeph> clause to organize the table
+      names, column names, and column values by producing intermediate result 
sets, especially for join queries.
+    </p>
+
+<codeblock>SELECT avg(t1.x), max(t2.y) FROM
+  (SELECT id, cast(a AS DECIMAL(10,5)) AS x FROM raw_data WHERE a BETWEEN 0 
AND 100) AS t1
+  JOIN
+  (SELECT id, length(s) AS y FROM raw_data WHERE s LIKE 'A%') AS t2;
+  USING (id);
+</codeblock>
+
+    <p rev="2.0.0">
+      These examples show how a query can test for the existence of values in 
a separate table using the
+      <codeph>EXISTS()</codeph> operator with a subquery.
+<!--
+Internally, these queries are processed in a way similar to join queries.
+Because the values from the second table are not part of the result set, the 
subquery
+is more efficient than the equivalent join query.
+-->
+    </p>
+
+    <p>
+      The following examples show how a value can be compared against a set of 
values returned by a subquery.
+    </p>
+
+<codeblock rev="2.0.0">SELECT count(x) FROM t1 WHERE EXISTS(SELECT 1 FROM t2 
WHERE t1.x = t2.y * 10);
+
+SELECT x FROM t1 WHERE x IN (SELECT y FROM t2 WHERE state = 'CA');
+</codeblock>
+
+    <p>
+      The following examples demonstrate scalar subqueries. When a subquery is 
known to return a single value, you
+      can substitute it where you would normally put a constant value.
+    </p>
+
+<codeblock>SELECT x FROM t1 WHERE y = (SELECT max(z) FROM t2);
+SELECT x FROM t1 WHERE y &gt; (SELECT count(z) FROM t2);
+</codeblock>
+
+<!-- <p 
conref="/Content/impala_common_xi44078.xml#common/partitioning_blurb"/> -->
+
+<!--
+<p conref="/Content/impala_common_xi44078.xml#common/hbase_blurb"/>
+<p>
+Currently, the <codeph>IN (<varname>subquery</varname>)</codeph> operator 
results in a full table scan
+of an HBase table, rather than being translated into a series of single-row 
lookups.
+Therefore, this is not an efficient construct to use with Impala queries for 
HBase tables.
+</p>
+-->
+
+<!--
+<p conref="/Content/impala_common_xi44078.xml#common/parquet_blurb"/>
+<p conref="/Content/impala_common_xi44078.xml#common/text_blurb"/>
+<p conref="/Content/impala_common_xi44078.xml#common/compatibility_blurb"/>
+-->
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <p>
+      If the same table is referenced in both the outer and inner query 
blocks, construct a table alias in the
+      outer query block and use a fully qualified name to distinguish the 
inner and outer table references:
+    </p>
+
+<!-- TK: verify the logic of this example. Probably have other similar ones 
that could be reused here. -->
+
+<codeblock>SELECT * FROM t1 one WHERE id IN (SELECT parent FROM t1 two WHERE 
t1.parent = t2.id);
+</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/internals_blurb"/>
+
+    <p>
+      Internally, subqueries involving <codeph>IN</codeph>, <codeph>NOT 
IN</codeph>, <codeph>EXISTS</codeph>, or
+      <codeph>NOT EXISTS</codeph> clauses are rewritten into join queries. 
Depending on the syntax, the subquery
+      might be rewritten to an outer join, semi join, cross join, or anti join.
+    </p>
+
+    <p>
+      A query is processed differently depending on whether the subquery calls 
any aggregation functions. There are
+      correlated and uncorrelated forms, with and without calls to aggregation 
functions. Each of these four
+      categories is rewritten differently.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/column_stats_blurb"/>
+
+    <p>
+      Because queries that include correlated and uncorrelated subqueries in 
the <codeph>WHERE</codeph> clause are
+      written into join queries, to achieve best performance, follow the same 
guidelines for running the
+      <codeph>COMPUTE STATS</codeph> statement as you do for tables involved 
in regular join queries. Run the
+      <codeph>COMPUTE STATS</codeph> statement for each associated tables 
after loading or substantially changing
+      the data in that table. See <xref 
href="impala_perf_stats.xml#perf_stats"/> for details.
+    </p>
+
+    <p>
+      <b>Added in:</b> Subqueries are substantially enhanced starting in 
Impala 2.0 for CDH 4, and CDH 5.2.0. Now,
+      they can be used in the <codeph>WHERE</codeph> clause, in combination 
with clauses such as
+      <codeph>EXISTS</codeph> and <codeph>IN</codeph>, rather than just in the 
<codeph>FROM</codeph> clause.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+
+    <p>
+      The initial Impala support for nested subqueries addresses the most 
common use cases. Some restrictions
+      remain:
+    </p>
+
+    <ul>
+      <li>
+        <p>
+          Although you can use subqueries in a query involving 
<codeph>UNION</codeph> or <codeph>UNION ALL</codeph>
+          in Impala 2.1.0 and higher, currently you cannot construct a union 
of two subqueries (for example, in the
+          argument of an <codeph>IN</codeph> or <codeph>EXISTS</codeph> 
operator).
+        </p>
+      </li>
+
+      <li>
+        <p>
+          Subqueries returning scalar values cannot be used with the operators 
<codeph>ANY</codeph> or
+          <codeph>ALL</codeph>. (Impala does not currently have a 
<codeph>SOME</codeph> operator, but if it did,
+          the same restriction would apply.)
+        </p>
+      </li>
+
+      <li>
+        <p>
+          For the <codeph>EXISTS</codeph> and <codeph>NOT EXISTS</codeph> 
clauses, any subquery comparing values
+          from the outer query block to another table must use at least one 
equality comparison, not exclusively
+          other kinds of comparisons such as less than, greater than, 
<codeph>BETWEEN</codeph>, or
+          <codeph>!=</codeph>.
+        </p>
+      </li>
+
+      <li>
+<!-- TK: think this is no longer true. -->
+        <p>
+          Currently, a scalar subquery cannot be used as the first or second 
argument to the
+          <codeph>BETWEEN</codeph> operator.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          A subquery cannot be used inside an <codeph>OR</codeph> conjunction. 
Expressions inside a subquery, for
+          example in the <codeph>WHERE</codeph> clause, can use 
<codeph>OR</codeph> conjunctions; the restriction
+          only applies to parts of the query <q>above</q> the subquery.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          Scalar subqueries are only supported in numeric contexts. You cannot 
use a scalar subquery as an argument
+          to the <codeph>LIKE</codeph>, <codeph>REGEXP</codeph>, or 
<codeph>RLIKE</codeph> operators, or compare it
+          to a value of a non-numeric type such as <codeph>TIMESTAMP</codeph> 
or <codeph>BOOLEAN</codeph>.
+        </p>
+      </li>
+
+      <li>
+        <p>
+<!-- A subquery cannot be used to generate a scalar value for a function call. 
-->
+          You cannot use subqueries with the <codeph>CASE</codeph> function to 
generate the comparison value, the
+          values to be compared against, or the return value.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          A subquery is not allowed in the filter condition for the 
<codeph>HAVING</codeph> clause. (Strictly
+          speaking, a subquery cannot appear anywhere outside the 
<codeph>WITH</codeph>, <codeph>FROM</codeph>, and
+          <codeph>WHERE</codeph> clauses.)
+        </p>
+      </li>
+
+      <li>
+        <p>
+          You must use a fully qualified name
+          
(<codeph><varname>table_name</varname>.<varname>column_name</varname></codeph> 
or
+          
<codeph><varname>database_name</varname>.<varname>table_name</varname>.<varname>column_name</varname></codeph>)
+          when referring to any column from the outer query block within a 
subquery.
+        </p>
+      </li>
+    </ul>
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+    <p rev="2.3.0">
+      For the complex types (<codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, 
and <codeph>MAP</codeph>)
+      available in CDH 5.5 / Impala 2.3 and higher, the join queries that 
<q>unpack</q> complex type
+      columns often use correlated subqueries in the <codeph>FROM</codeph> 
clause.
+      For example, if the first table in the join clause is 
<codeph>CUSTOMER</codeph>, the second
+      join clause might have a subquery that selects from the column 
<codeph>CUSTOMER.C_ORDERS</codeph>,
+      which is an <codeph>ARRAY</codeph>. The subquery re-evaluates the 
<codeph>ARRAY</codeph> elements
+      corresponding to each row from the <codeph>CUSTOMER</codeph> table.
+      See <xref href="impala_complex_types.xml#complex_types"/> for details 
and examples of
+      using subqueries with complex types.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_operators.xml#exists"/>, <xref 
href="impala_operators.xml#in"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_sum.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_sum.xml b/docs/topics/impala_sum.xml
new file mode 100644
index 0000000..6d25f1c
--- /dev/null
+++ b/docs/topics/impala_sum.xml
@@ -0,0 +1,236 @@
+<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD 
DITA Concept//EN" "concept.dtd">
+<concept id="sum">
+
+  <title>SUM Function</title>
+  <titlealts><navtitle>SUM</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Impala Functions"/>
+      <data name="Category" value="Analytic Functions"/>
+      <data name="Category" value="Aggregate Functions"/>
+      <data name="Category" value="Querying"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">sum() function</indexterm>
+      An aggregate function that returns the sum of a set of numbers. Its 
single argument can be numeric column, or
+      the numeric result of a function or expression applied to the column 
value. Rows with a <codeph>NULL</codeph>
+      value for the specified column are ignored. If the table is empty, or 
all the values supplied to
+      <codeph>MIN</codeph> are <codeph>NULL</codeph>, <codeph>SUM</codeph> 
returns <codeph>NULL</codeph>.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>SUM([DISTINCT | ALL] <varname>expression</varname>) [OVER 
(<varname>analytic_clause</varname>)]</codeblock>
+
+    <p>
+      When the query contains a <codeph>GROUP BY</codeph> clause, returns one 
value for each combination of
+      grouping values.
+    </p>
+
+    <p>
+      <b>Return type:</b> <codeph>BIGINT</codeph> for integer arguments, 
<codeph>DOUBLE</codeph> for floating-point
+      arguments
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+    <p 
conref="../shared/impala_common.xml#common/complex_types_aggregation_explanation"/>
+    
+    <p 
conref="../shared/impala_common.xml#common/complex_types_aggregation_example"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      The following example shows how to use <codeph>SUM()</codeph> to compute 
the total for all the values in the
+      table, a subset of values, or the sum for each combination of values in 
the <codeph>GROUP BY</codeph> clause:
+    </p>
+
+<codeblock>-- Total all the values for this column in the table.
+select sum(c1) from t1;
+-- Find the total for this column from a subset of the table.
+select sum(c1) from t1 where month = 'January' and year = '2013';
+-- Find the total from a set of numeric function results.
+select sum(length(s)) from t1;
+-- Often used with functions that return predefined values to compute a score.
+select sum(case when grade = 'A' then 1.0 when grade = 'B' then 0.75 else 0) 
as class_honors from test_scores;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, sum(purchase_price) from store_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select sum(distinct x) from t1;
+</codeblock>
+
+    <p rev="2.0.0">
+      The following examples show how to use <codeph>SUM()</codeph> in an 
analytic context. They use a table
+      containing integers from 1 to 10. Notice how the <codeph>SUM()</codeph> 
is reported for each input value, as
+      opposed to the <codeph>GROUP BY</codeph> clause which condenses the 
result set.
+<codeblock>select x, property, sum(x) <b>over (partition by property)</b> as 
sum from int_t where property in ('odd','even');
++----+----------+-----+
+| x  | property | sum |
++----+----------+-----+
+| 2  | even     | 30  |
+| 4  | even     | 30  |
+| 6  | even     | 30  |
+| 8  | even     | 30  |
+| 10 | even     | 30  |
+| 1  | odd      | 25  |
+| 3  | odd      | 25  |
+| 5  | odd      | 25  |
+| 7  | odd      | 25  |
+| 9  | odd      | 25  |
++----+----------+-----+
+</codeblock>
+
+Adding an <codeph>ORDER BY</codeph> clause lets you experiment with results 
that are cumulative or apply to a moving
+set of rows (the <q>window</q>). The following examples use 
<codeph>SUM()</codeph> in an analytic context
+(that is, with an <codeph>OVER()</codeph> clause) to produce a running total 
of all the even values,
+then a running total of all the odd values. The basic <codeph>ORDER BY 
x</codeph> clause implicitly
+activates a window clause of <codeph>RANGE BETWEEN UNBOUNDED PRECEDING AND 
CURRENT ROW</codeph>,
+which is effectively the same as <codeph>ROWS BETWEEN UNBOUNDED PRECEDING AND 
CURRENT ROW</codeph>,
+therefore all of these examples produce the same results:
+<codeblock>select x, property,
+  sum(x) over (partition by property <b>order by x</b>) as 'cumulative total'
+  from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative total |
++----+----------+------------------+
+| 2  | even     | 2                |
+| 4  | even     | 6                |
+| 6  | even     | 12               |
+| 8  | even     | 20               |
+| 10 | even     | 30               |
+| 1  | odd      | 1                |
+| 3  | odd      | 4                |
+| 5  | odd      | 9                |
+| 7  | odd      | 16               |
+| 9  | odd      | 25               |
++----+----------+------------------+
+
+select x, property,
+  sum(x) over
+  (
+    partition by property
+    <b>order by x</b>
+    <b>range between unbounded preceding and current row</b>
+  ) as 'cumulative total'
+from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative total |
++----+----------+------------------+
+| 2  | even     | 2                |
+| 4  | even     | 6                |
+| 6  | even     | 12               |
+| 8  | even     | 20               |
+| 10 | even     | 30               |
+| 1  | odd      | 1                |
+| 3  | odd      | 4                |
+| 5  | odd      | 9                |
+| 7  | odd      | 16               |
+| 9  | odd      | 25               |
++----+----------+------------------+
+
+select x, property,
+  sum(x) over
+  (
+    partition by property
+    <b>order by x</b>
+    <b>rows between unbounded preceding and current row</b>
+  ) as 'cumulative total'
+  from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative total |
++----+----------+------------------+
+| 2  | even     | 2                |
+| 4  | even     | 6                |
+| 6  | even     | 12               |
+| 8  | even     | 20               |
+| 10 | even     | 30               |
+| 1  | odd      | 1                |
+| 3  | odd      | 4                |
+| 5  | odd      | 9                |
+| 7  | odd      | 16               |
+| 9  | odd      | 25               |
++----+----------+------------------+
+</codeblock>
+
+Changing the direction of the <codeph>ORDER BY</codeph> clause causes the 
intermediate
+results of the cumulative total to be calculated in a different order:
+
+<codeblock>select sum(x) over (partition by property <b>order by x desc</b>) 
as 'cumulative total'  
+  from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative total |
++----+----------+------------------+
+| 10 | even     | 10               |
+| 8  | even     | 18               |
+| 6  | even     | 24               |
+| 4  | even     | 28               |
+| 2  | even     | 30               |
+| 9  | odd      | 9                |
+| 7  | odd      | 16               |
+| 5  | odd      | 21               |
+| 3  | odd      | 24               |
+| 1  | odd      | 25               |
++----+----------+------------------+
+</codeblock>
+
+The following examples show how to construct a moving window, with a running 
total taking into account 1 row before
+and 1 row after the current row, within the same partition (all the even 
values or all the odd values).
+Because of a restriction in the Impala <codeph>RANGE</codeph> syntax, this 
type of
+moving window is possible with the <codeph>ROWS BETWEEN</codeph> clause but 
not the <codeph>RANGE BETWEEN</codeph>
+clause:
+<codeblock>select x, property,
+  sum(x) over
+  (
+    partition by property
+    <b>order by x</b>
+    <b>rows between 1 preceding and 1 following</b>
+  ) as 'moving total'
+  from int_t where property in ('odd','even');
++----+----------+--------------+
+| x  | property | moving total |
++----+----------+--------------+
+| 2  | even     | 6            |
+| 4  | even     | 12           |
+| 6  | even     | 18           |
+| 8  | even     | 24           |
+| 10 | even     | 18           |
+| 1  | odd      | 4            |
+| 3  | odd      | 9            |
+| 5  | odd      | 15           |
+| 7  | odd      | 21           |
+| 9  | odd      | 16           |
++----+----------+--------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+  sum(x) over
+  (
+    partition by property
+    <b>order by x</b>
+    <b>range between 1 preceding and 1 following</b>
+  ) as 'moving total'
+from int_t where property in ('odd','even');
+ERROR: AnalysisException: RANGE is only supported with both the lower and 
upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</codeblock>
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_analytic_functions.xml#analytic_functions"/>
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+
+<!-- This conref appears under SUM(), AVG(), FLOAT, and DOUBLE topics. -->
+
+    <p conref="../shared/impala_common.xml#common/sum_double"/>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_support_start_over.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_support_start_over.xml 
b/docs/topics/impala_support_start_over.xml
new file mode 100644
index 0000000..2c17b5d
--- /dev/null
+++ b/docs/topics/impala_support_start_over.xml
@@ -0,0 +1,29 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="support_start_over">
+
+  <title>SUPPORT_START_OVER Query Option</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">SUPPORT_START_OVER query 
option</indexterm>
+      Leave this setting at its default value.
+      It is a read-only setting, tested by some client applications such as 
Hue.
+    </p>
+    <p>
+      If you accidentally change it through <cmdname>impala-shell</cmdname>,
+      subsequent queries encounter errors until you undo the change
+      by issuing <codeph>UNSET support_start_over</codeph>.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/type_boolean"/>
+    <p conref="../shared/impala_common.xml#common/default_false"/>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_sync_ddl.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_sync_ddl.xml b/docs/topics/impala_sync_ddl.xml
new file mode 100644
index 0000000..b217f67
--- /dev/null
+++ b/docs/topics/impala_sync_ddl.xml
@@ -0,0 +1,56 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.2.1" id="sync_ddl">
+
+  <title>SYNC_DDL Query Option</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="SQL"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">SYNC_DDL query option</indexterm>
+      When enabled, causes any DDL operation such as <codeph>CREATE 
TABLE</codeph> or <codeph>ALTER TABLE</codeph>
+      to return only when the changes have been propagated to all other Impala 
nodes in the cluster by the Impala
+      catalog service. That way, if you issue a subsequent 
<codeph>CONNECT</codeph> statement in
+      <cmdname>impala-shell</cmdname> to connect to a different node in the 
cluster, you can be sure that other
+      node will already recognize any added or changed tables. (The catalog 
service automatically broadcasts the
+      DDL changes to all nodes automatically, but without this option there 
could be a period of inconsistency if
+      you quickly switched to another node, such as by issuing a subsequent 
query through a load-balancing proxy.)
+    </p>
+
+    <p>
+      Although <codeph>INSERT</codeph> is classified as a DML statement, when 
the <codeph>SYNC_DDL</codeph> option
+      is enabled, <codeph>INSERT</codeph> statements also delay their 
completion until all the underlying data and
+      metadata changes are propagated to all Impala nodes. Internally, Impala 
inserts have similarities with DDL
+      statements in traditional database systems, because they create metadata 
needed to track HDFS block locations
+      for new files and they potentially add new partitions to partitioned 
tables.
+    </p>
+
+    <note>
+      Because this option can introduce a delay after each write operation, if 
you are running a sequence of
+      <codeph>CREATE DATABASE</codeph>, <codeph>CREATE TABLE</codeph>, 
<codeph>ALTER TABLE</codeph>,
+      <codeph>INSERT</codeph>, and similar statements within a setup script, 
to minimize the overall delay you can
+      enable the <codeph>SYNC_DDL</codeph> query option only near the end, 
before the final DDL statement.
+    </note>
+
+    <p conref="../shared/impala_common.xml#common/type_boolean"/>
+    <p conref="../shared/impala_common.xml#common/default_false_0"/>
+
+    <draft-comment translate="no">
+Example could be useful here.
+</draft-comment>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+    <p>
+      <xref href="impala_ddl.xml#ddl"/>
+    </p>
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_tables.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_tables.xml b/docs/topics/impala_tables.xml
new file mode 100644
index 0000000..30e3737
--- /dev/null
+++ b/docs/topics/impala_tables.xml
@@ -0,0 +1,258 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="tables">
+
+  <title>Overview of Impala Tables</title>
+  <titlealts><navtitle>Tables</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Databases"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Tables"/>
+      <data name="Category" value="Schemas"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p/>
+
+    <p>
+      Tables are the primary containers for data in Impala. They have the 
familiar row and column layout similar to
+      other database systems, plus some features such as partitioning often 
associated with higher-end data
+      warehouse systems.
+    </p>
+
+    <p>
+      Logically, each table has a structure based on the definition of its 
columns, partitions, and other
+      properties.
+    </p>
+
+    <p>
+      Physically, each table that uses HDFS storage is associated with a 
directory in HDFS. The table data consists of all the data files
+      underneath that directory:
+    </p>
+
+    <ul>
+      <li>
+        <xref href="impala_tables.xml#internal_tables">Internal tables</xref> 
are managed by Impala, and use directories
+        inside the designated Impala work area.
+      </li>
+
+      <li>
+        <xref href="impala_tables.xml#external_tables">External tables</xref> 
use arbitrary HDFS directories, where
+        the data files are typically shared between different Hadoop 
components.
+      </li>
+
+      <li>
+        Large-scale data is usually handled by partitioned tables, where the 
data files are divided among different
+        HDFS subdirectories.
+      </li>
+    </ul>
+
+    <p rev="2.2.0">
+      Impala tables can also represent data that is stored in HBase, or in the 
Amazon S3 filesystem (CDH 5.4.0 or higher),
+      or on Isilon storage devices (CDH 5.4.3 or higher).  See <xref 
href="impala_hbase.xml#impala_hbase"/>,
+      <xref href="impala_s3.xml#s3"/>, and <xref 
href="impala_isilon.xml#impala_isilon"/>
+      for details about those special kinds of tables.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/ignore_file_extensions"/>
+
+    <p>
+      <b>Related statements:</b> <xref 
href="impala_create_table.xml#create_table"/>,
+      <xref href="impala_drop_table.xml#drop_table"/>, <xref 
href="impala_alter_table.xml#alter_table"/>
+      <xref href="impala_insert.xml#insert"/>, <xref 
href="impala_load_data.xml#load_data"/>,
+      <xref href="impala_select.xml#select"/>
+    </p>
+  </conbody>
+
+  <concept id="internal_tables">
+
+    <title>Internal Tables</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">internal tables</indexterm>
+        The default kind of table produced by the <codeph>CREATE 
TABLE</codeph> statement is known as an internal
+        table. (Its counterpart is the external table, produced by the 
<codeph>CREATE EXTERNAL TABLE</codeph>
+        syntax.)
+      </p>
+
+      <ul>
+        <li>
+          <p>
+            Impala creates a directory in HDFS to hold the data files.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            You can create data in internal tables by issuing 
<codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph>
+            statements.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            If you add or replace data using HDFS operations, issue the 
<codeph>REFRESH</codeph> command in
+            <cmdname>impala-shell</cmdname> so that Impala recognizes the 
changes in data files, block locations,
+            and so on.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            When you issue a <codeph>DROP TABLE</codeph> statement, Impala 
physically removes all the data files
+            from the directory.
+          </p>
+        </li>
+
+        <li>
+          <p 
conref="../shared/impala_common.xml#common/check_internal_external_table"/>
+        </li>
+
+        <li>
+          <p>
+            When you issue an <codeph>ALTER TABLE</codeph> statement to rename 
an internal table, all data files
+            are moved into the new HDFS directory for the table. The files are 
moved even if they were formerly in
+            a directory outside the Impala data directory, for example in an 
internal table with a
+            <codeph>LOCATION</codeph> attribute pointing to an outside HDFS 
directory.
+          </p>
+        </li>
+      </ul>
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/switch_internal_external_table"/>
+
+      <p conref="../shared/impala_common.xml#common/related_info"/>
+
+      <p>
+        <xref href="impala_tables.xml#external_tables"/>, <xref 
href="impala_create_table.xml#create_table"/>,
+        <xref href="impala_drop_table.xml#drop_table"/>, <xref 
href="impala_alter_table.xml#alter_table"/>,
+        <xref href="impala_describe.xml#describe"/>
+      </p>
+    </conbody>
+  </concept>
+
+  <concept id="external_tables">
+
+    <title>External Tables</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">external tables</indexterm>
+        The syntax <codeph>CREATE EXTERNAL TABLE</codeph> sets up an Impala 
table that points at existing data
+        files, potentially in HDFS locations outside the normal Impala data 
directories.. This operation saves the
+        expense of importing the data into a new table when you already have 
the data files in a known location in
+        HDFS, in the desired file format.
+      </p>
+
+      <ul>
+        <li>
+          <p>
+            You can use Impala to query the data in this table.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            You can create data in external tables by issuing 
<codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph>
+            statements.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            If you add or replace data using HDFS operations, issue the 
<codeph>REFRESH</codeph> command in
+            <cmdname>impala-shell</cmdname> so that Impala recognizes the 
changes in data files, block locations,
+            and so on.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            When you issue a <codeph>DROP TABLE</codeph> statement in Impala, 
that removes the connection that
+            Impala has with the associated data files, but does not physically 
remove the underlying data. You can
+            continue to use the data files with other Hadoop components and 
HDFS operations.
+          </p>
+        </li>
+
+        <li>
+          <p 
conref="../shared/impala_common.xml#common/check_internal_external_table"/>
+        </li>
+
+        <li>
+          <p>
+            When you issue an <codeph>ALTER TABLE</codeph> statement to rename 
an external table, all data files
+            are left in their original locations.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            You can point multiple external tables at the same HDFS directory 
by using the same
+            <codeph>LOCATION</codeph> attribute for each one. The tables could 
have different column definitions,
+            as long as the number and types of columns are compatible with the 
schema evolution considerations for
+            the underlying file type. For example, for text data files, one 
table might define a certain column as
+            a <codeph>STRING</codeph> while another defines the same column as 
a <codeph>BIGINT</codeph>.
+          </p>
+        </li>
+      </ul>
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/switch_internal_external_table"/>
+
+      <p conref="../shared/impala_common.xml#common/related_info"/>
+
+      <p>
+        <xref href="impala_tables.xml#internal_tables"/>, <xref 
href="impala_create_table.xml#create_table"/>,
+        <xref href="impala_drop_table.xml#drop_table"/>, <xref 
href="impala_alter_table.xml#alter_table"/>,
+        <xref href="impala_describe.xml#describe"/>
+      </p>
+    </conbody>
+  </concept>
+
+  <concept id="table_file_formats">
+    <title>File Formats</title>
+    <conbody>
+      <p>
+        Each table has an associated file format, which determines how Impala 
interprets the
+        associated data files. See <xref 
href="impala_file_formats.xml#file_formats"/> for details.
+      </p>
+      <p>
+        You set the file format during the <codeph>CREATE TABLE</codeph> 
statement,
+        or change it later using the <codeph>ALTER TABLE</codeph> statement.
+        Partitioned tables can have a different file format for individual 
partitions,
+        allowing you to change the file format used in your ETL process for 
new data
+        without going back and reconverting all the existing data in the same 
table.
+      </p>
+      <p>
+        Any <codeph>INSERT</codeph> statements produce new data files with the 
current file format of the table.
+        For existing data files, changing the file format of the table does 
not automatically do any data conversion.
+        You must use <codeph>TRUNCATE TABLE</codeph> or <codeph>INSERT 
OVERWRITE</codeph> to remove any previous data
+        files that use the old file format.
+        Then you use the <codeph>LOAD DATA</codeph> statement, <codeph>INSERT 
... SELECT</codeph>, or other mechanism
+        to put data files of the correct format into the table.
+      </p>
+      <p>
+        The default file format, text, is the most flexible and easy to 
produce when you are just getting started with
+        Impala. The Parquet file format offers the highest query performance 
and uses compression to reduce storage
+        requirements; therefore, Cloudera recommends using Parquet for Impala 
tables with substantial amounts of data.
+        <ph rev="2.3.0">Also, the complex types (<codeph>ARRAY</codeph>, 
<codeph>STRUCT</codeph>, and <codeph>MAP</codeph>)
+        available in CDH 5.5 / Impala 2.3 and higher are currently only 
supported with the Parquet file type.</ph>
+        Based on your existing ETL workflow, you might use other file formats 
such as Avro, possibly doing a final
+        conversion step to Parquet to take advantage of its performance for 
analytic queries.
+      </p>
+    </conbody>
+  </concept>
+
+</concept>

[04/22] incubator-impala git commit: First try at porting over the source files necessary for the Impala SQL Reference.

Reply via email to