http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_create_function.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_function.xml b/docs/topics/impala_create_function.xml
new file mode 100644
index 0000000..4140289
--- /dev/null
+++ b/docs/topics/impala_create_function.xml
@@ -0,0 +1,291 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.2" id="create_function">
+
+  <title>CREATE FUNCTION Statement</title>
+  <titlealts><navtitle>CREATE FUNCTION</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="Schemas"/>
+      <data name="Category" value="Impala Functions"/>
+      <data name="Category" value="UDFs"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">CREATE FUNCTION statement</indexterm>
+      Creates a user-defined function (UDF), which you can use to implement custom logic during
+      <codeph>SELECT</codeph> or <codeph>INSERT</codeph> operations.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+    <p>
+      The syntax is different depending on whether you create a scalar UDF, which is called once for each row and
+      implemented by a single function, or a user-defined aggregate function (UDA), which is implemented by
+      multiple functions that compute intermediate results across sets of rows.
+    </p>
+
+    <p>
+      To create a scalar UDF, issue a <codeph>CREATE FUNCTION</codeph> statement:
+    </p>
+
+<codeblock>CREATE FUNCTION [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>function_name</varname>([<varname>arg_type</varname>[, <varname>arg_type</varname>...]])
+  RETURNS <varname>return_type</varname>
+  LOCATION '<varname>hdfs_path</varname>'
+  SYMBOL='<varname>symbol_or_class</varname>'</codeblock>
+
+    <p>
+      To create a UDA, issue a <codeph>CREATE AGGREGATE FUNCTION</codeph> statement:
+    </p>
+
+<codeblock>CREATE [AGGREGATE] FUNCTION [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>function_name</varname>([<varname>arg_type</varname>[, <varname>arg_type</varname>...]])
+  RETURNS <varname>return_type</varname>
+  LOCATION '<varname>hdfs_path</varname>'
+  [INIT_FN='<varname>function</varname>']
+  UPDATE_FN='<varname>function</varname>'
+  MERGE_FN='<varname>function</varname>'
+  [PREPARE_FN='<varname>function</varname>']
+  [CLOSE_FN='<varname>function</varname>']
+  <ph rev="2.0.0">[SERIALIZE_FN='<varname>function</varname>']</ph>
+  [FINALIZE_FN='<varname>function</varname>']
+<!-- [INTERMEDIATE <varname>type_spec</varname>] --></codeblock>
+
+    <p conref="../shared/impala_common.xml#common/ddl_blurb"/>
+
+    <p>
+      <b>Varargs notation:</b>
+    </p>
+
+    <p>
+      If the underlying implementation of your function accepts a variable number of arguments:
+    </p>
+
+    <ul>
+      <li>
+        The variable arguments must go last in the argument list.
+      </li>
+
+      <li>
+        The variable arguments must all be of the same type.
+      </li>
+
+      <li>
+        You must include at least one instance of the variable arguments in every function call invoked from SQL.
+      </li>
+
+      <li>
+        You designate the variable portion of the argument list in the <codeph>CREATE FUNCTION</codeph> statement
+        by including <codeph>...</codeph> immediately after the type name of the first variable argument.
+        For example, to create a function that accepts an <codeph>INT</codeph> argument, followed by a
+        <codeph>BOOLEAN</codeph>, followed by one or more <codeph>STRING</codeph> arguments, your <codeph>CREATE
+        FUNCTION</codeph> statement would look like:
+<codeblock>CREATE FUNCTION <varname>func_name</varname> (INT, BOOLEAN, STRING ...)
+  RETURNS <varname>type</varname> LOCATION '<varname>path</varname>' SYMBOL='<varname>entry_point</varname>';
+</codeblock>
+      </li>
+    </ul>
+
+    <p>
+      See <xref href="impala_udf.xml#udf_varargs"/> for how to code the C++ or Java function to accept
+      variable-length argument lists.
+    </p>
+
+    <p>
+      <b>Scalar and aggregate functions:</b>
+    </p>
+
+    <p>
+      The simplest kind of user-defined function returns a single scalar value each time it is called, typically
+      once for each row in the result set. This general kind of function is what is usually meant by the term UDF.
+      User-defined aggregate functions (UDAs) are a specialized kind of UDF that produce a single value based on
+      the contents of multiple rows. You usually use UDAs in combination with a <codeph>GROUP BY</codeph> clause to
+      condense a large result set into a smaller one, or even a single row summarizing column values across an
+      entire table.
+    </p>
+
+    <p>
+      You create UDAs by using the <codeph>CREATE AGGREGATE FUNCTION</codeph> syntax. The clauses
+      <codeph>INIT_FN</codeph>, <codeph>UPDATE_FN</codeph>, <codeph>MERGE_FN</codeph>,
+      <ph rev="2.0.0"><codeph>SERIALIZE_FN</codeph>,</ph> <codeph>FINALIZE_FN</codeph>, and
+      <codeph>INTERMEDIATE</codeph> apply only when you create a UDA rather than a scalar UDF.
+    </p>
+
+    <p>
+      The <codeph>*_FN</codeph> clauses specify functions to call at different phases of function processing.
+    </p>
+
+    <ul>
+      <li>
+        <b>Initialize:</b> The function you specify with the <codeph>INIT_FN</codeph> clause does any initial
+        setup, such as initializing member variables in internal data structures. This function is often a stub for
+        simple UDAs. You can omit this clause and a default (no-op) function will be used.
+      </li>
+
+      <li>
+        <b>Update:</b> The function you specify with the <codeph>UPDATE_FN</codeph> clause is called once for each
+        row in the original result set, that is, before any <codeph>GROUP BY</codeph> clause is applied. A separate
+        instance of the function is called for each different value returned by the <codeph>GROUP BY</codeph>
+        clause. The final argument passed to this function is a pointer, to which you write an updated value based
+        on its original value and the value of the first argument.
+      </li>
+
+      <li>
+        <b>Merge:</b> The function you specify with the <codeph>MERGE_FN</codeph> clause is called an arbitrary
+        number of times, to combine intermediate values produced by different nodes or different threads as Impala
+        reads and processes data files in parallel. The final argument passed to this function is a pointer, to
+        which you write an updated value based on its original value and the value of the first argument.
+      </li>
+
+      <li rev="2.0.0">
+        <b>Serialize:</b> The function you specify with the <codeph>SERIALIZE_FN</codeph> clause frees memory
+        allocated to intermediate results. It is required if any memory was allocated by the Allocate function in
+        the Init, Update, or Merge functions, or if the intermediate type contains any pointers. See
+        <xref href="https://github.com/cloudera/impala-udf-samples/blob/master/uda-sample.cc" scope="external" format="html">the
+        UDA code samples</xref> for details.
+      </li>
+
+      <li>
+        <b>Finalize:</b> The function you specify with the <codeph>FINALIZE_FN</codeph> clause does any required
+        teardown for resources acquired by your UDF, such as freeing memory, closing file handles if you explicitly
+        opened any files, and so on. This function is often a stub for simple UDAs. You can omit this clause and a
+        default (no-op) function will be used. It is required in UDAs where the final return type is different from
+        the intermediate type, or if any memory was allocated by the Allocate function in the Init, Update, or
+        Merge functions. See
+        <xref href="https://github.com/cloudera/impala-udf-samples/blob/master/uda-sample.cc" scope="external" format="html">the
+        UDA code samples</xref> for details.
+      </li>
+    </ul>
+
+    <p>
+      If you use a consistent naming convention for each of the underlying functions, Impala can automatically
+      determine the names based on the first such clause, so the others are optional.
+    </p>
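+
+    <p>
+      For example, the following sketch (with a hypothetical function name and library path) relies on that
+      naming convention, specifying only the update function for a UDA:
+    </p>
+
+<codeblock>CREATE AGGREGATE FUNCTION my_avg(DOUBLE)
+  RETURNS DOUBLE
+  LOCATION '/user/impala/udfs/libudasample.so'
+  UPDATE_FN='MyAvgUpdate';
+</codeblock>
+
+    <p>
+      Under that convention, Impala would derive the names of the related functions, such as
+      <codeph>MyAvgInit</codeph>, <codeph>MyAvgMerge</codeph>, and <codeph>MyAvgFinalize</codeph>, from the
+      <codeph>UPDATE_FN</codeph> value and look for them in the same shared library.
+    </p>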
+
+    <p audience="Cloudera">
+      The <codeph>INTERMEDIATE</codeph> clause specifies the data type of intermediate values passed from the
+      <q>update</q> phase to the <q>merge</q> phase, and from the <q>merge</q> phase to the <q>finalize</q> phase.
+      You can use any of the existing Impala data types, or the special notation
+      <codeph>CHAR(<varname>n</varname>)</codeph> to allocate a scratch area of <varname>n</varname> bytes for the
+      intermediate result. For example, if the different phases of your UDA pass strings to each other but in the
+      end the function returns a <codeph>BIGINT</codeph> value, you would specify <codeph>INTERMEDIATE
+      STRING</codeph>. Likewise, if the different phases of your UDA pass 2 separate <codeph>BIGINT</codeph> values
+      between them (8 bytes each), you would specify <codeph>INTERMEDIATE CHAR(16)</codeph> so that each function
+      could read from and write to a 16-byte buffer.
+    </p>
+
+    <p>
+      For end-to-end examples of UDAs, see <xref href="impala_udf.xml#udfs"/>.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/udfs_no_complex_types"/>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <ul>
+      <li>
+        You can write Impala UDFs in either C++ or Java. C++ UDFs are new to Impala, and are the recommended format
+        for high performance utilizing native code. Java-based UDFs are compatible between Impala and Hive, and are
+        most suited to reusing existing Hive UDFs. (Impala can run Java-based Hive UDFs but not Hive UDAs.)
+      </li>
+
+      <li>
+        The body of the UDF is represented by a <codeph>.so</codeph> or <codeph>.jar</codeph> file, which you store
+        in HDFS and the <codeph>CREATE FUNCTION</codeph> statement distributes to each Impala node.
+      </li>
+
+      <li>
+        Impala calls the underlying code during SQL statement evaluation, as many times as needed to process all
+        the rows from the result set. All UDFs are assumed to be deterministic, that is, to always return the same
+        result when passed the same argument values. Impala might or might not skip some invocations of a UDF if
+        the result value is already known from a previous call. Therefore, do not rely on the UDF being called a
+        specific number of times, and do not return different result values based on some external factor such as
+        the current time, a random number function, or an external data source that could be updated while an
+        Impala query is in progress.
+      </li>
+
+      <li>
+        The names of the function arguments in the UDF are not significant, only their number, positions, and data
+        types.
+      </li>
+
+      <li>
+        You can overload the same function name by creating multiple versions of the function, each with a
+        different argument signature, as shown in the example following this list. For security reasons, you cannot
+        create a UDF with the same name as any built-in function.
+      </li>
+
+      <li>
+        In the UDF code, you represent the function return result as a <codeph>struct</codeph>. This
+        <codeph>struct</codeph> contains two fields. The first field is a <codeph>boolean</codeph> representing
+        whether the value is <codeph>NULL</codeph> or not. (When this field is <codeph>true</codeph>, the return
+        value is interpreted as <codeph>NULL</codeph>.) The second field is the same type as the specified function
+        return type, and holds the return value when the function returns something other than
+        <codeph>NULL</codeph>.
+      </li>
+
+      <li>
+        In the UDF code, you represent the function arguments as an initial pointer to a UDF context structure,
+        followed by references to zero or more <codeph>struct</codeph>s, corresponding to each of the arguments.
+        Each <codeph>struct</codeph> has the same two fields as with the return value, a <codeph>boolean</codeph>
+        field representing whether the argument is <codeph>NULL</codeph>, and a field of the appropriate type
+        holding any non-<codeph>NULL</codeph> argument value.
+      </li>
+
+      <li>
+        For sample code and build instructions for UDFs,
+        see <xref href="https://github.com/cloudera/impala/tree/master/be/src/udf_samples" scope="external" format="html">the sample UDFs in the Impala GitHub repo</xref>.
+      </li>
+
+      <li>
+        Because the file representing the body of the UDF is stored in HDFS, it is automatically available to all
+        the Impala nodes. You do not need to manually copy any UDF-related files between servers.
+      </li>
+
+      <li>
+        Because Impala currently does not have any <codeph>ALTER FUNCTION</codeph> statement, if you need to rename
+        a function, move it to a different database, or change its signature or other properties, issue a
+        <codeph>DROP FUNCTION</codeph> statement for the original function followed by a <codeph>CREATE
+        FUNCTION</codeph> with the desired properties.
+      </li>
+
+      <li>
+        Because each UDF is associated with a particular database, either issue a <codeph>USE</codeph> statement
+        before doing any <codeph>CREATE FUNCTION</codeph> statements, or specify the name of the function as
+        <codeph><varname>db_name</varname>.<varname>function_name</varname></codeph>.
+      </li>
+    </ul>
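+
+    <p>
+      For example, the following sketch (with hypothetical entry points in a hypothetical library) creates two
+      overloaded versions of the same function name, distinguished by their argument signatures:
+    </p>
+
+<codeblock>CREATE FUNCTION my_pad(STRING) RETURNS STRING
+  LOCATION '/user/impala/udfs/libudfsample.so' SYMBOL='PadDefault';
+CREATE FUNCTION my_pad(STRING, INT) RETURNS STRING
+  LOCATION '/user/impala/udfs/libudfsample.so' SYMBOL='PadToWidth';
+</codeblock>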
+
+    <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
+
+    <p>
+      Impala can run UDFs that were created through Hive, as long as they refer to Impala-compatible data types
+      (not composite or nested column types). Hive can run Java-based UDFs that were created through Impala, but
+      not Impala UDFs written in C++.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/udf_persistence_restriction"/>
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_udf.xml#udfs"/> for more background information, usage instructions, and examples for
+      Impala UDFs; <xref href="impala_drop_function.xml#drop_function"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_create_role.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_role.xml b/docs/topics/impala_create_role.xml
new file mode 100644
index 0000000..975ce15
--- /dev/null
+++ b/docs/topics/impala_create_role.xml
@@ -0,0 +1,66 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.4.0" id="create_role">
+
+  <title>CREATE ROLE Statement (CDH 5.2 or higher only)</title>
+  <titlealts><navtitle>CREATE ROLE (CDH 5.2 or higher only)</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Sentry"/>
+      <data name="Category" value="Roles"/>
+      <!-- Consider whether to go deeper into categories like Security for the Sentry-related statements. -->
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">CREATE ROLE statement</indexterm>
+<!-- Copied from Sentry docs. Turn into conref. -->
+      The <codeph>CREATE ROLE</codeph> statement creates a role to which privileges can be granted. Privileges can
+      be granted to roles, which can then be assigned to users. A user who has been assigned a role can exercise
+      only the privileges of that role. Only users that have administrative privileges can create or drop
+      roles. By default, the <codeph>hive</codeph>, <codeph>impala</codeph>, and <codeph>hue</codeph> users have
+      administrative privileges in Sentry.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>CREATE ROLE <varname>role_name</varname>
+</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/privileges_blurb"/>
+
+    <p>
+      Only administrative users (those with <codeph>ALL</codeph> privileges on the server, defined in the Sentry
+      policy file) can use this statement.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
+
+    <p>
+      Impala makes use of any roles and privileges specified by the <codeph>GRANT</codeph> and
+      <codeph>REVOKE</codeph> statements in Hive, and Hive makes use of any roles and privileges specified by the
+      <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in Impala. The Impala <codeph>GRANT</codeph>
+      and <codeph>REVOKE</codeph> statements for privileges do not require the <codeph>ROLE</codeph> keyword to be
+      repeated before each role name, unlike the equivalent Hive statements.
+    </p>
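+
+    <p>
+      For example, the following sketch (with hypothetical role, database, and group names) shows the typical
+      sequence of creating a role, granting it a privilege, and assigning it to a group:
+    </p>
+
+<codeblock>CREATE ROLE analyst_role;
+GRANT SELECT ON DATABASE analytics TO ROLE analyst_role;
+GRANT ROLE analyst_role TO GROUP analysts;
+</codeblock>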
+
+<!-- To do: nail down the new SHOW syntax, e.g. SHOW ROLES, SHOW CURRENT ROLES, SHOW GROUPS. -->
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_authorization.xml#authorization"/>, <xref href="impala_grant.xml#grant"/>,
+      <xref href="impala_revoke.xml#revoke"/>, <xref href="impala_drop_role.xml#drop_role"/>,
+      <xref href="impala_show.xml#show"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_create_table.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_table.xml b/docs/topics/impala_create_table.xml
new file mode 100644
index 0000000..cdaee4a
--- /dev/null
+++ b/docs/topics/impala_create_table.xml
@@ -0,0 +1,650 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="create_table">
+
+  <title>CREATE TABLE Statement</title>
+  <titlealts><navtitle>CREATE TABLE</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="Impala Data Types"/>
+      <data name="Category" value="HDFS Caching"/>
+      <data name="Category" value="Tables"/>
+      <data name="Category" value="Schemas"/>
+      <data audience="impala_next" name="Category" value="Kudu"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">CREATE TABLE statement</indexterm>
+      Creates a new table and specifies its characteristics. When you create a table, you can optionally specify
+      aspects such as:
+    </p>
+
+    <ul>
+      <li>
+        Whether the table is internal or external.
+      </li>
+
+      <li>
+        The columns and associated data types.
+      </li>
+
+      <li>
+        The columns used for physically partitioning the data.
+      </li>
+
+      <li>
+        The file format for data files.
+      </li>
+
+      <li>
+        The HDFS directory where the data files are located.
+      </li>
+    </ul>
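+
+    <p>
+      For example, the following sketch (with hypothetical table and directory names) touches each of these
+      aspects in a single statement:
+    </p>
+
+<codeblock>CREATE EXTERNAL TABLE sales_data (id BIGINT, amount DOUBLE)
+  PARTITIONED BY (year SMALLINT)
+  STORED AS PARQUET
+  LOCATION '/user/etl/sales_data';
+</codeblock>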
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+    <p>
+      The general syntax for creating a table and specifying its columns is as follows:
+    </p>
+
+    <p>
+      <b>Explicit column definitions:</b>
+    </p>
+
+<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+  [(<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
+  [COMMENT '<varname>table_comment</varname>']
+  [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
+  [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+  [
+   [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
+  ]
+  [LOCATION '<varname>hdfs_path</varname>']
+  [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+<ph rev="1.4.0">  [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
+</codeblock>
+
+    <p>
+      <b>Column definitions inferred from data file:</b>
+    </p>
+
+<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+  LIKE PARQUET '<varname>hdfs_path_of_parquet_file</varname>'
+  [COMMENT '<varname>table_comment</varname>']
+  [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
+  [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+  [
+   [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
+  ]
+  [LOCATION '<varname>hdfs_path</varname>']
+  [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+<ph rev="1.4.0">  [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
+</codeblock>
+
+    <p>
+      <b>CREATE TABLE AS SELECT:</b>
+    </p>
+
+<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+  [COMMENT '<varname>table_comment</varname>']
+  [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+  [
+   [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
+  ]
+  [LOCATION '<varname>hdfs_path</varname>']
+  [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+<ph rev="1.4.0">  [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
+AS
+  <varname>select_statement</varname></codeblock>
+
+<codeblock>data_type:
+    <varname>primitive_type</varname>
+  | array_type
+  | map_type
+  | struct_type
+
+primitive_type:
+    TINYINT
+  | SMALLINT
+  | INT
+  | BIGINT
+  | BOOLEAN
+  | FLOAT
+  | DOUBLE
+  <ph rev="1.4.0">| DECIMAL</ph>
+  | STRING
+  <ph rev="2.0.0">| CHAR</ph>
+  <ph rev="2.0.0">| VARCHAR</ph>
+  | TIMESTAMP
+
+<ph rev="2.3.0">complex_type:
+    struct_type
+  | array_type
+  | map_type
+
+struct_type: STRUCT < <varname>name</varname> : <varname>primitive_or_complex_type</varname> [COMMENT '<varname>comment_string</varname>'], ... >
+
+array_type: ARRAY < <varname>primitive_or_complex_type</varname> >
+
+map_type: MAP < <varname>primitive_type</varname>, <varname>primitive_or_complex_type</varname> >
+</ph>
+row_format:
+  DELIMITED [FIELDS TERMINATED BY '<varname>char</varname>' [ESCAPED BY '<varname>char</varname>']]
+  [LINES TERMINATED BY '<varname>char</varname>']
+
+file_format:
+    PARQUET
+  | TEXTFILE
+  | AVRO
+  | SEQUENCEFILE
+  | RCFILE
+</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/ddl_blurb"/>
+
+    <!-- Should really have some info up front about all the data types and file formats.
+         Consider adding here, or at least making inline links to the relevant keywords
+         in the syntax spec above. -->
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+    <p rev="2.3.0">
+      The Impala complex types (<codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>)
+      are available in CDH 5.5 / Impala 2.3 and higher.
+      Because you can nest these types (for example, to make an array of maps or a struct
+      with an array field), these types are also sometimes referred to as nested types.
+      See <xref href="impala_complex_types.xml#complex_types"/> for usage details.
+    </p>
+
+    <!-- This is kind of an obscure and rare usage scenario. Consider moving all the complex type stuff further down
+         after some of the more common clauses. -->
+    <p rev="2.3.0">
+      Impala can create tables containing complex type columns, with any supported file format.
+      Because currently Impala can only query complex type columns in Parquet tables, creating
+      tables with complex type columns and other file formats such as text is of limited use.
+      For example, you might create a text table including some columns with complex types with Impala, and use Hive
+      as part of your ETL pipeline to ingest the nested type data and copy it to an identical Parquet table.
+      Or you might create a partitioned table containing complex type columns using one file format, and
+      use <codeph>ALTER TABLE</codeph> to change the file format of individual partitions to Parquet; Impala
+      can then query only the Parquet-format partitions in that table.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/complex_types_partitioning"/>
+
+    <p>
+      <b>Internal and external tables (EXTERNAL and LOCATION clauses):</b>
+    </p>
+
+    <p>
+      By default, Impala creates an <q>internal</q> table, where Impala manages the underlying data files for the
+      table, and physically deletes the data files when you drop the table. If you specify the
+      <codeph>EXTERNAL</codeph> clause, Impala treats the table as an <q>external</q> table, where the data files
+      are typically produced outside Impala and queried from their original locations in HDFS, and Impala leaves
+      the data files in place when you drop the table. For details about internal and external tables, see
+      <xref href="impala_tables.xml#tables"/>.
+    </p>
+
+    <p>
+      Typically, for an external table you include a <codeph>LOCATION</codeph> clause to specify the path to the
+      HDFS directory where Impala reads and writes files for the table.
+      For example, if your data pipeline produces
+      Parquet files in the HDFS directory <filepath>/user/etl/destination</filepath>, you might create an external
+      table as follows:
+    </p>
+
+<codeblock>CREATE EXTERNAL TABLE external_parquet (c1 INT, c2 STRING, c3 TIMESTAMP)
+  STORED AS PARQUET LOCATION '/user/etl/destination';
+</codeblock>
+
+    <p>
+      Although the <codeph>EXTERNAL</codeph> and <codeph>LOCATION</codeph> clauses are often specified together,
+      <codeph>LOCATION</codeph> is optional for external tables, and you can also specify <codeph>LOCATION</codeph>
+      for internal tables. The difference is all about whether Impala <q>takes control</q> of the underlying data
+      files and moves them when you rename the table, or deletes them when you drop the table. For more about
+      internal and external tables and how they interact with the <codeph>LOCATION</codeph> attribute, see
+      <xref href="impala_tables.xml#tables"/>.
+    </p>
+
+    <p>
+      <b>Partitioned tables (PARTITIONED BY clause):</b>
+    </p>
+
+    <p>
+      The <codeph>PARTITIONED BY</codeph> clause divides the data files based on the values from one or more
+      specified columns. Impala queries can use the partition metadata to minimize the amount of data that is read
+      from disk or transmitted across the network, particularly during join queries. For details about
+      partitioning, see <xref href="impala_partitioning.xml#partitioning"/>.
+    </p>
+
+    <p rev="kudu" audience="impala_next">
+      <b>Partitioning for Kudu tables (DISTRIBUTE BY clause):</b>
+    </p>
+
+    <p rev="kudu" audience="impala_next">
+      For Kudu tables, you specify logical partitioning across one or more columns using the
+      <codeph>DISTRIBUTE BY</codeph> clause. In contrast to partitioning for HDFS-based tables,
+      multiple values for a partition key column can be located in the same partition.
+      The optional <codeph>HASH</codeph> clause lets you divide one or a set of partition key columns
+      into a specified number of buckets; you can use more than one <codeph>HASH</codeph>
+      clause, specifying a distinct set of partition key columns for each.
+      The optional <codeph>RANGE</codeph> clause further subdivides the partitions, based on
+      a set of literal values for the partition key columns.
+    </p>
+
+    <p>
+      <b>Specifying file format (STORED AS and ROW FORMAT clauses):</b>
+    </p>
+
+    <p>
+      The <codeph>STORED AS</codeph> clause identifies the format of the underlying data files. Currently, Impala
+      can query more types of file formats than it can create or insert into. Use Hive to perform any create or
+      data load operations that are not currently available in Impala. For example, Impala can create a
+      SequenceFile table but cannot insert data into it. There are also Impala-specific procedures for using
+      compression with each kind of file format. For details about working with data files of various formats, see
+      <xref href="impala_file_formats.xml#file_formats"/>.
+    </p>
+
+    <note>
+      In Impala 1.4.0 and higher, Impala can create Avro tables, which formerly required doing the <codeph>CREATE
+      TABLE</codeph> statement in Hive. See <xref href="impala_avro.xml#avro"/> for details and examples.
+    </note>
+
+    <p>
+      By default (when no <codeph>STORED AS</codeph> clause is specified), data files in Impala tables are created
+      as text files with Ctrl-A (hex 01) characters as the delimiter.
+<!-- Verify if ROW FORMAT is entirely ignored outside of text tables, or does it apply somehow to SequenceFile and/or RCFile too? -->
+      Specify the <codeph>ROW FORMAT DELIMITED</codeph> clause to produce or ingest data files that use a different
+      delimiter character such as tab or <codeph>|</codeph>, or a different line end character such as carriage
+      return or newline. When specifying delimiter and line end characters with the <codeph>FIELDS TERMINATED
+      BY</codeph> and <codeph>LINES TERMINATED BY</codeph> clauses, use <codeph>'\t'</codeph> for tab,
+      <codeph>'\n'</codeph> for newline or linefeed, <codeph>'\r'</codeph> for carriage return, and
+      <codeph>\0</codeph> for ASCII <codeph>nul</codeph> (hex 00). For more examples of text
+      tables, see <xref href="impala_txtfile.xml#txtfile"/>.
+    </p>
+
+    <p>
+      The <codeph>ESCAPED BY</codeph> clause applies both to text files that you create through an
+      <codeph>INSERT</codeph> statement to an Impala <codeph>TEXTFILE</codeph> table, and to existing data files
+      that you put into an Impala table directory. (You can ingest existing data files either by creating the table
+      with <codeph>CREATE EXTERNAL TABLE ... LOCATION</codeph>, the <codeph>LOAD DATA</codeph> statement, or
+      through an HDFS operation such as <codeph>hdfs dfs -put <varname>file</varname>
+      <varname>hdfs_path</varname></codeph>.) Choose an escape character that is not used anywhere else in the
+      file, and put it in front of each instance of the delimiter character that occurs within a field value.
+      Surrounding field values with quotation marks does not help Impala to parse fields with embedded delimiter
+      characters; the quotation marks are considered to be part of the column value. If you want to use
+      <codeph>\</codeph> as the escape character, specify the clause in <cmdname>impala-shell</cmdname> as
+      <codeph>ESCAPED BY '\\'</codeph>.
+    </p>
+
+    <note conref="../shared/impala_common.xml#common/thorn"/>
+
+    <p>
+      <b>Cloning tables (LIKE clause):</b>
+    </p>
+
+    <p>
+      To create an empty table with the same columns, comments, and other attributes as another table, use the
+      following variation. The <codeph>CREATE TABLE ... LIKE</codeph> form allows a restricted set of clauses,
+      currently only the <codeph>LOCATION</codeph>, <codeph>COMMENT</codeph>, and <codeph>STORED AS</codeph>
+      clauses.
+    </p>
+
+<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+  <ph rev="1.4.0">LIKE { [<varname>db_name</varname>.]<varname>table_name</varname> | PARQUET '<varname>hdfs_path_of_parquet_file</varname>' }</ph>
+  [COMMENT '<varname>table_comment</varname>']
+  [STORED AS <varname>file_format</varname>]
+  [LOCATION '<varname>hdfs_path</varname>']</codeblock>
+
+    <note rev="1.2">
+      To clone the structure of a table and transfer data into it in a single operation, use the <codeph>CREATE
+      TABLE AS SELECT</codeph> syntax described in the next subsection.
+    </note>
+
+    <p>
+      When you clone the structure of an existing table using the <codeph>CREATE TABLE ... LIKE</codeph> syntax,
+      the new table keeps the same file format as the original one, so you only need to specify the <codeph>STORED
+      AS</codeph> clause if you want to use a different file format, or when specifying a view as the original
+      table. (Creating a table <q>like</q> a view produces a text table by default.)
+    </p>
+
+    <p>
+      Although normally Impala cannot create an HBase table directly, Impala can clone the structure of an existing
+      HBase table with the <codeph>CREATE TABLE ... LIKE</codeph> syntax, preserving the file format and metadata
+      from the original table.
+    </p>
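+
+    <p>
+      For example, the following sketch (with hypothetical table names) clones the structure of an existing
+      table, keeping its file format, then makes another clone that switches the new table to Parquet:
+    </p>
+
+<codeblock>CREATE TABLE t1_copy LIKE t1;
+CREATE TABLE t1_parquet LIKE t1 STORED AS PARQUET;
+</codeblock>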
+
+    <p>
+      There are some exceptions to the ability to use <codeph>CREATE TABLE ... LIKE</codeph> with an Avro table.
+      For example, you cannot use this technique for an Avro table that is specified with an Avro schema but no
+      columns. When in doubt, check if a <codeph>CREATE TABLE ... LIKE</codeph> operation works in Hive; if not, it
+      typically will not work in Impala either.
+    </p>
+
+    <p>
+      If the original table is partitioned, the new table inherits the same partition key columns. Because the new
+      table is initially empty, it does not inherit the actual partitions that exist in the original one. To create
+      partitions in the new table, insert data or issue <codeph>ALTER TABLE ... ADD PARTITION</codeph> statements.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/create_table_like_view"/>
+
+    <p>
+      Because <codeph>CREATE TABLE ... LIKE</codeph> only manipulates table metadata, not the physical data of the
+      table, issue <codeph>INSERT INTO TABLE</codeph> statements afterward to copy any data from the original table
+      into the new one, optionally converting the data to a new file format. (For some file formats, Impala can do
+      a <codeph>CREATE TABLE ... LIKE</codeph> to create the table, but Impala cannot insert data in that file
+      format; in these cases, you must load the data in Hive. See
+      <xref href="impala_file_formats.xml#file_formats"/> for details.)
+    </p>
+
+    <p rev="1.2" id="ctas">
+      <b>CREATE TABLE AS SELECT:</b>
+    </p>
+
+    <p>
+      The <codeph>CREATE TABLE AS SELECT</codeph> syntax is a shorthand notation to create a table based on column
+      definitions from another table, and copy data from the source table to the destination table without issuing
+      any separate <codeph>INSERT</codeph> statement. This idiom is so popular that it has its own acronym,
+      <q>CTAS</q>.
+<!--
+      The <codeph>CREATE TABLE AS SELECT</codeph> syntax is as follows:
+-->
+    </p>
+
+<!-- CREATE TABLE AS <select> now incorporated up higher in the original syntax diagram,
+     thus commented out here.
+     Does CTAS only accept a limited subset of clauses? -->
+
+<!--
+<codeblock rev="1.2">CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+  [COMMENT '<varname>table_comment</varname>']
+  [STORED AS <varname>file_format</varname>]
+  [LOCATION '<varname>hdfs_path</varname>']
+AS
+  <varname>select_statement</varname></codeblock>
+-->
+
+    <p rev="1.2">
+      See <xref href="impala_select.xml#select"/> for details about query syntax for the <codeph>SELECT</codeph>
+      portion of a <codeph>CREATE TABLE AS SELECT</codeph> statement.
+    </p>
+
+    <p rev="1.2">
+      The newly created table inherits the column names that you select from the original table, which you can
+      override by specifying column aliases in the query. Any column or table comments from the original table are
+      not carried over to the new table.
+    </p>
+
+    <p rev="obwl" conref="../shared/impala_common.xml#common/insert_sort_blurb"/>
+
+    <p rev="1.2">
+      For example, the following statements show how you can clone all the data in a table, or a subset of the
+      columns and/or rows, or reorder columns, rename them, or construct them out of expressions:
+    </p>
+
+<codeblock rev="1.2">-- Create new table and copy all data.
+CREATE TABLE clone_of_t1 AS SELECT * FROM t1;
+-- Same idea as CREATE TABLE LIKE, don't copy any data.
+CREATE TABLE empty_clone_of_t1 AS SELECT * FROM t1 WHERE 1=0;
+-- Copy some data.
+CREATE TABLE subset_of_t1 AS SELECT * FROM t1 WHERE x > 100 AND y LIKE 'A%';
+CREATE TABLE summary_of_t1 AS SELECT c1, sum(c2) AS total, avg(c2) AS average FROM t1 GROUP BY c1;
+-- Switch file format.
+CREATE TABLE parquet_version_of_t1 STORED AS PARQUET AS SELECT * FROM t1;
+-- Create tables with different column order, names, or types than the original.
+CREATE TABLE some_columns_from_t1 AS SELECT c1, c3, c5 FROM t1;
+CREATE TABLE reordered_columns_from_t1 AS SELECT c4, c3, c1, c2 FROM t1;
+CREATE TABLE synthesized_columns AS SELECT upper(c1) AS all_caps, c2+c3 AS total, "California" AS state FROM t1;</codeblock>
+
+    <p rev="1.2">
+      As part of a CTAS operation, you can convert the data to any file format that Impala can write (currently,
+      <codeph>TEXTFILE</codeph> and <codeph>PARQUET</codeph>). You cannot specify the lower-level properties of a
+      text table, such as the delimiter. Although you can use a partitioned table as the source and copy data from
+      it, you cannot specify any partitioning clauses for the new table.
+    </p>
+
+    <p rev="1.4.0">
+      <b>CREATE TABLE LIKE PARQUET:</b>
+    </p>
+
+    <p rev="1.4.0">
+      The variation <codeph>CREATE TABLE ... LIKE PARQUET '<varname>hdfs_path_of_parquet_file</varname>'</codeph>
+      lets you skip the column definitions of the <codeph>CREATE TABLE</codeph> statement. The column names and
+      data types are automatically configured based on the organization of the specified Parquet data file, which
+      must already reside in HDFS. You can use a data file located outside the Impala database directories, or a
+      file from an existing Impala Parquet table; either way, Impala only uses the column definitions from the file
+      and does not use the HDFS location for the <codeph>LOCATION</codeph> attribute of the new table. (Although
+      you can also specify the enclosing directory with the <codeph>LOCATION</codeph> attribute, to both use the
+      same schema as the data file and point the Impala table at the associated directory for querying.)
+    </p>
+
+    <p rev="1.4.0">
+      The following considerations apply when you use the <codeph>CREATE TABLE LIKE PARQUET</codeph> technique:
+    </p>
+
+    <ul rev="1.4.0">
+      <li>
+        Any column comments from the original table are not preserved in the new table. Each column in the new
+        table has a comment stating the low-level Parquet field type used to deduce the appropriate SQL column
+        type.
+      </li>
+
+      <li>
+        If you use a data file from a partitioned Impala table, any partition key columns from the original table
+        are left out of the new table, because they are represented in HDFS directory names rather than stored in
+        the data file. To preserve the partition information, repeat the same <codeph>PARTITIONED BY</codeph>
+        clause as in the original <codeph>CREATE TABLE</codeph> statement.
+      </li>
+
+      <li>
+        The file format of the new table defaults to text, as with other kinds of <codeph>CREATE TABLE</codeph>
+        statements. To make the new table also use Parquet format, include the clause <codeph>STORED AS
+        PARQUET</codeph> in the <codeph>CREATE TABLE LIKE PARQUET</codeph> statement.
+      </li>
+
+      <li>
+        If the Parquet data file comes from an existing Impala table, currently, any <codeph>TINYINT</codeph> or
+        <codeph>SMALLINT</codeph> columns are turned into <codeph>INT</codeph> columns in the new table.
+        Internally, Parquet stores such values as 32-bit integers.
+      </li>
+
+      <li>
+        When the destination table uses the Parquet file format, the <codeph>CREATE TABLE AS SELECT</codeph> and
+        <codeph>INSERT ...
+        SELECT</codeph> statements always create at least one data file, even if the
+        <codeph>SELECT</codeph> part of the statement does not match any rows. You can use such an empty Parquet
+        data file as a template for subsequent <codeph>CREATE TABLE LIKE PARQUET</codeph> statements.
+      </li>
+    </ul>
+
+    <p>
+      For more details about creating Parquet tables, and examples of the <codeph>CREATE TABLE LIKE
+      PARQUET</codeph> syntax, see <xref href="impala_parquet.xml#parquet"/>.
+    </p>
+
+    <p>
+      <b>Visibility and Metadata (TBLPROPERTIES and WITH SERDEPROPERTIES clauses):</b>
+    </p>
+
+    <p rev="1.2">
+      You can associate arbitrary items of metadata with a table by specifying the <codeph>TBLPROPERTIES</codeph>
+      clause. This clause takes a comma-separated list of key-value pairs and stores those items in the metastore
+      database. You can also change the table properties later with an <codeph>ALTER TABLE</codeph> statement. You
+      can observe the table properties for different delimiter and escape characters using the <codeph>DESCRIBE
+      FORMATTED</codeph> command, and change those settings for an existing table with <codeph>ALTER TABLE ... SET
+      TBLPROPERTIES</codeph>.
+    </p>
+
+    <p rev="1.2">
+      You can also associate SerDes properties with the table by specifying key-value pairs through the
+      <codeph>WITH SERDEPROPERTIES</codeph> clause. This metadata is not used by Impala, which has its own built-in
+      serializer and deserializer for the file formats it supports. Particular property values might be needed for
+      Hive compatibility with certain variations of file formats, particularly Avro.
+    </p>
+
+    <p>
+      Some DDL operations that interact with other Hadoop components require specifying particular values in the
+      <codeph>SERDEPROPERTIES</codeph> or <codeph>TBLPROPERTIES</codeph> fields, such as creating an Avro table or
+      an HBase table. (You typically create HBase tables in Hive, because they require additional clauses not
+      currently available in Impala.)
+<!-- Haven't got a working example from Lenni, so suppressing this recommendation for now.
+     The Avro schema properties can be specified through either
+     <codeph>TBLPROPERTIES</codeph> or <codeph>SERDEPROPERTIES</codeph>;
+     for best compatibility with future versions of Hive,
+     use <codeph>SERDEPROPERTIES</codeph> in this case.
+-->
+    </p>
+
+    <p>
+      To see the column definitions and column comments for an existing table, for example before issuing a
+      <codeph>CREATE TABLE ... LIKE</codeph> or a <codeph>CREATE TABLE ... AS SELECT</codeph> statement, issue the
+      statement <codeph>DESCRIBE <varname>table_name</varname></codeph>. To see even more detail, such as the
+      location of data files and the values for clauses such as <codeph>ROW FORMAT</codeph> and <codeph>STORED
+      AS</codeph>, issue the statement <codeph>DESCRIBE FORMATTED <varname>table_name</varname></codeph>.
+      <codeph>DESCRIBE FORMATTED</codeph> is also needed to see any overall table comment (as opposed to individual
+      column comments).
+    </p>
+
+    <p>
+      After creating a table, your <cmdname>impala-shell</cmdname> session or another
+      <cmdname>impala-shell</cmdname> connected to the same node can immediately query that table. There might be a
+      brief interval (one statestore heartbeat) before the table can be queried through a different Impala node. To
+      make the <codeph>CREATE TABLE</codeph> statement return only when the table is recognized by all Impala nodes
+      in the cluster, enable the <codeph>SYNC_DDL</codeph> query option.
+    </p>
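+
+    <p>
+      For example, the following sketch (with a hypothetical table name) enables that option in
+      <cmdname>impala-shell</cmdname> before issuing the DDL statement:
+    </p>
+
+<codeblock>SET SYNC_DDL=1;
+CREATE TABLE coordinated_table (id BIGINT, s STRING);
+</codeblock>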
+
+    <p rev="1.4.0">
+      <b>HDFS caching (CACHED IN clause):</b>
+    </p>
+
+    <p rev="1.4.0">
+      If you specify the <codeph>CACHED IN</codeph> clause, any existing or future data files in the table
+      directory or the partition subdirectories are designated to be loaded into memory with the HDFS caching
+      mechanism. See <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/> for details about using the HDFS
+      caching feature.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/impala_cache_replication_factor"/>
+
+<!-- Say something in here about the SHOW statement, e.g. SHOW TABLES, SHOW TABLE/COLUMN STATS, SHOW PARTITIONS. -->
+
+    <p>
+      <b>Column order</b>:
+    </p>
+
+    <p>
+      If you intend to use the table to hold data files produced by some external source, specify the columns in
+      the same order as they appear in the data files.
+    </p>
+
+    <p>
+      If you intend to insert or copy data into the table through Impala, or if you have control over the way
+      externally produced data files are arranged, use your judgment to specify columns in the most convenient
+      order:
+    </p>
+
+    <ul>
+      <li>
+        <p>
+          If certain columns are often <codeph>NULL</codeph>, specify those columns last. If a data file omits
+          these trailing columns entirely, Impala automatically fills in <codeph>NULL</codeph> values for them.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          If an unpartitioned table will be used as the source for an <codeph>INSERT ... SELECT</codeph> operation
+          into a partitioned table, specify last in the unpartitioned table any columns that correspond to
+          partition key columns in the partitioned table, and in the same order as the partition key columns are
+          declared in the partitioned table. This technique lets you use <codeph>INSERT ... SELECT *</codeph> when
+          copying data to the partitioned table, rather than specifying each column name individually.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          If you specify columns in an order that you later discover is suboptimal, you can sometimes work around
+          the problem without recreating the table. You can create a view that selects columns from the original
+          table in a permuted order, then do a <codeph>SELECT *</codeph> from the view. When inserting data into a
+          table, you can specify a permuted order for the inserted columns to match the order in the destination
+          table.
+        </p>
+      </li>
+    </ul>
+
+    <p conref="../shared/impala_common.xml#common/hive_blurb"/>
+
+    <p>
+      Impala queries can make use of metadata about the table and columns, such as the number of rows in a table or
+      the number of different values in a column. Prior to Impala 1.2.2, you gathered this information by issuing
+      the <codeph>ANALYZE TABLE</codeph> statement in Hive, after creating the table and loading representative
+      data into it. In Impala 1.2.2 and higher, the <codeph>COMPUTE STATS</codeph> statement produces these
+      statistics within Impala, without needing to use Hive at all.
+    </p>
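+
+    <p>
+      For example, a typical sequence (with hypothetical names) after creating and loading a table looks like
+      the following:
+    </p>
+
+<codeblock>CREATE TABLE t3 (id BIGINT, name STRING);
+-- Load or insert representative data, then gather statistics:
+COMPUTE STATS t3;
+SHOW TABLE STATS t3;
+</codeblock>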
+
+    <p conref="../shared/impala_common.xml#common/hbase_blurb"/>
+
+    <note>
+      <p>
+        The Impala <codeph>CREATE TABLE</codeph> statement cannot create an HBase table, because it currently does
+        not support the <codeph>STORED BY</codeph> clause needed for HBase tables. Create such tables in Hive, then
+        query them through Impala. For information on using Impala with HBase tables, see
+        <xref href="impala_hbase.xml#impala_hbase"/>.
+      </p>
+    </note>
+
+    <p conref="../shared/impala_common.xml#common/s3_blurb"/>
+    <p rev="2.2.0">
+      To create a table where the data resides in the Amazon Simple Storage Service (S3),
+      specify a <codeph>LOCATION</codeph> attribute with an <codeph>s3a://</codeph> prefix pointing to the data files in S3.
+      You can use this special <codeph>LOCATION</codeph> syntax when creating an empty table,
+      but not as part of a <codeph>CREATE TABLE AS SELECT</codeph> statement.
+      See <xref href="impala_s3.xml#s3"/> for details.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/insert_sort_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/hdfs_blurb"/>
+
+    <p>
+      The <codeph>CREATE TABLE</codeph> statement for an internal table creates a directory in HDFS. The
+      <codeph>CREATE EXTERNAL TABLE</codeph> statement associates the table with an existing HDFS directory, and
+      does not create any new directory in HDFS. To locate the HDFS data directory for a table, issue a
+      <codeph>DESCRIBE FORMATTED <varname>table</varname></codeph> statement. To examine the contents of that HDFS
+      directory, use an OS command such as <codeph>hdfs dfs -ls hdfs://<varname>path</varname></codeph>, either
+      from the OS command line or through the <codeph>shell</codeph> or <codeph>!</codeph> commands in
+      <cmdname>impala-shell</cmdname>.
+    </p>
+
+    <p>
+      The <codeph>CREATE TABLE AS SELECT</codeph> syntax creates data files under the table data directory to hold
+      any data copied by the <codeph>INSERT</codeph> portion of the statement. (Even if no data is copied, Impala
+      might create one or more empty data files.)
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
+    <p rev="CDH-19187">
+      <!-- TBD. -->
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/security_blurb"/>
+    <p conref="../shared/impala_common.xml#common/redaction_yes"/>
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_maybe"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_tables.xml#tables"/>,
+      <xref href="impala_alter_table.xml#alter_table"/>, <xref href="impala_drop_table.xml#drop_table"/>,
+      <xref href="impala_partitioning.xml#partitioning"/>, <xref href="impala_tables.xml#internal_tables"/>,
+      <xref href="impala_tables.xml#external_tables"/>, <xref href="impala_compute_stats.xml#compute_stats"/>,
+      <xref href="impala_sync_ddl.xml#sync_ddl"/>, <xref href="impala_show.xml#show_tables"/>,
+      <xref href="impala_show.xml#show_create_table"/>, <xref href="impala_describe.xml#describe"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_create_view.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_view.xml b/docs/topics/impala_create_view.xml
new file mode 100644
index 0000000..2458279
--- /dev/null
+++ b/docs/topics/impala_create_view.xml
@@ -0,0 +1,136 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.1" id="create_view">
+
+  <title>CREATE VIEW Statement</title>
+  <titlealts><navtitle>CREATE VIEW</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="Tables"/>
+      <data name="Category" value="Schemas"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">CREATE VIEW statement</indexterm>
+      The <codeph>CREATE VIEW</codeph> statement lets you create a shorthand abbreviation for a more complicated
+      query. The base query can involve joins, expressions, reordered columns, column aliases, and other SQL
+      features that can make a query hard to understand or maintain.
+    </p>
+
+    <p>
+      Because a view is purely a logical construct (an alias for a query) with no physical data behind it,
+      <codeph>CREATE VIEW</codeph> only involves changes to metadata in the metastore database, not any data files
+      in HDFS.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>CREATE VIEW [IF NOT EXISTS] <varname>view_name</varname> [(<varname>column_list</varname>)]
+  AS <varname>select_statement</varname></codeblock>
+
+    <p conref="../shared/impala_common.xml#common/ddl_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <p>
+      The <codeph>CREATE VIEW</codeph> statement can be useful in scenarios such as the following:
+    </p>
+
+    <ul>
+      <li>
+        To turn even the most lengthy and complicated SQL query into a one-liner. You can issue simple queries
+        against the view from applications, scripts, or interactive queries in <cmdname>impala-shell</cmdname>.
+        For example:
+<codeblock>select * from <varname>view_name</varname>;
+select * from <varname>view_name</varname> order by c1 desc limit 10;</codeblock>
+        The more complicated and hard-to-read the original query, the more benefit there is to simplifying the
+        query using a view.
+      </li>
+
+      <li>
+        To hide the underlying table and column names, to minimize maintenance problems if those names change. In
+        that case, you re-create the view using the new names, and all queries that use the view rather than the
+        underlying tables keep running with no changes.
+      </li>
+
+      <li>
+        To experiment with optimization techniques and make the optimized queries available to all applications.
+        For example, if you find a combination of <codeph>WHERE</codeph> conditions, join order, join hints, and so
+        on that works the best for a class of queries, you can establish a view that incorporates the
+        best-performing techniques. Applications can then make relatively simple queries against the view, without
+        repeating the complicated and optimized logic over and over. If you later find a better way to optimize the
+        original query, when you re-create the view, all the applications immediately take advantage of the
+        optimized base query.
+      </li>
+
+      <li>
+        To simplify a whole class of related queries, especially complicated queries involving joins between
+        multiple tables, complicated expressions in the column list, and other SQL syntax that makes the query
+        difficult to understand and debug. For example, you might create a view that joins several tables, filters
+        using several <codeph>WHERE</codeph> conditions, and selects several columns from the result set.
+        Applications might issue queries against this view that only vary in their <codeph>LIMIT</codeph>,
+        <codeph>ORDER BY</codeph>, and similar simple clauses.
+      </li>
+    </ul>
+
+    <p>
+      For queries that require repeating complicated clauses over and over again, for example in the select list,
+      <codeph>ORDER BY</codeph>, and <codeph>GROUP BY</codeph> clauses, you can use the <codeph>WITH</codeph>
+      clause as an alternative to creating a view.
+    </p>
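+
+    <p>
+      For example, the following sketch (with hypothetical table and column names) expresses the same derived
+      result set once through a view and once inline through a <codeph>WITH</codeph> clause:
+    </p>
+
+<codeblock>CREATE VIEW top_sellers AS SELECT seller, sum(amount) AS total FROM sales GROUP BY seller;
+SELECT * FROM top_sellers ORDER BY total DESC LIMIT 10;
+
+WITH top_sellers AS (SELECT seller, sum(amount) AS total FROM sales GROUP BY seller)
+SELECT * FROM top_sellers ORDER BY total DESC LIMIT 10;
+</codeblock>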
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+    <p conref="../shared/impala_common.xml#common/complex_types_views"/>
+    <p conref="../shared/impala_common.xml#common/complex_types_views_caveat"/>
+
+    <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/security_blurb"/>
+    <p conref="../shared/impala_common.xml#common/redaction_yes"/>
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+<!-- TK: Elaborate on these, show queries and real output. -->
+
+<codeblock>-- Create a view that is exactly the same as the underlying table.
+create view v1 as select * from t1;
+
+-- Create a view that includes only certain columns from the underlying table.
+create view v2 as select c1, c3, c7 from t1;
+
+-- Create a view that filters the values from the underlying table.
+create view v3 as select distinct c1, c3, c7 from t1 where c1 is not null and c5 > 0;
+
+-- Create a view that reorders and renames columns from the underlying table.
+create view v4 as select c4 as last_name, c6 as address, c2 as birth_date from t1;
+
+-- Create a view that runs functions to convert or transform certain columns.
+create view v5 as select c1, cast(c3 as string) c3, concat(c4,c5) c5, trim(c6) c6, "Constant" c8 from t1;
+
+-- Create a view that hides the complexity of a join query.
+create view v6 as select t1.c1, t2.c2 from t1 join t2 on t1.id = t2.id;
+</codeblock>
+
+<!-- These examples show CREATE VIEW and corresponding DROP VIEW statements, with different combinations
+     of qualified and unqualified names. -->
+
+    <p conref="../shared/impala_common.xml#common/create_drop_view_examples"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_views.xml#views"/>, <xref href="impala_alter_view.xml#alter_view"/>,
+      <xref href="impala_drop_view.xml#drop_view"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_databases.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_databases.xml b/docs/topics/impala_databases.xml
new file mode 100644
index 0000000..ad0511f
--- /dev/null
+++ b/docs/topics/impala_databases.xml
@@ -0,0 +1,65 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="databases">
+
+  <title>Overview of Impala Databases</title>
+  <titlealts><navtitle>Databases</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Databases"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Schemas"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      In Impala, a database is a logical container for a group of tables. Each database defines a separate
+      namespace. Within a database, you can refer to the tables inside it using their unqualified names. Different
+      databases can contain tables with identical names.
+    </p>
+
+    <p>
+      Creating a database is a lightweight operation. There are minimal database-specific properties to configure,
+      only <codeph>LOCATION</codeph> and <codeph>COMMENT</codeph>. There is no <codeph>ALTER DATABASE</codeph> statement.
+    </p>
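+
+    <p>
+      For example, the following sketch (with hypothetical names) creates a database with both of those
+      properties, then switches to it:
+    </p>
+
+<codeblock>CREATE DATABASE analytics COMMENT 'Tables for the analytics project'
+  LOCATION '/user/etl/analytics_db';
+USE analytics;
+</codeblock>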
+
+    <p>
+      Typically, you create a separate database for each project or application, to avoid naming conflicts between
+      tables and to make clear which tables are related to each other. The <codeph>USE</codeph> statement lets
+      you switch between databases. Unqualified references to tables, views, and functions refer to objects
+      within the current database. You can also refer to objects in other databases by using qualified names
+      of the form <codeph><varname>dbname</varname>.<varname>object_name</varname></codeph>.
+    </p>
+
+    <p>
+      Each database is physically represented by a directory in HDFS. When you do not specify a <codeph>LOCATION</codeph>
+      attribute, the directory is located in the Impala data directory with the associated tables managed by Impala.
+      When you do specify a <codeph>LOCATION</codeph> attribute, any read and write operations for tables in that
+      database are relative to the specified HDFS directory.
+    </p>
+
+    <p>
+      There is a special database, named <codeph>default</codeph>, where you begin when you connect to Impala.
+      Tables created in <codeph>default</codeph> are physically located one level higher in HDFS than all the
+      user-created databases.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/builtins_db"/>
+
+    <p>
+      <b>Related statements:</b>
+    </p>
+
+    <p>
+      <xref href="impala_create_database.xml#create_database"/>,
+      <xref href="impala_drop_database.xml#drop_database"/>, <xref href="impala_use.xml#use"/>,
+      <xref href="impala_show.xml#show_databases"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_datatypes.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_datatypes.xml b/docs/topics/impala_datatypes.xml
new file mode 100644
index 0000000..e45867e
--- /dev/null
+++ b/docs/topics/impala_datatypes.xml
@@ -0,0 +1,43 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="datatypes">
+
+  <title>Data Types</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Data Types"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Schemas"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">data types</indexterm>
+      Impala supports a set of data types that you can use for table columns, expression values, and function
+      arguments and return values.
+    </p>
+
+    <note>
+      Currently, Impala supports only scalar types, not composite or nested types. Accessing a table containing any
+      columns with unsupported types causes an error.
+    </note>
+
+    <p outputclass="toc"/>
+
+    <p>
+      For the notation to write literals of each of these data types, see
+      <xref href="impala_literals.xml#literals"/>.
+    </p>
+
+    <p>
+      See <xref href="impala_langref_unsupported.xml#langref_hiveql_delta"/> for differences between Impala and
+      Hive data types.
+    </p>
+  </conbody>
+</concept>