http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_porting.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_porting.xml b/docs/topics/impala_porting.xml new file mode 100644 index 0000000..c9c8e52 --- /dev/null +++ b/docs/topics/impala_porting.xml @@ -0,0 +1,622 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="porting"> + + <title>Porting SQL from Other Database Systems to Impala</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Databases"/> + <data name="Category" value="Hive"/> + <data name="Category" value="Oracle"/> + <data name="Category" value="MySQL"/> + <data name="Category" value="PostgreSQL"/> + <data name="Category" value="Troubleshooting"/> + <data name="Category" value="Porting"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">porting</indexterm> + Although Impala uses standard SQL for queries, you might need to modify SQL source when bringing applications + to Impala, due to variations in data types, built-in functions, vendor language extensions, and + Hadoop-specific syntax. Even when SQL is working correctly, you might make further minor modifications for + best performance. + </p> + + <p outputclass="toc inpage"/> + </conbody> + + <concept id="porting_ddl_dml"> + + <title>Porting DDL and DML Statements</title> + + <conbody> + + <p> + When adapting SQL code from a traditional database system to Impala, expect to find a number of differences + in the DDL statements that you use to set up the schema. Clauses related to physical layout of files, + tablespaces, and indexes have no equivalent in Impala. 
You might restructure your schema considerably to + account for the Impala partitioning scheme and Hadoop file formats. + </p> + + <p> + Expect SQL queries to have a much higher degree of compatibility. With modest rewriting to address vendor + extensions and features not yet supported in Impala, you might be able to run identical or almost-identical + query text on both systems. + </p> + + <p> + Therefore, consider separating out the DDL into a separate Impala-specific setup script. Focus your reuse + and ongoing tuning efforts on the code for SQL queries. + </p> + </conbody> + </concept> + + <concept id="porting_data_types"> + + <title>Porting Data Types from Other Database Systems</title> + + <conbody> + + <ul> + <li> + <p> + Change any <codeph>VARCHAR</codeph>, <codeph>VARCHAR2</codeph>, and <codeph>CHAR</codeph> columns to + <codeph>STRING</codeph>. Remove any length constraints from the column declarations; for example, + change <codeph>VARCHAR(32)</codeph> or <codeph>CHAR(1)</codeph> to <codeph>STRING</codeph>. Impala is + very flexible about the length of string values; it does not impose any length constraints + or do any special processing (such as blank-padding) for <codeph>STRING</codeph> columns. + (In Impala 2.0 and higher, there are data types <codeph>VARCHAR</codeph> and <codeph>CHAR</codeph>, + with length constraints for both types and blank-padding for <codeph>CHAR</codeph>. + However, for performance reasons, it is still preferable to use <codeph>STRING</codeph> + columns where practical.) + </p> + </li> + + <li> + <p> + For national language character types such as <codeph>NCHAR</codeph>, <codeph>NVARCHAR</codeph>, or + <codeph>NCLOB</codeph>, be aware that while Impala can store and query UTF-8 character data, currently + some string manipulation operations only work correctly with ASCII data. See + <xref href="impala_string.xml#string"/> for details. 
+ </p> + </li> + + <li> + <p> + Change any <codeph>DATE</codeph>, <codeph>DATETIME</codeph>, or <codeph>TIME</codeph> columns to + <codeph>TIMESTAMP</codeph>. Remove any precision constraints. Remove any timezone clauses, and make + sure your application logic or ETL process accounts for the fact that Impala expects all + <codeph>TIMESTAMP</codeph> values to be in + <xref href="http://en.wikipedia.org/wiki/Coordinated_Universal_Time" scope="external" format="html">Coordinated + Universal Time (UTC)</xref>. See <xref href="impala_timestamp.xml#timestamp"/> for information about + the <codeph>TIMESTAMP</codeph> data type, and + <xref href="impala_datetime_functions.xml#datetime_functions"/> for conversion functions for different + date and time formats. + </p> + <p> + You might also need to adapt date- and time-related literal values and format strings to use the + supported Impala date and time formats. If you have date and time literals with different separators or + different numbers of <codeph>YY</codeph>, <codeph>MM</codeph>, and so on placeholders than Impala + expects, consider using calls to <codeph>regexp_replace()</codeph> to transform those values to the + Impala-compatible format. See <xref href="impala_timestamp.xml#timestamp"/> for information about the + allowed formats for date and time literals, and + <xref href="impala_string_functions.xml#string_functions"/> for string conversion functions such as + <codeph>regexp_replace()</codeph>. + </p> + <p> + Instead of <codeph>SYSDATE</codeph>, call the function <codeph>NOW()</codeph>. + </p> + <p> + Instead of adding or subtracting directly from a date value to produce a value <varname>N</varname> + days in the past or future, use an <codeph>INTERVAL</codeph> expression, for example <codeph>NOW() + + INTERVAL 30 DAYS</codeph>. 
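+ </p>
+ <p>
+ For example, a query along these lines (the <codeph>events</codeph> table and
+ <codeph>event_date</codeph> column are illustrative) replaces Oracle-style date arithmetic
+ such as <codeph>SYSDATE - 7</codeph>:
+ </p>
+<codeblock>-- Rows from the last 7 days; NOW() - INTERVAL 7 DAYS replaces SYSDATE - 7.
+SELECT * FROM events WHERE event_date > NOW() - INTERVAL 7 DAYS;</codeblock>
+ <p>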
+ </p> + </li> + + <li> + <p> + Although Impala supports <codeph>INTERVAL</codeph> expressions for datetime arithmetic, as shown in + <xref href="impala_timestamp.xml#timestamp"/>, <codeph>INTERVAL</codeph> is not available as a column + data type in Impala. For any <codeph>INTERVAL</codeph> values stored in tables, convert them to numeric + values that you can add or subtract using the functions in + <xref href="impala_datetime_functions.xml#datetime_functions"/>. For example, if you had a table + <codeph>DEADLINES</codeph> with an <codeph>INT</codeph> column <codeph>TIME_PERIOD</codeph>, you could + construct dates N days in the future like so: + </p> +<codeblock>SELECT NOW() + INTERVAL time_period DAYS from deadlines;</codeblock> + </li> + + <li> + <p> + For <codeph>YEAR</codeph> columns, change to the smallest Impala integer type that has sufficient + range. See <xref href="impala_datatypes.xml#datatypes"/> for details about ranges, casting, and so on + for the various numeric data types. + </p> + </li> + + <li> + <p> + Change any <codeph>DECIMAL</codeph> and <codeph>NUMBER</codeph> types. If fixed-point precision is not + required, you can use <codeph>FLOAT</codeph> or <codeph>DOUBLE</codeph> on the Impala side depending on + the range of values. For applications that require precise decimal values, such as financial data, you + might need to make more extensive changes to table structure and application logic, such as using + separate integer columns for dollars and cents, or encoding numbers as string values and writing UDFs + to manipulate them. See <xref href="impala_datatypes.xml#datatypes"/> for details about ranges, + casting, and so on for the various numeric data types. + </p> + </li> + + <li> + <p> + <codeph>FLOAT</codeph>, <codeph>DOUBLE</codeph>, and <codeph>REAL</codeph> types are supported in + Impala. Remove any precision and scale specifications. 
(In Impala, <codeph>REAL</codeph> is just an
+ alias for <codeph>DOUBLE</codeph>; columns declared as <codeph>REAL</codeph> are turned into
+ <codeph>DOUBLE</codeph> behind the scenes.) See <xref href="impala_datatypes.xml#datatypes"/> for
+ details about ranges, casting, and so on for the various numeric data types.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ Most integer types from other systems have equivalents in Impala, perhaps under different names such as
+ <codeph>BIGINT</codeph> instead of <codeph>INT8</codeph>. For any that are unavailable, for example
+ <codeph>MEDIUMINT</codeph>, switch to the smallest Impala integer type that has sufficient range.
+ Remove any precision specifications. See <xref href="impala_datatypes.xml#datatypes"/> for details
+ about ranges, casting, and so on for the various numeric data types.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ Remove any <codeph>UNSIGNED</codeph> constraints. All Impala numeric types are signed. See
+ <xref href="impala_datatypes.xml#datatypes"/> for details about ranges, casting, and so on for the
+ various numeric data types.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ For any types holding bitwise values, use an integer type with enough range to hold all the relevant
+ bits within a positive integer. See <xref href="impala_datatypes.xml#datatypes"/> for details about
+ ranges, casting, and so on for the various numeric data types.
+ </p>
+ <p>
+ For example, <codeph>TINYINT</codeph> has a maximum positive value of 127, not 255, so to manipulate
+ 8-bit bitfields as positive numbers, switch to the next larger type, <codeph>SMALLINT</codeph>. 
+ </p>
+<codeblock>[localhost:21000] > select cast(127*2 as tinyint);
++--------------------------+
+| cast(127 * 2 as tinyint) |
++--------------------------+
+| -2                       |
++--------------------------+
+[localhost:21000] > select cast(128 as tinyint);
++----------------------+
+| cast(128 as tinyint) |
++----------------------+
+| -128                 |
++----------------------+
+[localhost:21000] > select cast(127*2 as smallint);
++---------------------------+
+| cast(127 * 2 as smallint) |
++---------------------------+
+| 254                       |
++---------------------------+</codeblock>
+ <p>
+ Impala does not support notation such as <codeph>b'0101'</codeph> for bit literals.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ For large object types, use <codeph>STRING</codeph> to represent <codeph>CLOB</codeph> or
+ <codeph>TEXT</codeph> columns (character-based large objects) up to 32 KB in size. Binary large objects
+ such as <codeph>BLOB</codeph>, <codeph>RAW</codeph>, <codeph>BINARY</codeph>, and
+ <codeph>VARBINARY</codeph> do not currently have an equivalent in Impala.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ For Boolean-like types such as <codeph>BOOL</codeph>, use the Impala <codeph>BOOLEAN</codeph> type.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ Because Impala currently does not support composite or nested types, any spatial data types in other
+ database systems do not have direct equivalents in Impala. You could represent spatial values in string
+ format and write UDFs to process them. See <xref href="impala_udf.xml#udfs"/> for details. Where
+ practical, separate spatial types into separate tables so that Impala can still work with the
+ non-spatial data.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ Take out any <codeph>DEFAULT</codeph> clauses. Impala can use data files produced from many different
+ sources, such as Pig, Hive, or MapReduce jobs. 
The fast import mechanisms of <codeph>LOAD DATA</codeph> + and external tables mean that Impala is flexible about the format of data files, and Impala does not + necessarily validate or cleanse data before querying it. When copying data through Impala + <codeph>INSERT</codeph> statements, you can use conditional functions such as <codeph>CASE</codeph> or + <codeph>NVL</codeph> to substitute some other value for <codeph>NULL</codeph> fields; see + <xref href="impala_conditional_functions.xml#conditional_functions"/> for details. + </p> + </li> + + <li> + <p> + Take out any constraints from your <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> + statements, for example <codeph>PRIMARY KEY</codeph>, <codeph>FOREIGN KEY</codeph>, + <codeph>UNIQUE</codeph>, <codeph>NOT NULL</codeph>, <codeph>UNSIGNED</codeph>, or + <codeph>CHECK</codeph> constraints. Impala can use data files produced from many different sources, + such as Pig, Hive, or MapReduce jobs. Therefore, Impala expects initial data validation to happen + earlier during the ETL or ELT cycle. After data is loaded into Impala tables, you can perform queries + to test for <codeph>NULL</codeph> values. When copying data through Impala <codeph>INSERT</codeph> + statements, you can use conditional functions such as <codeph>CASE</codeph> or <codeph>NVL</codeph> to + substitute some other value for <codeph>NULL</codeph> fields; see + <xref href="impala_conditional_functions.xml#conditional_functions"/> for details. + </p> + <p> + Do as much verification as practical before loading data into Impala. After data is loaded into Impala, + you can do further verification using SQL queries to check if values have expected ranges, if values + are <codeph>NULL</codeph> or not, and so on. If there is a problem with the data, you will need to + re-run earlier stages of the ETL process, or do an <codeph>INSERT ... 
SELECT</codeph> statement in
+ Impala to copy the faulty data to a new table and transform or filter out the bad values.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ Take out any <codeph>CREATE INDEX</codeph>, <codeph>DROP INDEX</codeph>, and <codeph>ALTER
+ INDEX</codeph> statements, and equivalent <codeph>ALTER TABLE</codeph> statements. Remove any
+ <codeph>INDEX</codeph>, <codeph>KEY</codeph>, or <codeph>PRIMARY KEY</codeph> clauses from
+ <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> statements. Impala is optimized for bulk
+ read operations for data warehouse-style queries, and therefore does not support indexes for its
+ tables.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ Calls to built-in functions with out-of-range or otherwise incorrect arguments return
+ <codeph>NULL</codeph> in Impala as opposed to raising exceptions. (This rule applies even when the
+ <codeph>ABORT_ON_ERROR=true</codeph> query option is in effect.) Run small-scale queries using
+ representative data to doublecheck that calls to built-in functions are returning expected values
+ rather than <codeph>NULL</codeph>. For example, unsupported <codeph>CAST</codeph> operations do not
+ raise an error in Impala:
+ </p>
+<codeblock>select cast('foo' as int);
++--------------------+
+| cast('foo' as int) |
++--------------------+
+| NULL               |
++--------------------+</codeblock>
+ </li>
+
+ <li>
+ <p>
+ For any other types not supported in Impala, you could represent their values in string format and write
+ UDFs to process them. See <xref href="impala_udf.xml#udfs"/> for details.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ To detect the presence of unsupported or unconvertible data types in data files, do initial testing
+ with the <codeph>ABORT_ON_ERROR=true</codeph> query option in effect. This option causes queries to
+ fail immediately if they encounter disallowed type conversions. See
+ <xref href="impala_abort_on_error.xml#abort_on_error"/> for details. 
For example: + </p> +<codeblock>set abort_on_error=true; +select count(*) from (select * from t1); +-- The above query will fail if the data files for T1 contain any +-- values that can't be converted to the expected Impala data types. +-- For example, if T1.C1 is defined as INT but the column contains +-- floating-point values like 1.1, the query will return an error.</codeblock> + </li> + </ul> + </conbody> + </concept> + + <concept id="porting_statements"> + + <title>SQL Statements to Remove or Adapt</title> + + <conbody> + + <p> + Some SQL statements or clauses that you might be familiar with are not currently supported in Impala: + </p> + + <ul> + <li> + <p> + Impala has no <codeph>DELETE</codeph> statement. Impala is intended for data warehouse-style operations + where you do bulk moves and transforms of large quantities of data. Instead of using + <codeph>DELETE</codeph>, use <codeph>INSERT OVERWRITE</codeph> to entirely replace the contents of a + table or partition, or use <codeph>INSERT ... SELECT</codeph> to copy a subset of data (everything but + the rows you intended to delete) from one table to another. See <xref href="impala_dml.xml#dml"/> for + an overview of Impala DML statements. + </p> + </li> + + <li> + <p> + Impala has no <codeph>UPDATE</codeph> statement. Impala is intended for data warehouse-style operations + where you do bulk moves and transforms of large quantities of data. Instead of using + <codeph>UPDATE</codeph>, do all necessary transformations early in the ETL process, such as in the job + that generates the original data, or when copying from one table to another to convert to a particular + file format or partitioning scheme. See <xref href="impala_dml.xml#dml"/> for an overview of Impala DML + statements. + </p> + </li> + + <li> + <p> + Impala has no transactional statements, such as <codeph>COMMIT</codeph> or <codeph>ROLLBACK</codeph>. 
+ Impala effectively works like the <codeph>AUTOCOMMIT</codeph> mode in some database systems, where + changes take effect as soon as they are made. + </p> + </li> + + <li> + <p> + If your database, table, column, or other names conflict with Impala reserved words, use different + names or quote the names with backticks. See <xref href="impala_reserved_words.xml#reserved_words"/> + for the current list of Impala reserved words. + </p> + <p> + Conversely, if you use a keyword that Impala does not recognize, it might be interpreted as a table or + column alias. For example, in <codeph>SELECT * FROM t1 NATURAL JOIN t2</codeph>, Impala does not + recognize the <codeph>NATURAL</codeph> keyword and interprets it as an alias for the table + <codeph>t1</codeph>. If you experience any unexpected behavior with queries, check the list of reserved + words to make sure all keywords in join and <codeph>WHERE</codeph> clauses are recognized. + </p> + </li> + + <li> + <p> + Impala supports subqueries only in the <codeph>FROM</codeph> clause of a query, not within the + <codeph>WHERE</codeph> clauses. Therefore, you cannot use clauses such as <codeph>WHERE + <varname>column</varname> IN (<varname>subquery</varname>)</codeph>. Also, Impala does not allow + <codeph>EXISTS</codeph> or <codeph>NOT EXISTS</codeph> clauses (although <codeph>EXISTS</codeph> is a + reserved keyword). + </p> + </li> + + <li> + <p> + Impala supports <codeph>UNION</codeph> and <codeph>UNION ALL</codeph> set operators, but not + <codeph>INTERSECT</codeph>. <ph conref="../shared/impala_common.xml#common/union_all_vs_union"/> + </p> + </li> + + <li> + <p> + Within queries, Impala requires query aliases for any subqueries: + </p> +<codeblock>-- Without the alias 'contents_of_t1' at the end, query gives syntax error. 
+select count(*) from (select * from t1) contents_of_t1;</codeblock> + </li> + + <li> + <p> + When an alias is declared for an expression in a query, that alias cannot be referenced again within + the same query block: + </p> +<codeblock>-- Can't reference AVERAGE twice in the SELECT list where it's defined. +select avg(x) as average, average+1 from t1 group by x; +ERROR: AnalysisException: couldn't resolve column reference: 'average' + +-- Although it can be referenced again later in the same query. +select avg(x) as average from t1 group by x having average > 3;</codeblock> + <p> + For Impala, either repeat the expression again, or abstract the expression into a <codeph>WITH</codeph> + clause, creating named columns that can be referenced multiple times anywhere in the base query: + </p> +<codeblock>-- The following 2 query forms are equivalent. +select avg(x) as average, avg(x)+1 from t1 group by x; +with avg_t as (select avg(x) average from t1 group by x) select average, average+1 from avg_t;</codeblock> +<!-- An alternative bunch of queries to use in the example above. +[localhost:21000] > select x*x as x_squared from t1; + +[localhost:21000] > select x*x as x_squared from t1 where x_squared < 100; +ERROR: AnalysisException: couldn't resolve column reference: 'x_squared' +[localhost:21000] > select x*x as x_squared, x_squared * pi() as pi_x_squared from t1; +ERROR: AnalysisException: couldn't resolve column reference: 'x_squared' +[localhost:21000] > select x*x as x_squared from t1 group by x_squared; + +[localhost:21000] > select x*x as x_squared from t1 group by x_squared having x_squared < 100; +--> + </li> + + <li> + <p> + Impala does not support certain rarely used join types that are less appropriate for high-volume tables + used for data warehousing. In some cases, Impala supports join types but requires explicit syntax to + ensure you do not do inefficient joins of huge tables by accident. 
For example, Impala does not support + natural joins or anti-joins, and requires the <codeph>CROSS JOIN</codeph> operator for Cartesian + products. See <xref href="impala_joins.xml#joins"/> for details on the syntax for Impala join clauses. + </p> + </li> + + <li> + <p> + Impala has a limited choice of partitioning types. Partitions are defined based on each distinct + combination of values for one or more partition key columns. Impala does not redistribute or check data + to create evenly distributed partitions; you must choose partition key columns based on your knowledge + of the data volume and distribution. Adapt any tables that use range, list, hash, or key partitioning + to use the Impala partition syntax for <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> + statements. Impala partitioning is similar to range partitioning where every range has exactly one + value, or key partitioning where the hash function produces a separate bucket for every combination of + key values. See <xref href="impala_partitioning.xml#partitioning"/> for usage details, and + <xref href="impala_create_table.xml#create_table"/> and + <xref href="impala_alter_table.xml#alter_table"/> for syntax. + </p> + <note> + Because the number of separate partitions is potentially higher than in other database systems, keep a + close eye on the number of partitions and the volume of data in each one; scale back the number of + partition key columns if you end up with too many partitions with a small volume of data in each one. + Remember, to distribute work for a query across a cluster, you need at least one HDFS block per node. + HDFS blocks are typically multiple megabytes, <ph rev="parquet_block_size">especially</ph> for Parquet + files. Therefore, if each partition holds only a few megabytes of data, you are unlikely to see much + parallelism in the query because such a small amount of data is typically processed by a single node. 
+ </note> + </li> + + <li> + <p> + For <q>top-N</q> queries, Impala uses the <codeph>LIMIT</codeph> clause rather than comparing against a + pseudocolumn named <codeph>ROWNUM</codeph> or <codeph>ROW_NUM</codeph>. See + <xref href="impala_limit.xml#limit"/> for details. + </p> + </li> + </ul> + </conbody> + </concept> + + <concept id="porting_antipatterns"> + + <title>SQL Constructs to Doublecheck</title> + + <conbody> + + <p> + Some SQL constructs that are supported have behavior or defaults more oriented towards convenience than + optimal performance. Also, sometimes machine-generated SQL, perhaps issued through JDBC or ODBC + applications, might have inefficiencies or exceed internal Impala limits. As you port SQL code, be alert + and change these things where appropriate: + </p> + + <ul> + <li> + <p> + A <codeph>CREATE TABLE</codeph> statement with no <codeph>STORED AS</codeph> clause creates data files + in plain text format, which is convenient for data interchange but not a good choice for high-volume + data with high-performance queries. See <xref href="impala_file_formats.xml#file_formats"/> for why and + how to use specific file formats for compact data and high-performance queries. Especially see + <xref href="impala_parquet.xml#parquet"/>, for details about the file format most heavily optimized for + large-scale data warehouse queries. + </p> + </li> + + <li> + <p> + A <codeph>CREATE TABLE</codeph> statement with no <codeph>PARTITIONED BY</codeph> clause stores all the + data files in the same physical location, which can lead to scalability problems when the data volume + becomes large. + </p> + <p> + On the other hand, adapting tables that were already partitioned in a different database system could + produce an Impala table with a high number of partitions and not enough data in each one, leading to + underutilization of Impala's parallel query features. 
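+ </p>
+ <p>
+ For example, a table that was range-partitioned by date in another database system might be
+ adapted like this (the table and column names are illustrative); partitioning by year and
+ month, rather than by individual day, helps keep a reasonable volume of data in each partition:
+ </p>
+<codeblock>-- Each distinct (year, month) combination becomes one partition.
+CREATE TABLE sales (id BIGINT, amount DOUBLE)
+  PARTITIONED BY (year INT, month INT);</codeblock>
+ <p>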
+ </p> + <p> + See <xref href="impala_partitioning.xml#partitioning"/> for details about setting up partitioning and + tuning the performance of queries on partitioned tables. + </p> + </li> + + <li> + <p> + The <codeph>INSERT ... VALUES</codeph> syntax is suitable for setting up toy tables with a few rows for + functional testing, but because each such statement creates a separate tiny file in HDFS, it is not a + scalable technique for loading megabytes or gigabytes (let alone petabytes) of data. Consider revising + your data load process to produce raw data files outside of Impala, then setting up Impala external + tables or using the <codeph>LOAD DATA</codeph> statement to use those data files instantly in Impala + tables, with no conversion or indexing stage. See <xref href="impala_tables.xml#external_tables"/> and + <xref href="impala_load_data.xml#load_data"/> for details about the Impala techniques for working with + data files produced outside of Impala; see <xref href="impala_tutorial.xml#tutorial_etl"/> for examples + of ETL workflow for Impala. + </p> + </li> + + <li> + <p> + If your ETL process is not optimized for Hadoop, you might end up with highly fragmented small data + files, or a single giant data file that cannot take advantage of distributed parallel queries or + partitioning. In this case, use an <codeph>INSERT ... SELECT</codeph> statement to copy the data into a + new table and reorganize into a more efficient layout in the same operation. See + <xref href="impala_insert.xml#insert"/> for details about the <codeph>INSERT</codeph> statement. + </p> + <p> + You can do <codeph>INSERT ... SELECT</codeph> into a table with a more efficient file format (see + <xref href="impala_file_formats.xml#file_formats"/>) or from an unpartitioned table into a partitioned + one (see <xref href="impala_partitioning.xml#partitioning"/>). 
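+ </p>
+ <p>
+ For example, the following statements (with illustrative table and column names) copy data
+ from a text-format table into a partitioned Parquet table in a single operation:
+ </p>
+<codeblock>CREATE TABLE sales_parquet (id BIGINT, amount DOUBLE)
+  PARTITIONED BY (year INT) STORED AS PARQUET;
+-- The partition key column goes last in the SELECT list.
+INSERT INTO sales_parquet PARTITION (year)
+  SELECT id, amount, year FROM sales_text;</codeblock>
+ <p>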
+ </p> + </li> + + <li> + <p> + The number of expressions allowed in an Impala query might be smaller than for some other database + systems, causing failures for very complicated queries (typically produced by automated SQL + generators). Where practical, keep the number of expressions in the <codeph>WHERE</codeph> clauses to + approximately 2000 or fewer. As a workaround, set the query option + <codeph>DISABLE_CODEGEN=true</codeph> if queries fail for this reason. See + <xref href="impala_disable_codegen.xml#disable_codegen"/> for details. + </p> + </li> + + <li> + <p> + If practical, rewrite <codeph>UNION</codeph> queries to use the <codeph>UNION ALL</codeph> operator + instead. <ph conref="../shared/impala_common.xml#common/union_all_vs_union"/> + </p> + </li> + </ul> + </conbody> + </concept> + + <concept id="porting_next"> + + <title>Next Porting Steps after Verifying Syntax and Semantics</title> + + <conbody> + + <p> + Throughout this section, some of the decisions you make during the porting process also have a substantial + impact on performance. After your SQL code is ported and working correctly, doublecheck the + performance-related aspects of your schema design, physical layout, and queries to make sure that the + ported application is taking full advantage of Impala's parallelism, performance-related SQL features, and + integration with Hadoop components. + </p> + + <ul> + <li> + Have you run the <codeph>COMPUTE STATS</codeph> statement on each table involved in join queries? Have + you also run <codeph>COMPUTE STATS</codeph> for each table used as the source table in an <codeph>INSERT + ... SELECT</codeph> or <codeph>CREATE TABLE AS SELECT</codeph> statement? + </li> + + <li> + Are you using the most efficient file format for your data volumes, table structure, and query + characteristics? + </li> + + <li> + Are you using partitioning effectively? 
That is, have you partitioned on columns that are often used for + filtering in <codeph>WHERE</codeph> clauses? Have you partitioned at the right granularity so that there + is enough data in each partition to parallelize the work for each query? + </li> + + <li> + Does your ETL process produce a relatively small number of multi-megabyte data files (good) rather than a + huge number of small files (bad)? + </li> + </ul> + + <p> + See <xref href="impala_performance.xml#performance"/> for details about the whole performance tuning + process. + </p> + </conbody> + </concept> +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_query_options.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_query_options.xml b/docs/topics/impala_query_options.xml new file mode 100644 index 0000000..1011746 --- /dev/null +++ b/docs/topics/impala_query_options.xml @@ -0,0 +1,75 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="query_options"> + + <title>Query Options for the SET Statement</title> + <prolog> + <metadata> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Impala"/> + <data name="Category" value="impala-shell"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <p> + You can specify the following options using the <codeph>SET</codeph> statement, and those settings affect all + queries issued from that session. + </p> + + <p> + Some query options are useful in day-to-day operations for improving usability, performance, or flexibility. + </p> + + <p> + Other query options control special-purpose aspects of Impala operation and are intended primarily for + advanced debugging or troubleshooting. + </p> + + <p> + Options with Boolean parameters can be set to 1 or <codeph>true</codeph> to enable, or 0 or <codeph>false</codeph> + to turn off. + </p> + + <note rev="2.0.0"> + In Impala 2.0 and later, you can set query options directly through the JDBC and ODBC interfaces by using the + <codeph>SET</codeph> statement. Formerly, <codeph>SET</codeph> was only available as a command within the + <cmdname>impala-shell</cmdname> interpreter. 
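+ For example, either form of a Boolean option works through any of these interfaces:
+<codeblock>SET ABORT_ON_ERROR=true;
+SET ABORT_ON_ERROR=1;</codeblock>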
+ </note> + +<!-- This is the list including defaults from the pre-release 1.2 impala-shell: + ABORT_ON_DEFAULT_LIMIT_EXCEEDED: 0 + ABORT_ON_ERROR: 0 + ALLOW_UNSUPPORTED_FORMATS: 0 + BATCH_SIZE: 0 + DEBUG_ACTION: + DEFAULT_ORDER_BY_LIMIT: -1 + DISABLE_CODEGEN: 0 + HBASE_CACHE_BLOCKS: 0 + HBASE_CACHING: 0 + MAX_ERRORS: 0 + MAX_IO_BUFFERS: 0 + MAX_SCAN_RANGE_LENGTH: 0 + MEM_LIMIT: 0 + NUM_NODES: 0 + NUM_SCANNER_THREADS: 0 + PARQUET_COMPRESSION_CODEC: SNAPPY + PARQUET_FILE_SIZE: 0 + SUPPORT_START_OVER: false +--> + + <p outputclass="toc"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_set.xml#set"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_query_timeout_s.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_query_timeout_s.xml b/docs/topics/impala_query_timeout_s.xml new file mode 100644 index 0000000..41f2918 --- /dev/null +++ b/docs/topics/impala_query_timeout_s.xml @@ -0,0 +1,51 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="2.0.0" id="query_timeout_s"> + + <title>QUERY_TIMEOUT_S Query Option</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Querying"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">QUERY_TIMEOUT_S query option</indexterm> + Sets the idle query timeout value for the session, in seconds. Queries that sit idle for longer than the + timeout value are automatically cancelled. If the system administrator specified the + <codeph>--idle_query_timeout</codeph> startup option, <codeph>QUERY_TIMEOUT_S</codeph> must be smaller than + or equal to the <codeph>--idle_query_timeout</codeph> value. 
+ </p> + + <note conref="../shared/impala_common.xml#common/timeout_clock_blurb"/> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>SET QUERY_TIMEOUT_S=<varname>seconds</varname>;</codeblock> + +<!-- Don't have a compelling example to show at this time because the 'idle' aspect only applies + when the client is careless and leaves the query open. Can't easily demonstrate in impala-shell. + + <p conref="/Content/impala_common_xi44078.xml#common/example_blurb"/> +--> + + <p> + <b>Type:</b> numeric + </p> + + <p> + <b>Default:</b> 0 (no timeout if <codeph>--idle_query_timeout</codeph> not in effect; otherwise, use + <codeph>--idle_query_timeout</codeph> value) + </p> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_timeouts.xml#timeouts"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_real.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_real.xml b/docs/topics/impala_real.xml new file mode 100644 index 0000000..e6430e3 --- /dev/null +++ b/docs/topics/impala_real.xml @@ -0,0 +1,46 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="real"> + + <title>REAL Data Type</title> + <titlealts><navtitle>REAL</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Schemas"/> + </metadata> + </prolog> + + <conbody> + + <p> + An alias for the <codeph>DOUBLE</codeph> data type. See <xref href="impala_double.xml#double"/> for details. 
+ </p> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + These examples show how you can use the type names <codeph>REAL</codeph> and <codeph>DOUBLE</codeph> + interchangeably, and behind the scenes Impala treats them always as <codeph>DOUBLE</codeph>. + </p> + +<codeblock>[localhost:21000] > create table r1 (x real); +[localhost:21000] > describe r1; ++------+--------+---------+ +| name | type | comment | ++------+--------+---------+ +| x | double | | ++------+--------+---------+ +[localhost:21000] > insert into r1 values (1.5), (cast (2.2 as double)); +[localhost:21000] > select cast (1e6 as real); ++---------------------------+ +| cast(1000000.0 as double) | ++---------------------------+ +| 1000000 | ++---------------------------+</codeblock> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_refresh.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_refresh.xml b/docs/topics/impala_refresh.xml new file mode 100644 index 0000000..ee022d5 --- /dev/null +++ b/docs/topics/impala_refresh.xml @@ -0,0 +1,234 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="refresh"> + + <title>REFRESH Statement</title> + <titlealts><navtitle>REFRESH</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="DDL"/> + <data name="Category" value="Tables"/> + <data name="Category" value="Hive"/> + <data name="Category" value="Metastore"/> + <data name="Category" value="ETL"/> + <data name="Category" value="Ingest"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">REFRESH statement</indexterm> + To accurately respond to queries, the Impala node that acts as the coordinator (the node to which you are + connected through 
<cmdname>impala-shell</cmdname>, JDBC, or ODBC) must have current metadata about those + databases and tables that are referenced in Impala queries. If you are not familiar with the way Impala uses + metadata and how it shares the same metastore database as Hive, see + <xref href="impala_hadoop.xml#intro_metastore"/> for background information. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>REFRESH [<varname>db_name</varname>.]<varname>table_name</varname></codeblock> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + Use the <codeph>REFRESH</codeph> statement to load the latest metastore metadata and block location data for + a particular table in these scenarios: + </p> + + <ul> + <li> + After loading new data files into the HDFS data directory for the table. (Once you have set up an ETL + pipeline to bring data into Impala on a regular basis, this is typically the most frequent reason why + metadata needs to be refreshed.) + </li> + + <li> + After issuing <codeph>ALTER TABLE</codeph>, <codeph>INSERT</codeph>, <codeph>LOAD DATA</codeph>, or other + table-modifying SQL statement in Hive. + </li> + </ul> + + <p> + You only need to issue the <codeph>REFRESH</codeph> statement on the node to which you connect to issue + queries. The coordinator node divides the work among all the Impala nodes in a cluster, and sends read + requests for the correct HDFS blocks without relying on the metadata on the other nodes. + </p> + + <p> + <codeph>REFRESH</codeph> reloads the metadata for the table from the metastore database, and does an + incremental reload of the low-level block location data to account for any new data files added to the HDFS + data directory for the table. It is a low-overhead, single-table operation, specifically tuned for the common + scenario where new data files are added to HDFS. + </p> + + <p> + Only the metadata for the specified table is flushed. 
The table must already exist and be known to Impala, + either because the <codeph>CREATE TABLE</codeph> statement was run in Impala rather than Hive, or because a + previous <codeph>INVALIDATE METADATA</codeph> statement caused Impala to reload its entire metadata catalog. + </p> + + <note> + <p rev="1.2"> + In Impala 1.2 and higher, the catalog service broadcasts any changed metadata as a result of Impala + <codeph>ALTER TABLE</codeph>, <codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph> statements to all + Impala nodes. Thus, the <codeph>REFRESH</codeph> statement is only required if you load data through Hive + or by manipulating data files in HDFS directly. See <xref href="impala_components.xml#intro_catalogd"/> for + more information on the catalog service. + </p> + <p rev="1.2.1"> + In Impala 1.2.1 and higher, another way to avoid inconsistency across nodes is to enable the + <codeph>SYNC_DDL</codeph> query option before performing a DDL statement or an <codeph>INSERT</codeph> or + <codeph>LOAD DATA</codeph>. + </p> + <p> + The functionality of the <codeph>REFRESH</codeph> statement has changed in Impala 1.1 and higher. Now the + table name is a required parameter. To flush the metadata for all tables, use the + <codeph><xref href="impala_invalidate_metadata.xml#invalidate_metadata">INVALIDATE METADATA</xref></codeph> + command. + </p> + <draft-comment translate="no"> Almost-identical wording here, under INVALIDATE METADATA, and in Release Notes :: New Features. Makes sense to conref. </draft-comment> + <p> + Because <codeph>REFRESH <varname>table_name</varname></codeph> only works for tables that Impala is already + aware of, when you create a new table in the Hive shell, you must enter <codeph>INVALIDATE + METADATA</codeph> with no table parameter before you can see the new table in + <cmdname>impala-shell</cmdname>. 
Once the table is known to Impala, you can issue <codeph>REFRESH + <varname>table_name</varname></codeph> as needed after you add more data files for that table. + </p> + </note> + + <p conref="../shared/impala_common.xml#common/refresh_vs_invalidate"/> + + <p> + A metadata update for an <codeph>impalad</codeph> instance <b>is</b> required if: + </p> + + <ul> + <li> + A metadata change occurs. + </li> + + <li> + <b>and</b> the change is made through Hive. + </li> + + <li> + <b>and</b> the change is made to a database to which clients such as the Impala shell or ODBC directly + connect. + </li> + </ul> + + <p rev="1.2"> + A metadata update for an Impala node is <b>not</b> required after you run <codeph>ALTER TABLE</codeph>, + <codeph>INSERT</codeph>, or other table-modifying statement in Impala rather than Hive. Impala handles the + metadata synchronization automatically through the catalog service. + </p> + + <p> + Database and table metadata is typically modified by: + </p> + + <ul> + <li> + Hive - through <codeph>ALTER</codeph>, <codeph>CREATE</codeph>, <codeph>DROP</codeph> or + <codeph>INSERT</codeph> operations. + </li> + + <li> + Impalad - through <codeph>CREATE TABLE</codeph>, <codeph>ALTER TABLE</codeph>, and <codeph>INSERT</codeph> + operations. <ph rev="1.2">In Impala 1.2 and higher, such changes are propagated to all Impala nodes by the + Impala catalog service.</ph> + </li> + </ul> + + <p> + <codeph>REFRESH</codeph> causes the metadata for that table to be immediately reloaded. For a huge table, + that process could take a noticeable amount of time; but doing the refresh up front avoids an unpredictable + delay later, for example if the next reference to the table is during a benchmark test. 
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      The following example shows how you might use the <codeph>REFRESH</codeph> statement after manually adding
+      new HDFS data files to the Impala data directory for a table:
+    </p>
+
+<codeblock>[impalad-host:21000] > refresh t1;
+[impalad-host:21000] > refresh t2;
+[impalad-host:21000] > select * from t1;
+...
+[impalad-host:21000] > select * from t2;
+... </codeblock>
+
+    <p>
+      For more examples of using <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> with a
+      combination of Impala and Hive operations, see <xref href="impala_tutorial.xml#tutorial_impala_hive"/>.
+    </p>
+
+    <p>
+      <b>Related impalad options:</b>
+    </p>
+
+    <p>
+      In Impala 1.0, the <codeph>-r</codeph> option of <cmdname>impala-shell</cmdname> issued
+      <codeph>REFRESH</codeph> to reload metadata for all tables.
+    </p>
+
+    <p>
+      In Impala 1.1 and higher, this option issues <codeph>INVALIDATE METADATA</codeph> because
+      <codeph>REFRESH</codeph> now requires a table name parameter. Due to the expense of reloading the metadata
+      for all tables, the <cmdname>impala-shell</cmdname> <codeph>-r</codeph> option is not recommended for
+      day-to-day use in a production environment.
+    </p>
+
+    <p rev="1.2">
+      In Impala 1.2 and higher, the <codeph>-r</codeph> option is needed even less frequently, because metadata
+      changes caused by SQL statements in Impala are automatically broadcast to all Impala nodes.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
+    <p rev="CDH-19187">
+      The user ID that the <cmdname>impalad</cmdname> daemon runs under,
+      typically the <codeph>impala</codeph> user, must have execute
+      permissions for all the relevant directories holding table data.
+ (A table could have data spread across multiple directories, + or in unexpected paths, if it uses partitioning or + specifies a <codeph>LOCATION</codeph> attribute for + individual partitions or the entire table.) + Issues with permissions might not cause an immediate error for this statement, + but subsequent statements such as <codeph>SELECT</codeph> + or <codeph>SHOW TABLE STATS</codeph> could fail. + </p> + + <p conref="../shared/impala_common.xml#common/hdfs_blurb"/> + + <p> + The <codeph>REFRESH</codeph> command checks HDFS permissions of the underlying data files and directories, + caching this information so that a statement can be cancelled immediately if for example the + <codeph>impala</codeph> user does not have permission to write to the data directory for the table. Impala + reports any lack of write permissions as an <codeph>INFO</codeph> message in the log file, in case that + represents an oversight. If you change HDFS permissions to make data readable or writeable by the Impala + user, issue another <codeph>REFRESH</codeph> to make Impala aware of the change. 
+ </p> + + <note conref="../shared/impala_common.xml#common/compute_stats_next"/> + + <p conref="../shared/impala_common.xml#common/s3_blurb"/> + <p conref="../shared/impala_common.xml#common/s3_metadata"/> + + <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/> + <p conref="../shared/impala_common.xml#common/related_info"/> + <p> + <xref href="impala_hadoop.xml#intro_metastore"/>, + <xref href="impala_invalidate_metadata.xml#invalidate_metadata"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_request_pool.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_request_pool.xml b/docs/topics/impala_request_pool.xml new file mode 100644 index 0000000..cf2a811 --- /dev/null +++ b/docs/topics/impala_request_pool.xml @@ -0,0 +1,45 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="1.3.0" id="request_pool"> + + <title>REQUEST_POOL Query Option</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Resource Management"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Admission Control"/> + <data name="Category" value="YARN"/> + <data name="Category" value="Llama"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">REQUEST_POOL query option</indexterm> + The pool or queue name that queries should be submitted to. Only applies when you enable the Impala admission + control feature (CDH 4 or CDH 5; see <xref href="impala_admission.xml#admission_control"/>), or the YARN + resource management feature (CDH 5 only; see + <xref href="impala_resource_management.xml#resource_management"/>). Specifies the name of the pool used by + requests from Impala to the resource manager. 
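+    </p>
+
+    <p>
+      For example, to route subsequent queries in the current session to a hypothetical pool named
+      <codeph>production_pool</codeph>:
+    </p>
+
+<codeblock>SET REQUEST_POOL=production_pool;</codeblock>
+
+    <p>
+      The setting stays in effect for the rest of the session, until you change it with another
+      <codeph>SET</codeph> statement.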
+ </p> + + <p> + Formerly known as <codeph>YARN_POOL</codeph> during the CDH 5 beta period. Renamed to reflect that it can be + used both with YARN and with the lightweight admission control feature introduced in Impala 1.3. + </p> + + <p> + <b>Type:</b> <codeph>STRING</codeph> + </p> + + <p> + <b>Default:</b> empty (use the user-to-pool mapping defined by an <cmdname>impalad</cmdname> startup option + in the Impala configuration file) + </p> + +<!-- Worth adding a couple of related info links here. --> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_reservation_request_timeout.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_reservation_request_timeout.xml b/docs/topics/impala_reservation_request_timeout.xml new file mode 100644 index 0000000..0316e44 --- /dev/null +++ b/docs/topics/impala_reservation_request_timeout.xml @@ -0,0 +1,35 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="1.2" id="reservation_request_timeout"> + + <title>RESERVATION_REQUEST_TIMEOUT Query Option (CDH 5 only)</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Resource Management"/> + <data name="Category" value="YARN"/> + <data name="Category" value="Llama"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">RESERVATION_REQUEST_TIMEOUT query option</indexterm> + Maximum number of milliseconds Impala will wait for a reservation to be completely granted or denied. Used in + conjunction with the Impala resource management feature in Impala 1.2 and higher with CDH 5. + </p> + + <p> + <b>Type:</b> numeric + </p> + + <p> + <b>Default:</b> 300000 (5 minutes) + </p> + +<!-- Worth adding a couple of related info links here. 
--> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_revoke.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_revoke.xml b/docs/topics/impala_revoke.xml new file mode 100644 index 0000000..88fbbf9 --- /dev/null +++ b/docs/topics/impala_revoke.xml @@ -0,0 +1,96 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="2.0.0" id="revoke"> + + <title>REVOKE Statement (CDH 5.2 or higher only)</title> + <titlealts><navtitle>REVOKE (CDH 5.2 or higher only)</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="DDL"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Sentry"/> + <data name="Category" value="Roles"/> + <!-- Consider whether to go deeper into categories like Security for the Sentry-related statements. --> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">REVOKE statement</indexterm> +<!-- Copied from Sentry docs. Turn into conref. I did some rewording for clarity. --> + The <codeph>REVOKE</codeph> statement revokes roles or privileges on a specified object from groups. Only + Sentry administrative users can revoke the role from a group. The revocation has a cascading effect. For + example, revoking the <codeph>ALL</codeph> privilege on a database also revokes the same privilege for all + the tables in that database. 
+ </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock rev="2.3.0 collevelauth">REVOKE ROLE <varname>role_name</varname> FROM GROUP <varname>group_name</varname> + +REVOKE <varname>privilege</varname> ON <varname>object_type</varname> <varname>object_name</varname> + FROM [ROLE] <varname>role_name</varname> + +<ph rev="2.3.0">privilege ::= SELECT | SELECT(<varname>column_name</varname>) | INSERT | ALL</ph> +object_type ::= TABLE | DATABASE | SERVER | URI +</codeblock> + + <p> + Typically, the object name is an identifier. For URIs, it is a string literal. + </p> + + <p rev="2.3.0 collevelauth"> + The ability to grant or revoke <codeph>SELECT</codeph> privilege on specific columns is available + in CDH 5.5 / Impala 2.3 and higher. See <xref href="sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/> + for details. + </p> + + <p conref="../shared/impala_common.xml#common/privileges_blurb"/> + + <p> + Only administrative users (those with <codeph>ALL</codeph> privileges on the server, defined in the Sentry + policy file) can use this statement. + </p> + +<!-- Turn compatibility info into a conref or series of conrefs. (In both GRANT and REVOKE.) --> + + <p conref="../shared/impala_common.xml#common/compatibility_blurb"/> + + <p> + <ul> + <li> + The Impala <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements are available in CDH 5.2 and + higher. + </li> + + <li> + In CDH 5.1 and higher, Impala makes use of any roles and privileges specified by the + <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in Hive, when your system is configured to + use the Sentry service instead of the file-based policy mechanism. + </li> + + <li> + The Impala <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements do not require the + <codeph>ROLE</codeph> keyword to be repeated before each role name, unlike the equivalent Hive + statements. 
+ </li> + + <li conref="../shared/impala_common.xml#common/grant_revoke_single"/> + </ul> + </p> + + <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/> + + <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_authorization.xml#authorization"/>, <xref href="impala_grant.xml#grant"/> + <xref href="impala_create_role.xml#create_role"/>, <xref href="impala_drop_role.xml#drop_role"/>, + <xref href="impala_show.xml#show"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_schema_objects.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_schema_objects.xml b/docs/topics/impala_schema_objects.xml new file mode 100644 index 0000000..d8abe12 --- /dev/null +++ b/docs/topics/impala_schema_objects.xml @@ -0,0 +1,57 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="schema_objects"> + + <title>Impala Schema Objects and Object Names</title> + <titlealts><navtitle>Schema Objects and Object Names</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Schemas"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">schema objects</indexterm> + With Impala, you work with schema objects that are familiar to database users: primarily databases, tables, views, + and functions. The SQL syntax to work with these objects is explained in + <xref href="impala_langref_sql.xml#langref_sql"/>. 
This section explains the conceptual knowledge you need to + work with these objects and the various ways to specify their names. + </p> + + <p> + Within a table, partitions can also be considered a kind of object. Partitioning is an important subject for + Impala, with its own documentation section covering use cases and performance considerations. See + <xref href="impala_partitioning.xml#partitioning"/> for details. + </p> + + <p> + Impala does not have a counterpart of the <q>tablespace</q> notion from some database systems. By default, + all the data files for a database, table, or partition are located within nested folders within the HDFS file + system. You can also specify a particular HDFS location for a given Impala table or partition. The raw data + for these objects is represented as a collection of data files, providing the flexibility to load data by + simply moving files into the expected HDFS location. + </p> + + <p> + Information about the schema objects is held in the + <xref href="impala_hadoop.xml#intro_metastore">metastore</xref> database. This database is shared between + Impala and Hive, allowing each to create, drop, and query each other's databases, tables, and so on. When + Impala makes a change to schema objects through a <codeph>CREATE</codeph>, <codeph>ALTER</codeph>, + <codeph>DROP</codeph>, <codeph>INSERT</codeph>, or <codeph>LOAD DATA</codeph> statement, it broadcasts those + changes to all nodes in the cluster through the <xref href="impala_components.xml#intro_catalogd">catalog + service</xref>. When you make such changes through Hive or directly through manipulating HDFS files, you use + the <xref href="impala_refresh.xml#refresh">REFRESH</xref> or + <xref href="impala_invalidate_metadata.xml#invalidate_metadata">INVALIDATE METADATA</xref> statements on the + Impala side to recognize the newly loaded data, new tables, and so on. 
+ </p> + + <p outputclass="toc"/> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_select.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_select.xml b/docs/topics/impala_select.xml new file mode 100644 index 0000000..db63f71 --- /dev/null +++ b/docs/topics/impala_select.xml @@ -0,0 +1,203 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="select"> + + <title>SELECT Statement</title> + <titlealts><navtitle>SELECT</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Reports"/> + <data name="Category" value="Tables"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <!-- This is such an important statement, think if there are more applicable categories. --> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">SELECT statement</indexterm> + The <codeph>SELECT</codeph> statement performs queries, retrieving data from one or more tables and producing + result sets consisting of rows and columns. + </p> + + <p> + The Impala <codeph><xref href="impala_insert.xml#insert">INSERT</xref></codeph> statement also typically ends + with a <codeph>SELECT</codeph> statement, to define data to copy from one table to another. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>[WITH <i>name</i> AS (<i>select_expression</i>) [, ...] ] +SELECT + [ALL | DISTINCT] + [STRAIGHT_JOIN] + <i>expression</i> [, <i>expression</i> ...] +FROM <i>table_reference</i> [, <i>table_reference</i> ...] 
+[[FULL | [LEFT | RIGHT] INNER | [LEFT | RIGHT] OUTER | [LEFT | RIGHT] SEMI | [LEFT | RIGHT] ANTI | CROSS]
+  JOIN <i>table_reference</i>
+  [ON <i>join_equality_clauses</i> | USING (<varname>col1</varname>[, <varname>col2</varname> ...])]] ...
+WHERE <i>conditions</i>
+GROUP BY { <i>column</i> | <i>expression</i> [, ...] }
+HAVING <i>conditions</i>
+ORDER BY { <i>column</i> | <i>expression</i> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, ...] }
+LIMIT <i>expression</i> [OFFSET <i>expression</i>]
+[UNION [ALL] <i>select_statement</i>] ...
+</codeblock>
+
+    <p>
+      Impala <codeph>SELECT</codeph> queries support:
+    </p>
+
+    <ul>
+      <li>
+        SQL scalar data types: <codeph><xref href="impala_boolean.xml#boolean">BOOLEAN</xref></codeph>,
+        <codeph><xref href="impala_tinyint.xml#tinyint">TINYINT</xref></codeph>,
+        <codeph><xref href="impala_smallint.xml#smallint">SMALLINT</xref></codeph>,
+        <codeph><xref href="impala_int.xml#int">INT</xref></codeph>,
+        <codeph><xref href="impala_bigint.xml#bigint">BIGINT</xref></codeph>,
+        <codeph><xref href="impala_decimal.xml#decimal">DECIMAL</xref></codeph>,
+        <codeph><xref href="impala_float.xml#float">FLOAT</xref></codeph>,
+        <codeph><xref href="impala_double.xml#double">DOUBLE</xref></codeph>,
+        <codeph><xref href="impala_timestamp.xml#timestamp">TIMESTAMP</xref></codeph>,
+        <codeph><xref href="impala_string.xml#string">STRING</xref></codeph>,
+        <codeph><xref href="impala_varchar.xml#varchar">VARCHAR</xref></codeph>,
+        <codeph><xref href="impala_char.xml#char">CHAR</xref></codeph>.
+      </li>
+
+<!-- To do: Consider promoting 'querying complex types' to its own subtopic or pseudo-heading. -->
+      <li rev="2.3.0">
+        The complex data types <codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>
+        are available in CDH 5.5 / Impala 2.3 and higher.
+        Queries involving these types typically use special qualified names
+        with dot notation to refer to the complex column fields,
+        and join clauses to bring the complex columns into the result set.
+        See <xref href="impala_complex_types.xml#complex_types"/> for details.
+      </li>
+
+      <li rev="1.1">
+        An optional <xref href="impala_with.xml#with"><codeph>WITH</codeph> clause</xref> before the
+        <codeph>SELECT</codeph> keyword, to define a subquery whose name or column names can be referenced
+        later in the main query. This clause lets you abstract repeated clauses, such as aggregation functions,
+        that are referenced multiple times in the same query.
+      </li>
+
+      <li>
+        By default, one <codeph>DISTINCT</codeph> clause per query. See <xref href="impala_distinct.xml#distinct"/>
+        for details. See <xref href="impala_appx_count_distinct.xml#appx_count_distinct"/> for a query option to
+        allow multiple <codeph>COUNT(DISTINCT)</codeph> expressions in the same query.
+      </li>
+
+      <li>
+        Subqueries in a <codeph>FROM</codeph> clause. In CDH 5.2 / Impala 2.0 and higher,
+        subqueries can also go in the <codeph>WHERE</codeph> clause, for example with the
+        <codeph>IN()</codeph>, <codeph>EXISTS</codeph>, and <codeph>NOT EXISTS</codeph> operators.
+      </li>
+
+      <li>
+        <codeph>WHERE</codeph>, <codeph>GROUP BY</codeph>, and <codeph>HAVING</codeph> clauses.
+      </li>
+
+      <li rev="obwl">
+        <codeph><xref href="impala_order_by.xml#order_by">ORDER BY</xref></codeph>. Prior to Impala 1.4.0, Impala
+        required that queries using an <codeph>ORDER BY</codeph> clause also include a
+        <codeph><xref href="impala_limit.xml#limit">LIMIT</xref></codeph> clause. In Impala 1.4.0 and higher, this
+        restriction is lifted; sort operations that would exceed the Impala memory limit automatically use a
+        temporary disk work area to perform the sort.
+ </li> + + <li> + <p conref="../shared/impala_common.xml#common/join_types"/> + <p> + See <xref href="impala_joins.xml#joins"/> for details and examples of join queries. + </p> + </li> + + <li> + <codeph>UNION ALL</codeph>. + </li> + + <li> + <codeph>LIMIT</codeph>. + </li> + + <li> + External tables. + </li> + + <li> + Relational operators such as greater than, less than, or equal to. + </li> + + <li> + Arithmetic operators such as addition or subtraction. + </li> + + <li> + Logical/Boolean operators <codeph>AND</codeph>, <codeph>OR</codeph>, and <codeph>NOT</codeph>. Impala does + not support the corresponding symbols <codeph>&&</codeph>, <codeph>||</codeph>, and + <codeph>!</codeph>. + </li> + + <li> + Common SQL built-in functions such as <codeph>COUNT</codeph>, <codeph>SUM</codeph>, <codeph>CAST</codeph>, + <codeph>LIKE</codeph>, <codeph>IN</codeph>, <codeph>BETWEEN</codeph>, and <codeph>COALESCE</codeph>. Impala + specifically supports built-ins described in <xref href="impala_functions.xml#builtins"/>. + </li> + </ul> + + <p conref="../shared/impala_common.xml#common/ignore_file_extensions"/> + + <p conref="../shared/impala_common.xml#common/security_blurb"/> + <p conref="../shared/impala_common.xml#common/redaction_yes"/> + + <p conref="../shared/impala_common.xml#common/cancel_blurb_yes"/> + + <p conref="../shared/impala_common.xml#common/permissions_blurb"/> + <p rev="CDH-19187"> + The user ID that the <cmdname>impalad</cmdname> daemon runs under, + typically the <codeph>impala</codeph> user, must have read + permissions for the files in all applicable directories in all source tables, + and read and execute permissions for the relevant data directories. + (A <codeph>SELECT</codeph> operation could read files from multiple different HDFS directories + if the source table is partitioned.) + If a query attempts to read a data file and is unable to because of an HDFS permission error, + the query halts and does not return any further results. 
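+    </p>
+
+    <p>
+      As a simple illustration of several supported clauses used together (the table and column names are
+      hypothetical):
+    </p>
+
+<codeblock>SELECT c1, COUNT(c2) AS cnt
+FROM t1 JOIN t2 USING (id)
+WHERE c3 BETWEEN 1 AND 100
+GROUP BY c1
+HAVING COUNT(c2) > 10
+ORDER BY cnt DESC
+LIMIT 5;</codeblock>
+
+    <p>
+      Each clause shown here is described in more detail in the related statements and topics listed below.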
+ </p> + + <p outputclass="toc"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + The <codeph>SELECT</codeph> syntax is so extensive that it forms its own category of statements: queries. The + other major classifications of SQL statements are data definition language (see + <xref href="impala_ddl.xml#ddl"/>) and data manipulation language (see <xref href="impala_dml.xml#dml"/>). + </p> + + <p> + Because the focus of Impala is on fast queries with interactive response times over huge data sets, query + performance and scalability are important considerations. See + <xref href="impala_performance.xml#performance"/> and <xref href="impala_scalability.xml#scalability"/> for + details. + </p> + </conbody> + + <concept id="where" audience="Cloudera"> + +<!-- WHERE hidden for the moment until there's the chance to add some reasonably comprehensive content + + and make it its own file. --> + + <title>WHERE Clause</title> + + <conbody> + + <p/> + </conbody> + </concept> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_set.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_set.xml b/docs/topics/impala_set.xml new file mode 100644 index 0000000..afa6777 --- /dev/null +++ b/docs/topics/impala_set.xml @@ -0,0 +1,90 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="2.0.0" id="set"> + + <title>SET Statement</title> + <titlealts><navtitle>SET</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Configuring"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">SET statement</indexterm> + Specifies values for query options that control the runtime behavior of other statements within the same + 
session. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>SET [<varname>query_option</varname>=<varname>option_value</varname>] +</codeblock> + + <p> + <codeph>SET</codeph> with no arguments returns a result set consisting of all available query options and + their current values. + </p> + + <p> + The query option name and any string argument values are case-insensitive. + </p> + + <p> + Each query option has a specific allowed notation for its arguments. Boolean options can be enabled and + disabled by assigning values of either <codeph>true</codeph> and <codeph>false</codeph>, or + <codeph>1</codeph> and <codeph>0</codeph>. Some numeric options accept a final character signifying the unit, + such as <codeph>2g</codeph> for 2 gigabytes or <codeph>100m</codeph> for 100 megabytes. See + <xref href="impala_query_options.xml#query_options"/> for the details of each query option. + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + <codeph>MEM_LIMIT</codeph> is probably the most commonly used query option. You can specify a high value to + allow a resource-intensive query to complete. For testing how queries would work on memory-constrained + systems, you might specify an artificially low value. 
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      The following example sets some numeric and some Boolean query options to control usage of memory, disk
+      space, and timeout periods, then runs a query whose success could depend on the options in effect:
+    </p>
+
+<codeblock>set mem_limit=64g;
+set DISABLE_UNSAFE_SPILLS=true;
+set parquet_file_size=400m;
+set RESERVATION_REQUEST_TIMEOUT=900000;
+insert overwrite parquet_table select c1, c2, count(c3) from text_table group by c1, c2, c3;
+</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/added_in_20"/>
+
+    <p>
+      <codeph>SET</codeph> has always been available as an <cmdname>impala-shell</cmdname> command. Promoting it to
+      a SQL statement lets you use this feature in client applications through the JDBC and ODBC APIs.
+    </p>
+
+<!-- <p conref="/Content/impala_common_xi44078.xml#common/jdbc_blurb"/> -->
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      See <xref href="impala_query_options.xml#query_options"/> for the query options you can adjust using this
+      statement.
+    </p>
+  </conbody>
+</concept>
