[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Added walkthrough of renaming internal/external tables.

Split out Kudu CREATE TABLE syntax from other file formats.

Correct info about CTAS for Kudu tables.

Add examples of basic Kudu, external Kudu, and Kudu CTAS.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Reviewed-on: http://gerrit.cloudera.org:8080/5649
Reviewed-by: Matthew Jacobs <[email protected]>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/661921b2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/661921b2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/661921b2

Branch: refs/heads/master
Commit: 661921b205caf5b894f0f8803418c302e7a55293
Parents: aee5457
Author: John Russell <[email protected]>
Authored: Mon Jan 9 14:17:23 2017 -0800
Committer: Impala Public Jenkins <[email protected]>
Committed: Fri Feb 17 01:10:12 2017 +0000

----------------------------------------------------------------------
 docs/impala_keydefs.ditamap                |    2 +
 docs/shared/impala_common.xml              |   52 +
 docs/topics/impala_alter_table.xml         |   80 +-
 docs/topics/impala_array.xml               |    3 +
 docs/topics/impala_boolean.xml             |    3 +
 docs/topics/impala_char.xml                |    3 +
 docs/topics/impala_compute_stats.xml       |   14 +-
 docs/topics/impala_create_table.xml        |  924 +++++++++++-----
 docs/topics/impala_decimal.xml             |    3 +
 docs/topics/impala_describe.xml            |   85 ++
 docs/topics/impala_double.xml              |    3 +
 docs/topics/impala_drop_table.xml          |    9 +
 docs/topics/impala_explain.xml             |   36 +
 docs/topics/impala_float.xml               |    3 +
 docs/topics/impala_grant.xml               |    3 +
 docs/topics/impala_invalidate_metadata.xml |    5 +
 docs/topics/impala_kudu.xml                | 1331 +++++++++++++++++++++--
 docs/topics/impala_literals.xml            |   18 +
 docs/topics/impala_map.xml                 |    3 +
 docs/topics/impala_partitioning.xml        |    8 +-
 docs/topics/impala_refresh.xml             |    5 +
 docs/topics/impala_reserved_words.xml      |   16 +-
 docs/topics/impala_revoke.xml              |    3 +
 docs/topics/impala_show.xml                |  173 ++-
 docs/topics/impala_struct.xml              |    3 +
 docs/topics/impala_tables.xml              |  145 ++-
 docs/topics/impala_timestamp.xml           |    3 +
 docs/topics/impala_truncate_table.xml      |    3 +
 docs/topics/impala_varchar.xml             |    3 +
 29 files changed, 2590 insertions(+), 352 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/impala_keydefs.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap
index 4fe8813..2562df9 100644
--- a/docs/impala_keydefs.ditamap
+++ b/docs/impala_keydefs.ditamap
@@ -10285,6 +10285,7 @@ https://issues.cloudera.org/secure/IssueNavigator.jspa?reset=true&amp;jqlQuery=p
   <keydef keys="impala25"><topicmeta><keywords><keyword>Impala 2.5</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala24"><topicmeta><keywords><keyword>Impala 2.4</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala23"><topicmeta><keywords><keyword>Impala 2.3</keyword></keywords></topicmeta></keydef>
+  <keydef keys="impala223"><topicmeta><keywords><keyword>Impala 2.2.3</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala22"><topicmeta><keywords><keyword>Impala 2.2</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala21"><topicmeta><keywords><keyword>Impala 2.1</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala20"><topicmeta><keywords><keyword>Impala 2.0</keyword></keywords></topicmeta></keydef>
@@ -10298,6 +10299,7 @@ https://issues.cloudera.org/secure/IssueNavigator.jspa?reset=true&amp;jqlQuery=p
   <keydef keys="impala25_full"><topicmeta><keywords><keyword>Impala 2.5</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala24_full"><topicmeta><keywords><keyword>Impala 2.4</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala23_full"><topicmeta><keywords><keyword>Impala 2.3</keyword></keywords></topicmeta></keydef>
+  <keydef keys="impala223_full"><topicmeta><keywords><keyword>Impala 2.2.3</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala22_full"><topicmeta><keywords><keyword>Impala 2.2</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala21_full"><topicmeta><keywords><keyword>Impala 2.1</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala20_full"><topicmeta><keywords><keyword>Impala 2.0</keyword></keywords></topicmeta></keydef>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/shared/impala_common.xml
----------------------------------------------------------------------
diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index 1b8c171..4a9aa32 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -3730,6 +3730,58 @@ sudo pip-python install ssl</codeblock>
         NULL</codeph> attribute to that column.
       </p>
 
+      <p id="kudu_metadata_intro" rev="kudu">
+        Much of the metadata for Kudu tables is handled by the underlying
+        storage layer. Kudu tables have less reliance on the metastore
+        database, and require less metadata caching on the Impala side.
+        For example, information about partitions in Kudu tables is managed
+        by Kudu, and Impala does not cache any block locality metadata
+        for Kudu tables.
+      </p>
+
+      <p id="kudu_metadata_details" rev="kudu">
+        The <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph>
+        statements are needed less frequently for Kudu tables than for
+        HDFS-backed tables. Neither statement is needed when data is
+        added to, removed from, or updated in a Kudu table, even if the changes
+        are made directly to Kudu through a client program using the Kudu API.
+        Run <codeph>REFRESH <varname>table_name</varname></codeph> or
+        <codeph>INVALIDATE METADATA <varname>table_name</varname></codeph>
+        for a Kudu table only after making a change to the Kudu table schema,
+        such as adding or dropping a column, by a mechanism other than
+        Impala.
+      </p>
+ 
+      <p id="kudu_internal_external_tables">
+        The distinction between internal and external tables has some special
+        details for Kudu tables. Tables created entirely through Impala are
+        internal tables. The table name as represented within Kudu includes
+        notation such as an <codeph>impala::</codeph> prefix and the Impala
+        database name. External Kudu tables are those created by a non-Impala
+        mechanism, such as a user application calling the Kudu APIs. For
+        these tables, the <codeph>CREATE EXTERNAL TABLE</codeph> syntax lets
+        you establish a mapping from Impala to the existing Kudu table:
+<codeblock>
+CREATE EXTERNAL TABLE impala_name STORED AS KUDU
+  TBLPROPERTIES('kudu.table_name' = 'original_kudu_name');
+</codeblock>
+        External Kudu tables differ in one important way from other external
+        tables: adding or dropping a column or range partition changes the
+        data in the underlying Kudu table, in contrast to an HDFS-backed
+        external table where existing data files are left untouched.
+      </p>
+
+      <p id="kudu_sentry_limitations" rev="IMPALA-4000">
+        Access to Kudu tables must be granted to and revoked from roles as usual.
+        Only users with <codeph>ALL</codeph> privileges on <codeph>SERVER</codeph> can create external Kudu tables.
+        Currently, access to a Kudu table is <q>all or nothing</q>:
+        enforced at the table level rather than the column level, and applying to all
+        SQL operations rather than individual statements such as <codeph>INSERT</codeph>.
+        Because non-SQL APIs can access Kudu data without going through Sentry
+        authorization, currently the Sentry support is considered preliminary
+        and subject to change.
+      </p>
+
     </section>
 
   </conbody>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_alter_table.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_alter_table.xml b/docs/topics/impala_alter_table.xml
index c4df150..a3f1e19 100644
--- a/docs/topics/impala_alter_table.xml
+++ b/docs/topics/impala_alter_table.xml
@@ -34,6 +34,7 @@ under the License.
       <data name="Category" value="S3"/>
       <data name="Category" value="Developers"/>
       <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Kudu"/>
     </metadata>
   </prolog>
 
@@ -63,9 +64,11 @@ ALTER TABLE <varname>name</varname> REPLACE COLUMNS (<varname>col_spec</varname>
 ALTER TABLE <varname>name</varname> ADD [IF NOT EXISTS] PARTITION (<varname>partition_spec</varname>)
   <ph rev="IMPALA-4390">[<varname>location_spec</varname>]</ph>
   <ph rev="IMPALA-4390">[<varname>cache_spec</varname>]</ph>
+<ph rev="kudu">ALTER TABLE <varname>name</varname> ADD [IF NOT EXISTS] RANGE PARTITION (<varname>kudu_partition_spec</varname>)</ph>
 
 ALTER TABLE <varname>name</varname> DROP [IF EXISTS] PARTITION (<varname>partition_spec</varname>)
   <ph rev="2.3.0">[PURGE]</ph>
+<ph rev="kudu">ALTER TABLE <varname>name</varname> DROP [IF EXISTS] RANGE PARTITION <varname>kudu_partition_spec</varname></ph>
 
 <ph rev="2.3.0 IMPALA-1568 CDH-36799">ALTER TABLE <varname>name</varname> RECOVER PARTITIONS</ph>
 
@@ -86,12 +89,18 @@ statsKey ::= numDVs | numNulls | avgSize | maxSize</ph>
 
 <varname>col_spec</varname> ::= <varname>col_name</varname> <varname>type_name</varname>
 
-<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph> | <ph rev="kudu"><varname>kudu_partition_spec</varname></ph>
+<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph>
 
 <varname>simple_partition_spec</varname> ::= <varname>partition_col</varname>=<varname>constant_value</varname>
 
 <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname> ::= <varname>comparison_expression_on_partition_col</varname></ph>
 
+<ph rev="kudu"><varname>kudu_partition_spec</varname> ::= <varname>constant</varname> <varname>range_operator</varname> VALUES <varname>range_operator</varname> <varname>constant</varname> | VALUE = <varname>constant</varname></ph>
+
+<ph rev="IMPALA-4390">cache_spec ::= CACHED IN '<varname>pool_name</varname>' [WITH REPLICATION = <varname>integer</varname>] | UNCACHED</ph>
+
+<ph rev="IMPALA-4390">location_spec ::= LOCATION '<varname>hdfs_path_of_directory</varname>'</ph>
+
 <varname>table_properties</varname> ::= '<varname>name</varname>'='<varname>value</varname>'[, '<varname>name</varname>'='<varname>value</varname>' ...]
 
 <varname>serde_properties</varname> ::= '<varname>name</varname>'='<varname>value</varname>'[, '<varname>name</varname>'='<varname>value</varname>' ...]
@@ -896,6 +905,75 @@ alter table sales_data add partition (zipcode = cast(9021 * 10 as string));</cod
       require write and execute permissions for the associated partition directory.
     </p>
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+
+    <p rev="kudu IMPALA-2890">
+      Because of the extra constraints and features of Kudu tables, such as the <codeph>NOT NULL</codeph>
+      and <codeph>DEFAULT</codeph> attributes for columns, <codeph>ALTER TABLE</codeph> has specific
+      requirements related to Kudu tables:
+      <ul>
+        <li>
+          <p>
+            In an <codeph>ADD COLUMNS</codeph> operation, you can specify the <codeph>NULL</codeph>,
+            <codeph>NOT NULL</codeph>, and <codeph>DEFAULT <varname>default_value</varname></codeph>
+            column attributes.
+          </p>
+        </li>
+        <li>
+          <p>
+            If you add a column with a <codeph>NOT NULL</codeph> attribute, it must also have a
+            <codeph>DEFAULT</codeph> attribute, so the default value can be assigned to that
+            column for all existing rows.
+          </p>
+        </li>
+        <li>
+          <p>
+            The <codeph>DROP COLUMN</codeph> clause works the same for a Kudu table as for other
+            kinds of tables.
+          </p>
+        </li>
+        <li>
+          <p>
+            Although you can change the name of a column with the <codeph>CHANGE</codeph> clause,
+            you cannot change the type of a column in a Kudu table.
+          </p>
+        </li>
+        <li>
+          <p>
+            You cannot assign the <codeph>ENCODING</codeph>, <codeph>COMPRESSION</codeph>,
+            or <codeph>BLOCK_SIZE</codeph> attributes when adding a column.
+          </p>
+        </li>
+        <li>
+          <p>
+            You cannot change the default value, nullability, encoding, compression, or block size
+            of existing columns in a Kudu table.
+          </p>
+        </li>
+        <li>
+          <p>
+            You cannot use the <codeph>REPLACE COLUMNS</codeph> clause with a Kudu table.
+          </p>
+        </li>
+        <li>
+          <p>
+            The <codeph>RENAME TO</codeph> clause for a Kudu table only affects the name stored in the
+            metastore database that Impala uses to refer to the table. To change which underlying Kudu
+            table is associated with an Impala table name, you must change the <codeph>TBLPROPERTIES</codeph>
+            property of the table: <codeph>SET TBLPROPERTIES('kudu.table_name'='<varname>kudu_tbl_name</varname>')</codeph>.
+            Doing so causes Kudu to change the name of the underlying Kudu table.
+          </p>
+        </li>
+      </ul>
+    </p>
+
+    <p rev="kudu">
+      Kudu tables all use an underlying partitioning mechanism. The partition syntax is different from that for non-Kudu
+      tables. You can use the <codeph>ALTER TABLE</codeph> statement to add <term>range partitions</term> to, and drop them
+      from, a Kudu table. Any new range must not overlap with any existing range. Dropping a range removes all the associated
+      rows from the table. See <xref href="impala_kudu.xml#kudu_partitioning"/> for details.
+    </p>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_array.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_array.xml b/docs/topics/impala_array.xml
index be26874..f882b97 100644
--- a/docs/topics/impala_array.xml
+++ b/docs/topics/impala_array.xml
@@ -115,6 +115,9 @@ type ::= <varname>primitive_type</varname> | <varname>complex_type</varname>
         <li/>
       </ul>
 
+      <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+      <p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
+
       <p conref="../shared/impala_common.xml#common/example_blurb"/>
 
       <note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_boolean.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_boolean.xml b/docs/topics/impala_boolean.xml
index fcb3ec7..1e0690f 100644
--- a/docs/topics/impala_boolean.xml
+++ b/docs/topics/impala_boolean.xml
@@ -161,6 +161,9 @@ SELECT claim FROM assertions WHERE really = TRUE;
 
 <!-- <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> -->
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_non_pk_data_type"/>
+
 <!-- <p conref="../shared/impala_common.xml#common/related_info"/> -->
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_char.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_char.xml b/docs/topics/impala_char.xml
index ca6f314..dc8ad5a 100644
--- a/docs/topics/impala_char.xml
+++ b/docs/topics/impala_char.xml
@@ -243,6 +243,9 @@ select concat('[',a,']') as a, concat('[',b,']') as b, concat('[',c,']') as c fr
 +------------------------+----------------------------------+--------------------------------------------+
 </codeblock>
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
+
     <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_compute_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_compute_stats.xml b/docs/topics/impala_compute_stats.xml
index 91f45c2..bd21dae 100644
--- a/docs/topics/impala_compute_stats.xml
+++ b/docs/topics/impala_compute_stats.xml
@@ -52,8 +52,7 @@ under the License.
 <codeblock rev="2.1.0">COMPUTE STATS [<varname>db_name</varname>.]<varname>table_name</varname>
 COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varname> [PARTITION (<varname>partition_spec</varname>)]
 
-<!-- Is kudu_partition_spec applicable here? -->
-<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph> | <ph rev="kudu"><varname>kudu_partition_spec</varname></ph>
+<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph>
 
 <varname>simple_partition_spec</varname> ::= <varname>partition_col</varname>=<varname>constant_value</varname>
 
@@ -523,6 +522,17 @@ show table stats item_partitioned;
       against the table.)
     </p>
 
+    <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
+
+    <p rev="IMPALA-2830">
+      The <codeph>COMPUTE STATS</codeph> statement applies to Kudu tables.
+      Impala does not compute the number of rows for each partition for
+      Kudu tables. Therefore, you do not need to re-run the operation when
+      you see -1 in the <codeph># Rows</codeph> column of the output from
+      <codeph>SHOW TABLE STATS</codeph>. That column always shows -1 for
+      all Kudu tables. 
+    </p>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
 
     <p>
