[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu notions.

The impala_kudu diff looks larger than actual changes to the page,
because subtopics got moved around and promoted/demoted (which changes
the indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic. Mainly a brief intro then
a link to delegate details to the main Kudu page, which already has a
partitioning subtopic.

Include changes to reserved words. Entirely from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files. (Up to the
beginning of the "Partitioning" section in impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Added walkthrough of renaming internal/external tables.

Split out Kudu CREATE TABLE syntax from other file formats.

Correct info about CTAS for Kudu tables.

Add examples of basic Kudu, external Kudu, and Kudu CTAS.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Reviewed-on: http://gerrit.cloudera.org:8080/5649
Reviewed-by: Matthew Jacobs <[email protected]>
Tested-by: Impala Public Jenkins

Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/661921b2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/661921b2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/661921b2

Branch: refs/heads/master
Commit: 661921b205caf5b894f0f8803418c302e7a55293
Parents: aee5457
Author: John Russell <[email protected]>
Authored: Mon Jan 9 14:17:23 2017 -0800
Committer: Impala Public Jenkins <[email protected]>
Committed: Fri Feb 17 01:10:12 2017 +0000

----------------------------------------------------------------------
 docs/impala_keydefs.ditamap                | 2 +
 docs/shared/impala_common.xml              | 52 +
 docs/topics/impala_alter_table.xml         | 80 +-
 docs/topics/impala_array.xml               | 3 +
 docs/topics/impala_boolean.xml             | 3 +
 docs/topics/impala_char.xml                | 3 +
 docs/topics/impala_compute_stats.xml       | 14 +-
 docs/topics/impala_create_table.xml        | 924 +++++++++++-----
 docs/topics/impala_decimal.xml             | 3 +
 docs/topics/impala_describe.xml            | 85 ++
 docs/topics/impala_double.xml              | 3 +
 docs/topics/impala_drop_table.xml          | 9 +
 docs/topics/impala_explain.xml             | 36 +
 docs/topics/impala_float.xml               | 3 +
 docs/topics/impala_grant.xml               | 3 +
 docs/topics/impala_invalidate_metadata.xml | 5 +
 docs/topics/impala_kudu.xml                | 1331 +++++++++++++++++++++--
 docs/topics/impala_literals.xml            | 18 +
 docs/topics/impala_map.xml                 | 3 +
 docs/topics/impala_partitioning.xml        | 8 +-
 docs/topics/impala_refresh.xml             | 5 +
 docs/topics/impala_reserved_words.xml      | 16 +-
 docs/topics/impala_revoke.xml              | 3 +
 docs/topics/impala_show.xml                | 173 ++-
 docs/topics/impala_struct.xml              | 3 +
 docs/topics/impala_tables.xml              | 145 ++-
 docs/topics/impala_timestamp.xml           | 3 +
 docs/topics/impala_truncate_table.xml      | 3 +
 docs/topics/impala_varchar.xml             | 3 +
 29 files 
changed, 2590 insertions(+), 352 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/impala_keydefs.ditamap ---------------------------------------------------------------------- diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap index 4fe8813..2562df9 100644 --- a/docs/impala_keydefs.ditamap +++ b/docs/impala_keydefs.ditamap @@ -10285,6 +10285,7 @@ https://issues.cloudera.org/secure/IssueNavigator.jspa?reset=true&jqlQuery=p <keydef keys="impala25"><topicmeta><keywords><keyword>Impala 2.5</keyword></keywords></topicmeta></keydef> <keydef keys="impala24"><topicmeta><keywords><keyword>Impala 2.4</keyword></keywords></topicmeta></keydef> <keydef keys="impala23"><topicmeta><keywords><keyword>Impala 2.3</keyword></keywords></topicmeta></keydef> + <keydef keys="impala223"><topicmeta><keywords><keyword>Impala 2.2.3</keyword></keywords></topicmeta></keydef> <keydef keys="impala22"><topicmeta><keywords><keyword>Impala 2.2</keyword></keywords></topicmeta></keydef> <keydef keys="impala21"><topicmeta><keywords><keyword>Impala 2.1</keyword></keywords></topicmeta></keydef> <keydef keys="impala20"><topicmeta><keywords><keyword>Impala 2.0</keyword></keywords></topicmeta></keydef> @@ -10298,6 +10299,7 @@ https://issues.cloudera.org/secure/IssueNavigator.jspa?reset=true&jqlQuery=p <keydef keys="impala25_full"><topicmeta><keywords><keyword>Impala 2.5</keyword></keywords></topicmeta></keydef> <keydef keys="impala24_full"><topicmeta><keywords><keyword>Impala 2.4</keyword></keywords></topicmeta></keydef> <keydef keys="impala23_full"><topicmeta><keywords><keyword>Impala 2.3</keyword></keywords></topicmeta></keydef> + <keydef keys="impala223_full"><topicmeta><keywords><keyword>Impala 2.2.3</keyword></keywords></topicmeta></keydef> <keydef keys="impala22_full"><topicmeta><keywords><keyword>Impala 2.2</keyword></keywords></topicmeta></keydef> <keydef 
keys="impala21_full"><topicmeta><keywords><keyword>Impala 2.1</keyword></keywords></topicmeta></keydef> <keydef keys="impala20_full"><topicmeta><keywords><keyword>Impala 2.0</keyword></keywords></topicmeta></keydef> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/shared/impala_common.xml ---------------------------------------------------------------------- diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml index 1b8c171..4a9aa32 100644 --- a/docs/shared/impala_common.xml +++ b/docs/shared/impala_common.xml @@ -3730,6 +3730,58 @@ sudo pip-python install ssl</codeblock> NULL</codeph> attribute to that column. </p> + <p id="kudu_metadata_intro" rev="kudu"> + Much of the metadata for Kudu tables is handled by the underlying + storage layer. Kudu tables have less reliance on the metastore + database, and require less metadata caching on the Impala side. + For example, information about partitions in Kudu tables is managed + by Kudu, and Impala does not cache any block locality metadata + for Kudu tables. + </p> + + <p id="kudu_metadata_details" rev="kudu"> + The <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> + statements are needed less frequently for Kudu tables than for + HDFS-backed tables. Neither statement is needed when data is + added to, removed, or updated in a Kudu table, even if the changes + are made directly to Kudu through a client program using the Kudu API. + Run <codeph>REFRESH <varname>table_name</varname></codeph> or + <codeph>INVALIDATE METADATA <varname>table_name</varname></codeph> + for a Kudu table only after making a change to the Kudu table schema, + such as adding or dropping a column, by a mechanism other than + Impala. + </p> + + <p id="kudu_internal_external_tables"> + The distinction between internal and external tables has some special + details for Kudu tables. Tables created entirely through Impala are + internal tables. 
The table name as represented within Kudu includes + notation such as an <codeph>impala::</codeph> prefix and the Impala + database name. External Kudu tables are those created by a non-Impala + mechanism, such as a user application calling the Kudu APIs. For + these tables, the <codeph>CREATE EXTERNAL TABLE</codeph> syntax lets + you establish a mapping from Impala to the existing Kudu table: +<codeblock> +CREATE EXTERNAL TABLE impala_name STORED AS KUDU + TBLPROPERTIES('kudu.table_name' = 'original_kudu_name'); +</codeblock> + External Kudu tables differ in one important way from other external + tables: adding or dropping a column or range partition changes the + data in the underlying Kudu table, in contrast to an HDFS-backed + external table where existing data files are left untouched. + </p> + + <p id="kudu_sentry_limitations" rev="IMPALA-4000"> + Access to Kudu tables must be granted to and revoked from roles as usual. + Only users with <codeph>ALL</codeph> privileges on <codeph>SERVER</codeph> can create external Kudu tables. + Currently, access to a Kudu table is <q>all or nothing</q>: + enforced at the table level rather than the column level, and applying to all + SQL operations rather than individual statements such as <codeph>INSERT</codeph>. + Because non-SQL APIs can access Kudu data without going through Sentry + authorization, currently the Sentry support is considered preliminary + and subject to change. + </p> + </section> </conbody> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_alter_table.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_alter_table.xml b/docs/topics/impala_alter_table.xml index c4df150..a3f1e19 100644 --- a/docs/topics/impala_alter_table.xml +++ b/docs/topics/impala_alter_table.xml @@ -34,6 +34,7 @@ under the License. 
<data name="Category" value="S3"/> <data name="Category" value="Developers"/> <data name="Category" value="Data Analysts"/> + <data name="Category" value="Kudu"/> </metadata> </prolog> @@ -63,9 +64,11 @@ ALTER TABLE <varname>name</varname> REPLACE COLUMNS (<varname>col_spec</varname> ALTER TABLE <varname>name</varname> ADD [IF NOT EXISTS] PARTITION (<varname>partition_spec</varname>) <ph rev="IMPALA-4390">[<varname>location_spec</varname>]</ph> <ph rev="IMPALA-4390">[<varname>cache_spec</varname>]</ph> +<ph rev="kudu">ALTER TABLE <varname>name</varname> ADD [IF NOT EXISTS] RANGE PARTITION (<varname>kudu_partition_spec</varname>)</ph> ALTER TABLE <varname>name</varname> DROP [IF EXISTS] PARTITION (<varname>partition_spec</varname>) <ph rev="2.3.0">[PURGE]</ph> +<ph rev="kudu">ALTER TABLE <varname>name</varname> DROP [IF EXISTS] RANGE PARTITION <varname>kudu_partition_spec</varname></ph> <ph rev="2.3.0 IMPALA-1568 CDH-36799">ALTER TABLE <varname>name</varname> RECOVER PARTITIONS</ph> @@ -86,12 +89,18 @@ statsKey ::= numDVs | numNulls | avgSize | maxSize</ph> <varname>col_spec</varname> ::= <varname>col_name</varname> <varname>type_name</varname> -<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph> | <ph rev="kudu"><varname>kudu_partition_spec</varname></ph> +<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph> <varname>simple_partition_spec</varname> ::= <varname>partition_col</varname>=<varname>constant_value</varname> <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname> ::= <varname>comparison_expression_on_partition_col</varname></ph> +<ph rev="kudu"><varname>kudu_partition_spec</varname> ::= <varname>constant</varname> <varname>range_operator</varname> VALUES <varname>range_operator</varname> <varname>constant</varname> | VALUE = 
<varname>constant</varname></ph> + +<ph rev="IMPALA-4390">cache_spec ::= CACHED IN '<varname>pool_name</varname>' [WITH REPLICATION = <varname>integer</varname>] | UNCACHED</ph> + +<ph rev="IMPALA-4390">location_spec ::= LOCATION '<varname>hdfs_path_of_directory</varname>'</ph> + <varname>table_properties</varname> ::= '<varname>name</varname>'='<varname>value</varname>'[, '<varname>name</varname>'='<varname>value</varname>' ...] <varname>serde_properties</varname> ::= '<varname>name</varname>'='<varname>value</varname>'[, '<varname>name</varname>'='<varname>value</varname>' ...] @@ -896,6 +905,75 @@ alter table sales_data add partition (zipcode = cast(9021 * 10 as string));</cod require write and execute permissions for the associated partition directory. </p> + <p conref="../shared/impala_common.xml#common/kudu_blurb"/> + + <p rev="kudu IMPALA-2890"> + Because of the extra constraints and features of Kudu tables, such as the <codeph>NOT NULL</codeph> + and <codeph>DEFAULT</codeph> attributes for columns, <codeph>ALTER TABLE</codeph> has specific + requirements related to Kudu tables: + <ul> + <li> + <p> + In an <codeph>ADD COLUMNS</codeph> operation, you can specify the <codeph>NULL</codeph>, + <codeph>NOT NULL</codeph>, and <codeph>DEFAULT <varname>default_value</varname></codeph> + column attributes. + </p> + </li> + <li> + <p> + If you add a column with a <codeph>NOT NULL</codeph> attribute, it must also have a + <codeph>DEFAULT</codeph> attribute, so the default value can be assigned to that + column for all existing rows. + </p> + </li> + <li> + <p> + The <codeph>DROP COLUMN</codeph> clause works the same for a Kudu table as for other + kinds of tables. + </p> + </li> + <li> + <p> + Although you can change the name of a column with the <codeph>CHANGE</codeph> clause, + you cannot change the type of a column in a Kudu table. 
+ </p> + </li> + <li> + <p> + You cannot assign the <codeph>ENCODING</codeph>, <codeph>COMPRESSION</codeph>, + or <codeph>BLOCK_SIZE</codeph> attributes when adding a column. + </p> + </li> + <li> + <p> + You cannot change the default value, nullability, encoding, compression, or block size + of existing columns in a Kudu table. + </p> + </li> + <li> + <p> + You cannot use the <codeph>REPLACE COLUMNS</codeph> clause with a Kudu table. + </p> + </li> + <li> + <p> + The <codeph>RENAME TO</codeph> clause for a Kudu table only affects the name stored in the + metastore database that Impala uses to refer to the table. To change which underlying Kudu + table is associated with an Impala table name, you must change the <codeph>TBLPROPERTIES</codeph> + property of the table: <codeph>SET TBLPROPERTIES('kudu.table_name'='<varname>kudu_tbl_name</varname>')</codeph>. + Doing so causes Kudu to change the name of the underlying Kudu table. + </p> + </li> + </ul> + </p> + + <p rev="kudu"> + Kudu tables all use an underlying partitioning mechanism. The partition syntax is different than for non-Kudu + tables. You can use the <codeph>ALTER TABLE</codeph> statement to add and drop <term>range partitions</term> + from a Kudu table. Any new range must not overlap with any existing ranges. Dropping a range removes all the associated + rows from the table. See <xref href="impala_kudu.xml#kudu_partitioning"/> for details. 
+ </p> + <p conref="../shared/impala_common.xml#common/related_info"/> <p> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_array.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_array.xml b/docs/topics/impala_array.xml index be26874..f882b97 100644 --- a/docs/topics/impala_array.xml +++ b/docs/topics/impala_array.xml @@ -115,6 +115,9 @@ type ::= <varname>primitive_type</varname> | <varname>complex_type</varname> <li/> </ul> + <p conref="../shared/impala_common.xml#common/kudu_blurb"/> + <p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> <note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_boolean.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_boolean.xml b/docs/topics/impala_boolean.xml index fcb3ec7..1e0690f 100644 --- a/docs/topics/impala_boolean.xml +++ b/docs/topics/impala_boolean.xml @@ -161,6 +161,9 @@ SELECT claim FROM assertions WHERE really = TRUE; <!-- <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> --> + <p conref="../shared/impala_common.xml#common/kudu_blurb"/> + <p conref="../shared/impala_common.xml#common/kudu_non_pk_data_type"/> + <!-- <p conref="../shared/impala_common.xml#common/related_info"/> --> <p> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_char.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_char.xml b/docs/topics/impala_char.xml index ca6f314..dc8ad5a 100644 --- a/docs/topics/impala_char.xml +++ b/docs/topics/impala_char.xml @@ -243,6 +243,9 @@ select concat('[',a,']') as a, concat('[',b,']') as b, concat('[',c,']') as c fr 
+------------------------+----------------------------------+--------------------------------------------+ </codeblock> + <p conref="../shared/impala_common.xml#common/kudu_blurb"/> + <p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/> + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> <p> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_compute_stats.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_compute_stats.xml b/docs/topics/impala_compute_stats.xml index 91f45c2..bd21dae 100644 --- a/docs/topics/impala_compute_stats.xml +++ b/docs/topics/impala_compute_stats.xml @@ -52,8 +52,7 @@ under the License. <codeblock rev="2.1.0">COMPUTE STATS [<varname>db_name</varname>.]<varname>table_name</varname> COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varname> [PARTITION (<varname>partition_spec</varname>)] -<!-- Is kudu_partition_spec applicable here? --> -<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph> | <ph rev="kudu"><varname>kudu_partition_spec</varname></ph> +<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph> <varname>simple_partition_spec</varname> ::= <varname>partition_col</varname>=<varname>constant_value</varname> @@ -523,6 +522,17 @@ show table stats item_partitioned; against the table.) </p> + <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/> + + <p rev="IMPALA-2830"> + The <codeph>COMPUTE STATS</codeph> statement applies to Kudu tables. + Impala does not compute the number of rows for each partition for + Kudu tables. 
Therefore, you do not need to re-run the operation when + you see -1 in the <codeph># Rows</codeph> column of the output from + <codeph>SHOW TABLE STATS</codeph>. That column always shows -1 for + all Kudu tables. + </p> + <p conref="../shared/impala_common.xml#common/related_info"/> <p>
