wuchong commented on code in PR #20696: URL: https://github.com/apache/flink/pull/20696#discussion_r969261642
########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/overview.md: ########## @@ -0,0 +1,97 @@ +--- +title: "Overview" +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/overview +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Hive Dialect + +Flink allows users to write SQL statements in Hive syntax when Hive dialect is used. +By providing compatibility with Hive syntax, we aim to improve the interoperability with Hive and reduce the scenarios when users need to switch between Flink and Hive in order to execute different statements. + +## Use Hive Dialect + +Flink currently supports two SQL dialects: `default` and `hive`. You need to switch to Hive dialect +before you can write in Hive syntax. The following describes how to set dialect with +SQL Client and Table API. Also notice that you can dynamically switch dialect for each +statement you execute. There's no need to restart a session to use a different dialect. + +{{< hint warning >}} +**Note:** + +- To use Hive dialect, you have to add dependencies related to Hive. Please refer to [Hive dependencies]({{< ref "docs/connectors/table/hive/overview" >}}#dependencies) for how to add the dependencies. 
Review Comment: Please update the page of "Hive dependencies" to mention about switching the planner jar. ########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/overview.md: ########## @@ -0,0 +1,97 @@ +--- +title: "Overview" +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/overview +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Hive Dialect + +Flink allows users to write SQL statements in Hive syntax when Hive dialect is used. +By providing compatibility with Hive syntax, we aim to improve the interoperability with Hive and reduce the scenarios when users need to switch between Flink and Hive in order to execute different statements. + +## Use Hive Dialect + +Flink currently supports two SQL dialects: `default` and `hive`. You need to switch to Hive dialect +before you can write in Hive syntax. The following describes how to set dialect with +SQL Client and Table API. Also notice that you can dynamically switch dialect for each +statement you execute. There's no need to restart a session to use a different dialect. + +{{< hint warning >}} +**Note:** + +- To use Hive dialect, you have to add dependencies related to Hive. 
Please refer to [Hive dependencies]({{< ref "docs/connectors/table/hive/overview" >}}#dependencies) for how to add the dependencies.
+- Please make sure the current catalog is [HiveCatalog]({{< ref "docs/connectors/table/hive/hive_catalog" >}}). Otherwise, it will fall back to Flink's `default` dialect.
+- In order to have better syntax and semantic compatibility, it’s highly recommended to load [HiveModule]({{< ref "docs/connectors/table/hive/hive_functions" >}}#use-hive-built-in-functions-via-hivemodule) and
+  place it first in the module list, so that Hive built-in functions can be picked up during function resolution.
+  Please refer to [here]({{< ref "docs/dev/table/modules" >}}#how-to-load-unload-use-and-list-modules) for how to change the resolution order.
+- Hive dialect only supports 2-part identifiers, so you can't specify a catalog for an identifier.
+- While all Hive versions support the same syntax, whether a specific feature is available still depends on the
+  [Hive version]({{< ref "docs/connectors/table/hive/overview" >}}#supported-hive-versions) you use. For example, updating database
+  location is only supported in Hive-2.4.0 or later.
+{{< /hint >}}
+
+### SQL Client
+
+The SQL dialect can be specified via the `table.sql-dialect` property.
+Therefore, you can set the dialect after the SQL Client has launched.
+
+```bash
+Flink SQL> SET 'table.sql-dialect' = 'hive'; -- to use hive dialect

Review Comment: Please either leave both statements unquoted, or add an inline note that the default dialect requires quotes and the Hive dialect shouldn't be quoted.

########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview.md: ##########
@@ -0,0 +1,164 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /dev/table/hiveCompatibility/hiveDialect/Queries/overview

Review Comment: Use dash-case instead of camelCase in the URL.
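If the inline-note option is taken, the contrast could be shown directly in the snippet. This is only a sketch; it assumes the `SET` form each dialect accepts in the SQL Client, so the exact forms should be double-checked:

```sql
Flink SQL> SET 'table.sql-dialect' = 'hive';  -- default dialect: key and value must be quoted
Flink SQL> SET table.sql-dialect=default;     -- Hive dialect: quotes are not allowed
```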
########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by.md: ##########
@@ -0,0 +1,98 @@
+---
+title: "Sort/Cluster/Distributed By"
+weight: 2
+type: docs
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Sort/Cluster/Distributed by Clause
+
+## Sort By
+
+### Description
+
+Unlike [ORDER BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview" >}}#order-by-clause), which guarantees a total order of the output,
+`SORT BY` only guarantees that the result rows within each partition are in the user-specified order.
+So when there's more than one partition, `SORT BY` may return a result that is only partially ordered.
+
+### Syntax
+
+```sql
+colOrder: ( ASC | DESC )
+sortBy: SORT BY BY expression [ , ... ]

Review Comment:
1. Remove the duplicated "BY"
2. Add the missing "colOrder"
3. Usually, syntax should be introduced from coarse to fine (query -> sortBy -> colOrder).
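For reference, a coarse-to-fine ordering of that grammar could look roughly like the following (a sketch of the shape only, not necessarily the exact wording to use):

```sql
query: SELECT expression [ , ... ] FROM src sortBy
sortBy: SORT BY expression [ colOrder ] [ , ... ]
colOrder: ( ASC | DESC )
```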
########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview.md: ########## @@ -0,0 +1,164 @@ +--- +title: "Overview" +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/Queries/overview +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Queries + +## Description + +Hive dialect supports a commonly-used subset of Hive’s [DQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select). +The following lists some parts of HiveQL supported by the Hive dialect. 
+ +- [Sort/Cluster/Distributed BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by" >}}) +- [Group By]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/group-by" >}}) +- [Join]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/join" >}}) +- [Set Operation]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}}) +- [Lateral View]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/lateral-view" >}}) +- [Window Functions]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/window-functions" >}}) +- [Sub-Queries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) +- [CTE]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/cte" >}}) +- [Transform]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/transform" >}}) +- [Table Sample]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/table-sample" >}}) + +## Syntax + +The following section describes the overall query syntax. +The SELECT clause can be part of a query which also includes [common table expressions (CTE)]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/cte" >}}), set operations, and various other clauses. + +```sql +[WITH CommonTableExpression [ , ... ]] +SELECT [ALL | DISTINCT] select_expr [ , ... ] + FROM table_reference + [WHERE where_condition] + [GROUP BY col_list] + [ORDER BY col_list] + [CLUSTER BY col_list + | [DISTRIBUTE BY col_list] [SORT BY col_list] + ] + [LIMIT [offset,] rows] +``` +- The `SELECT` statement can be part of a [set]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}}) query or a [sub-query]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) of another query +- `CommonTableExpression` is a temporary result set derived from a query specified in a `WITH` clause +- `table_reference` indicates the input to the query. 
It can be a regular table, a view, a [join]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/join" >}}) or a [sub-query]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}).
+- Table names and column names are case-insensitive
+
+### WHERE Clause
+
+The `WHERE` condition is a boolean expression. Hive dialect supports a number of [operators and UDFS](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF)

Review Comment:
```suggestion
The `WHERE` condition is a boolean expression. Hive dialect supports a number of [operators and UDFs](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF)
```

########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/insert.md: ##########
@@ -0,0 +1,213 @@
+---
+title: "INSERT Statements"
+weight: 3
+type: docs
+aliases:
+- /dev/table/hiveCompatibility/hiveDialect/create.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# INSERT Statements
+
+## INSERT TABLE
+
+### Description
+
+The `INSERT TABLE` statement is used to insert rows into a table or overwrite the existing data in the table. The row to be inserted
+can be specified by value expressions or the result of a query.
+
+### Syntax
+
+```sql
+-- Standard syntax
+INSERT [OVERWRITE] TABLE tablename1
+  [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]]
+  { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement1 FROM from_statement };
+
+INSERT INTO TABLE tablename1
+  [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]]
+  { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement1 FROM from_statement };

Review Comment: IIUC, we can merge INSERT INTO and INSERT OVERWRITE and partition inserts into one syntax, e.g.:
```sql
INSERT { OVERWRITE | INTO } [ TABLE ] table_name
    [ PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS] ]
    [ ( column_name [, ...] ) ]
    { VALUES clause | SELECT clause }
```
Currently, it's not clear whether the keywords "OVERWRITE" / "INTO" / "TABLE" are optional or not, or what the difference is between the standard syntax and dynamic partition inserts.

########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/insert.md: ##########
@@ -0,0 +1,213 @@
+---
+title: "INSERT Statements"
+weight: 3
+type: docs
+aliases:
+- /dev/table/hiveCompatibility/hiveDialect/create.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# INSERT Statements
+
+## INSERT TABLE
+
+### Description
+
+The `INSERT TABLE` statement is used to insert rows into a table or overwrite the existing data in the table. The row to be inserted
+can be specified by value expressions or the result of a query.
+
+### Syntax
+
+```sql
+-- Standard syntax
+INSERT [OVERWRITE] TABLE tablename1
+  [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]]
+  { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement1 FROM from_statement };
+
+INSERT INTO TABLE tablename1
+  [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]]
+  { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement1 FROM from_statement };
+
+-- Hive extension (multiple inserts):
+FROM from_statement
+INSERT [OVERWRITE] TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1,
+INSERT [OVERWRITE] TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2
+[, ... ];
+
+FROM from_statement
+INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1,
+INSERT INTO TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2
+[, ... ];

Review Comment: Separate "Multi Table Insert" into another section?

########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/show.md: ##########
@@ -0,0 +1,118 @@
+---
+title: "Show Statements"

Review Comment:
```suggestion
title: "SHOW Statements"
```

########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/create.md: ##########
@@ -0,0 +1,245 @@
+---
+title: "CREATE Statements"
+weight: 2
+type: docs
+aliases:
+- /dev/table/hiveCompatibility/hiveDialect/create.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.
The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# CREATE Statements
+
+With Hive dialect, the following CREATE statements are supported for now:
+
+- CREATE DATABASE
+- CREATE TABLE
+- CREATE VIEW
+- CREATE MACRO
+- CREATE FUNCTION
+
+## CREATE DATABASE
+
+### Description
+
+The `CREATE DATABASE` statement is used to create a database with the specified name.
+
+### Syntax
+
+```sql
+CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
+  [COMMENT database_comment]
+  [LOCATION hdfs_path]
+  [WITH DBPROPERTIES (property_name=property_value, ...)];
+```
+
+### Examples
+
+```sql
+CREATE DATABASE db1;
+CREATE DATABASE IF NOT EXISTS db1 COMMENT 'db1' LOCATION '/user/hive/warehouse/db1'
+  WITH DBPROPERTIES ('name'='example-db');
+```
+
+
+## CREATE TABLE
+
+### Description
+
+The `CREATE TABLE` statement is used to define a table in an existing database.
+
+### Syntax
+
+```sql
+CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+  [(col_name data_type [column_constraint] [COMMENT col_comment], ...
[table_constraint])]
+  [COMMENT table_comment]
+  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
+  [
+    [ROW FORMAT row_format]
+    [STORED AS file_format]
+  ]
+  [LOCATION fs_path]
+  [TBLPROPERTIES (property_name=property_value, ...)]
+  [AS select_statement];
+
+data_type
+  : primitive_type
+  | array_type
+  | map_type
+  | struct_type
+primitive_type
+  : TINYINT
+  | SMALLINT
+  | INT
+  | BIGINT
+  | BOOLEAN
+  | FLOAT
+  | DOUBLE
+  | DOUBLE PRECISION
+  | STRING
+  | BINARY
+  | TIMESTAMP
+  | DECIMAL
+  | DECIMAL(precision, scale)
+  | DATE
+  | VARCHAR
+  | CHAR
+array_type
+  : ARRAY < data_type >
+map_type
+  : MAP < primitive_type, data_type >
+struct_type
+  : STRUCT < col_name : data_type [COMMENT col_comment], ...>
+row_format:
+  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
+    [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
+    [NULL DEFINED AS char]
+  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, ...)]
+file_format:
+  : SEQUENCEFILE
+  | TEXTFILE
+  | RCFILE
+  | ORC
+  | PARQUET
+  | AVRO
+  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
+column_constraint:
+  : NOT NULL
+table_constraint:
+  : [CONSTRAINT constraint_name] PRIMARY KEY (col_name, ...)
+```
+
+{{< hint warning >}}
+**NOTE:**
+
+- Creating temporary tables is not supported yet.
+{{< /hint >}}
+
+### Examples
+
+```sql
+-- create a non-partitioned table
+CREATE TABLE t1(key string, value string);
+
+-- create a partitioned table
+CREATE TABLE pt1(key string, value string) PARTITIONED BY (year int, month int);
+
+-- create a table with a specific format
+CREATE TABLE t1(key string, value string) STORED AS ORC;
+
+-- create a table with a specific row format
+CREATE TABLE t1(m MAP<BIGINT, STRING>)
+  ROW FORMAT DELIMITED COLLECTION ITEMS TERMINATED BY ';'
+  MAP KEYS TERMINATED BY ':';
+
+-- create table as select
+CREATE TABLE t2 AS SELECT key, COUNT(1) FROM t1 GROUP BY key;
+```
+
+## CREATE VIEW
+
+### Description
+
+`CREATE VIEW` creates a view with the given name.
+If no column names are supplied, the names of the view's columns will be derived automatically from the defining SELECT expression.
+(If the SELECT contains un-aliased scalar expressions such as x+y, the resulting view column names will be generated in the form _C0, _C1, etc.)
+When renaming columns, column comments can also optionally be supplied. (Comments are not automatically inherited from underlying columns.)
+
+Note that a view is a purely logical object with no associated storage. When a query references a view, the view's definition is evaluated in order to produce a set of rows for further processing by the query.
+
+### Syntax
+
+```sql
+CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name, ...) ]
+  [COMMENT view_comment]
+  [TBLPROPERTIES (property_name = property_value, ...)]
+  AS SELECT ...;
+```
+
+### Examples
+
+```sql
+CREATE VIEW IF NOT EXISTS v1
+  (key COMMENT 'key')
+  COMMENT 'View for key=1'
+  AS SELECT key FROM src
+  WHERE key = '1';
+```
+
+## CREATE MACRO
+
+### Description
+
+The `CREATE TEMPORARY MACRO` statement creates a macro using the given optional list of columns as inputs to the expression.
+A macro exists for the duration of the current session.
+
+### Syntax
+
+```sql
+CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression;
+```
+
+### Examples
+
+```sql
+CREATE TEMPORARY MACRO fixed_number() 42;
+CREATE TEMPORARY MACRO string_len_plus_two(x string) length(x) + 2;
+CREATE TEMPORARY MACRO simple_add (x int, y int) x + y;
+```
+
+## CREATE FUNCTION
+
+### Description
+
+The `CREATE FUNCTION` statement creates a function that is implemented by `class_name`.
+
+### Syntax
+
+#### Create Temporary Function
+
+```sql
+CREATE TEMPORARY FUNCTION function_name AS class_name [USING JAR 'file_uri'];
+```
+
+The function exists for the duration of the current session.
+
+#### Create Permanent Function
+
+```sql
+CREATE FUNCTION [db_name.]function_name AS class_name
+  [USING JAR 'file_uri'];
+```
+The function is registered to the metastore and will exist in all sessions unless it is dropped.
+
+### Parameters
+- `[USING JAR 'file_uri']`
+
+  Users can use this clause to add a jar that contains the implementation of the function along with its dependencies while creating the function.
+  The `file_uri` can be on a local or distributed file system.
+  For remote jars, Flink will download the jars. When session exists, those downloaded jars will be removed.

Review Comment:
```suggestion
  Flink will automatically download the jars for remote jars when the function is used in queries. The downloaded jars will be removed when the session exits.
```

########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/insert.md: ##########
@@ -0,0 +1,213 @@
+---
+title: "INSERT Statements"
+weight: 3
+type: docs
+aliases:
+- /dev/table/hiveCompatibility/hiveDialect/create.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.
The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# INSERT Statements
+
+## INSERT TABLE
+
+### Description
+
+The `INSERT TABLE` statement is used to insert rows into a table or overwrite the existing data in the table. The row to be inserted
+can be specified by value expressions or the result of a query.
+
+### Syntax
+
+```sql
+-- Standard syntax
+INSERT [OVERWRITE] TABLE tablename1
+  [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]]
+  { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement1 FROM from_statement };
+
+INSERT INTO TABLE tablename1
+  [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]]
+  { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement1 FROM from_statement };
+
+-- Hive extension (multiple inserts):
+FROM from_statement
+INSERT [OVERWRITE] TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1,
+INSERT [OVERWRITE] TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2
+[, ... ];
+
+FROM from_statement
+INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1,
+INSERT INTO TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2
+[, ... ];
+
+-- Hive extension (dynamic partition inserts):
+INSERT [OVERWRITE] TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...)
+  { VALUES ( value [, ..] ) [, ( ...
) ] | select_statement FROM from_statement };
+
+INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...)
+  { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement FROM from_statement };
+```
+
+### Parameters
+
+- `OVERWRITE`
+
+  If `OVERWRITE` is specified, it will overwrite any existing data in the table or partition.
+
+- `PARTITION ( ... )`
+
+  An option to specify inserting data into specific partitions of the table.
+  If the `PARTITION` clause is specified, the table should be a partitioned table.
+
+- `VALUES ( value [, ..] ) [, ( ... ) ]`
+
+  Specifies the values to be inserted explicitly. A comma must be used to separate each value in the clause.
+  More than one set of values can be specified to insert multiple rows.
+
+- select_statement
+
+  A query statement.
+  See more details in [queries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview" >}}).
+
+### Synopsis
+
+#### Multiple Inserts
+
+In the Hive extension syntax - multiple inserts, Flink will minimize the number of data scans required. Flink can insert data into multiple
+tables by scanning the input data just once.
+
+#### Dynamic Partition Inserts
+
+In the Hive extension syntax - dynamic partition inserts, users can give partial partition specifications, which means just specifying the list of partition column names in the `PARTITION` clause with optional column values.
+If all the partition columns' values are given, we call this a static partition, otherwise it is a dynamic partition.
+
+Each dynamic partition column has a corresponding input column from the select statement. This means that the dynamic partition creation is determined by the value of the input column.
+
+The dynamic partition columns must be specified last among the columns in the `SELECT` statement and in the same order in which they appear in the `PARTITION()` clause.
+
+{{< hint warning >}}
+**Note:**
+
+In Hive, by default, users must specify at least one static partition in case of accidentally overwriting all partitions, and users can
+set the configuration `hive.exec.dynamic.partition.mode` to `nonstrict` to allow all partitions to be dynamic.
+
+But in Flink's Hive dialect, it will always be in `nonstrict` mode, which means all partitions are allowed to be dynamic.
+{{< /hint >}}
+
+### Examples
+
+```sql
+-- insert into table using values
+INSERT INTO t1 VALUES ('k1', 'v1'), ('k2', 'v2');
+
+-- insert overwrite
+INSERT OVERWRITE t1 VALUES ('k1', 'v1'), ('k2', 'v2');
+
+-- insert into table using select statement
+INSERT INTO TABLE t1 SELECT * FROM t2;
+
+-- insert into partition
+--- static partition
+INSERT INTO t1 PARTITION (year = 2022, month = 12) SELECT value FROM t2;
+
+--- dynamic partition
+INSERT INTO t1 PARTITION (year = 2022, month) SELECT month, value FROM t2;
+INSERT INTO t1 PARTITION (year, month) SELECT 2022, month, value FROM t2;
+
+-- multi-insert statements
+FROM (SELECT month, value from t1)
+  INSERT OVERWRITE TABLE t1_1 SELECT value WHERE month <= 6
+  INSERT OVERWRITE TABLE t1_2 SELECT value WHERE month > 6;
+```
+
+## INSERT OVERWRITE DIRECTORY
+
+### Description
+
+Query results can be inserted into filesystem directories by using a slight variation of the syntax above:
+```sql
+-- Standard syntax:
+INSERT OVERWRITE [LOCAL] DIRECTORY directory_path
+  [ROW FORMAT row_format] [STORED AS file_format]
+  { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement1 FROM from_statement };
+
+-- Hive extension (multiple inserts):
+FROM from_statement
+INSERT OVERWRITE [LOCAL] DIRECTORY directory1_path select_statement1
+[INSERT OVERWRITE [LOCAL] DIRECTORY directory2_path select_statement2] ...

Review Comment: ditto.
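Also, the directory-insert section currently shows no usage. A short example could accompany it, along these lines (the path and table are hypothetical, and the `ROW FORMAT`/`STORED AS` clauses follow the Hive-style syntax shown above):

```sql
-- export query results as comma-delimited text files (illustrative path)
INSERT OVERWRITE DIRECTORY '/tmp/hive_export'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  SELECT key, value FROM t1;
```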
########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/overview.md: ########## @@ -0,0 +1,97 @@ +--- +title: "Overview" +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/overview +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Hive Dialect + +Flink allows users to write SQL statements in Hive syntax when Hive dialect is used. +By providing compatibility with Hive syntax, we aim to improve the interoperability with Hive and reduce the scenarios when users need to switch between Flink and Hive in order to execute different statements. + +## Use Hive Dialect + +Flink currently supports two SQL dialects: `default` and `hive`. You need to switch to Hive dialect +before you can write in Hive syntax. The following describes how to set dialect with +SQL Client and Table API. Also notice that you can dynamically switch dialect for each +statement you execute. There's no need to restart a session to use a different dialect. + +{{< hint warning >}} +**Note:** + +- To use Hive dialect, you have to add dependencies related to Hive. Please refer to [Hive dependencies]({{< ref "docs/connectors/table/hive/overview" >}}#dependencies) for how to add the dependencies. 
+- Please make sure the current catalog is [HiveCatalog]({{< ref "docs/connectors/table/hive/hive_catalog" >}}). Otherwise, it will fall back to Flink's `default` dialect. +- In order to have better syntax and semantic compatibility, it’s highly recommended to load [HiveModule]({{< ref "docs/connectors/table/hive/hive_functions" >}}#use-hive-built-in-functions-via-hivemodule) and + place it first in the module list, so that Hive built-in functions can be picked up during function resolution. + Please refer [here]({{< ref "docs/dev/table/modules" >}}#how-to-load-unload-use-and-list-modules) for how to change resolution order. Review Comment: We should also give an example for SQL Gateway in this page. Besides, SQL Gateway provides default configuration for Hive dialect, catalog, and module. Therefore, these actions are only needed for SQL CLI and Table API. ########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/insert.md: ########## @@ -0,0 +1,213 @@ +--- +title: "INSERT Statements" +weight: 3 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/create.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. 
+--> + +# INSERT Statements + +## INSERT TABLE + +### Description + +The `INSERT TABLE` statement is used to insert rows into a table or overwrite the existing data in the table. The rows to be inserted +can be specified by value expressions or by the result of a query. + +### Syntax + +```sql +-- Standard syntax +INSERT [OVERWRITE] TABLE tablename1 + [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] + { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement1 FROM from_statement }; + +INSERT INTO TABLE tablename1 + [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] + { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement1 FROM from_statement }; + +-- Hive extension (multiple inserts): +FROM from_statement +INSERT [OVERWRITE] TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1, +INSERT [OVERWRITE] TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2 +[, ... ]; + +FROM from_statement +INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1, +INSERT INTO TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2 +[, ... ]; + +-- Hive extension (dynamic partition inserts): +INSERT [OVERWRITE] TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) + { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement FROM from_statement }; + +INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) + { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement FROM from_statement }; +``` + +### Parameters + +- `OVERWRITE` + + If `OVERWRITE` is specified, it will overwrite any existing data in the table or partition. + +- `PARTITION ( ... )` + + An option to insert data into specific partitions of the table. + If the `PARTITION` clause is specified, the table should be a partitioned table. + +- `VALUES ( value [, ..] ) [, ( ... ) ]` + + Specifies the values to be inserted explicitly. 
A comma must be used to separate each value in the clause. + More than one set of values can be specified to insert multiple rows. + +- select_statement + + A query statement. + See more details in [queries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview" >}}). + +### Synopsis + +#### Multiple Inserts + +In the Hive extension syntax - multiple inserts, Flink will minimize the number of data scans required. Flink can insert data into multiple +tables by scanning the input data just once. + +#### Dynamic Partition Inserts + +In the Hive extension syntax - dynamic partition inserts, users can give partial partition specifications, which means just specifying the list of partition column names in the `PARTITION` clause with optional column values. +If all the partition columns' values are given, we call this a static partition, otherwise it is a dynamic partition. + +Each dynamic partition column has a corresponding input column from the select statement. This means that the dynamic partition creation is determined by the value of the input column. + +The dynamic partition columns must be specified last among the columns in the `SELECT` statement and in the same order in which they appear in the `PARTITION()` clause. + +{{< hint warning >}} +**Note:** + +In Hive, by default, the user mush specify at least one static partition in case the user accidentally overwrites all partition, and user can Review Comment: ```suggestion In Hive, by default, users must specify at least one static partition in case of accidentally overwriting all partitions, and users can ``` ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/cte.md: ########## @@ -0,0 +1,67 @@ +--- +title: "CTE" +weight: 9 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership.
The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Common Table Expression (CTE) + +## Description + +A Common Table Expression (CTE) is a temporary result set derived from a query specified in a `WITH` clause, which immediately precedes a `SELECT` +or `INSERT` keyword. The CTE is defined only within the execution scope of a single statement, and can be referred to within that scope. + +## Syntax + +```sql +withClause: cteClause [ , ... ] +cteClause: cte_name AS (select statment) Review Comment: ```suggestion cteClause: cte_name AS (select statement) ``` ########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/Queries/group-by.md: ########## @@ -0,0 +1,129 @@ +--- +title: "Group By" +weight: 3 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. 
See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Group By Clause + +## Description + +The `Group by` clause is used to compute a single result from multiple input rows with a given aggregation function. +Hive dialect also supports enhanced aggregation features to do multiple aggregations based on the same record by using +`ROLLUP`/`CUBE`/`GROUPING SETS`. + +## Syntax + +```sql +groupByClause: groupByClause-1 | groupByClause-2 +groupByClause-1: GROUP BY group_expression [ , ... ] [ WITH ROLLUP | WITH CUBE ] + +groupByClause-2: GROUP BY { group_expression | { ROLLUP | CUBE | GROUPING SETS } ( grouping_set [ , ... ] ) } [ , ... ] +grouping_set: { expression | ( [ expression [ , ... ] ] ) } + +groupByQuery: SELECT expression [ , ... ] FROM src groupByClause? Review Comment: ```suggestion group_by_clause: group_by_clause_1 | group_by_clause_2 group_by_clause_1: GROUP BY group_expression [ , ... ] [ WITH ROLLUP | WITH CUBE ] group_by_clause_2: GROUP BY { group_expression | { ROLLUP | CUBE | GROUPING SETS } ( grouping_set [ , ... ] ) } [ , ... ] grouping_set: { expression | ( [ expression [ , ... ] ] ) } ``` ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/cte.md: ########## @@ -0,0 +1,67 @@ +--- +title: "CTE" +weight: 9 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. 
You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Common Table Expression (CTE) + +## Description + +A Common Table Expression (CTE) is a temporary result set derived from a query specified in a `WITH` clause, which immediately precedes a `SELECT` +or `INSERT` keyword. The CTE is defined only within the execution scope of a single statement, and can be referred to within that scope. + +## Syntax + +```sql +withClause: cteClause [ , ... ] Review Comment: Missing "WITH" keyword? ```suggestion withClause: WITH cteClause [ , ... ] ``` ########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/insert.md: ########## @@ -0,0 +1,213 @@ +--- +title: "INSERT Statements" +weight: 3 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/create.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. 
+--> + +# INSERT Statements + +## INSERT TABLE + +### Description + +The `INSERT TABLE` statement is used to insert rows into a table or overwrite the existing data in the table. The rows to be inserted +can be specified by value expressions or by the result of a query. + +### Syntax + +```sql +-- Standard syntax +INSERT [OVERWRITE] TABLE tablename1 + [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] + { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement1 FROM from_statement }; + +INSERT INTO TABLE tablename1 + [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] + { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement1 FROM from_statement }; + +-- Hive extension (multiple inserts): +FROM from_statement +INSERT [OVERWRITE] TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1, +INSERT [OVERWRITE] TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2 +[, ... ]; + +FROM from_statement +INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1, +INSERT INTO TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2 +[, ... ]; + +-- Hive extension (dynamic partition inserts): +INSERT [OVERWRITE] TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) + { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement FROM from_statement }; + +INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) + { VALUES ( value [, ..] ) [, ( ... ) ] | select_statement FROM from_statement }; +``` + +### Parameters + +- `OVERWRITE` + + If `OVERWRITE` is specified, it will overwrite any existing data in the table or partition. + +- `PARTITION ( ... )` + + An option to insert data into specific partitions of the table. + If the `PARTITION` clause is specified, the table should be a partitioned table. + +- `VALUES ( value [, ..] ) [, ( ... ) ]` + + Specifies the values to be inserted explicitly. 
A comma must be used to separate each value in the clause. + More than one set of values can be specified to insert multiple rows. + +- select_statement + + A query statement. + See more details in [queries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview" >}}). + +### Synopsis + +#### Multiple Inserts + +In the Hive extension syntax - multiple inserts, Flink will minimize the number of data scans required. Flink can insert data into multiple +tables by scanning the input data just once. + +#### Dynamic Partition Inserts + +In the Hive extension syntax - dynamic partition inserts, users can give partial partition specifications, which means just specifying the list of partition column names in the `PARTITION` clause with optional column values. +If all the partition columns' values are given, we call this a static partition, otherwise it is a dynamic partition. + +Each dynamic partition column has a corresponding input column from the select statement. This means that the dynamic partition creation is determined by the value of the input column. + +The dynamic partition columns must be specified last among the columns in the `SELECT` statement and in the same order in which they appear in the `PARTITION()` clause. + +{{< hint warning >}} +**Note:** + +In Hive, by default, the user mush specify at least one static partition in case the user accidentally overwrites all partition, and user can +set the configuration `hive.exec.dynamic.partition.mode` to `nonstrict` to allow all partitions to be dynamic. + +But in Flink's Hive dialect, it'll always be `nonstrict` mode, which means all partitions are allowed to be dynamic. 
+{{< /hint >}} + +### Examples + +```sql +-- insert into table using values +INSERT INTO t1 VALUES ('k1', 'v1'), ('k2', 'v2'); + +-- insert overwrite +INSERT OVERWRITE t1 VALUES ('k1', 'v1'), ('k2', 'v2'); + +-- insert into table using select statement +INSERT INTO TABLE t1 SELECT * FROM t2; + +-- insert into partition +--- static partition +INSERT INTO t1 PARTITION (year = 2022, month = 12) SELECT value FROM t2; + +--- dynamic partition +INSERT INTO t1 PARTITION (year = 2022, month) SELECT month, value FROM t2; +INSERT INTO t1 PARTITION (year, month) SELECT 2022, month, value FROM t2; + +-- multi-insert statements +FROM (SELECT month, value from t1) + INSERT OVERWRITE TABLE t1_1 SELECT value WHERE month <= 6 + INSERT OVERWRITE TABLE t1_1 SELECT value WHERE month > 6; Review Comment: insert into another table for multi inserts? ########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/overview.md: ########## @@ -0,0 +1,97 @@ +--- +title: "Overview" +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/overview +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Hive Dialect + +Flink allows users to write SQL statements in Hive syntax when Hive dialect is used. 
+By providing compatibility with Hive syntax, we aim to improve the interoperability with Hive and reduce the scenarios where users need to switch between Flink and Hive in order to execute different statements. + +## Use Hive Dialect + +Flink currently supports two SQL dialects: `default` and `hive`. You need to switch to Hive dialect +before you can write in Hive syntax. The following describes how to set the dialect with +SQL Client and Table API. Also notice that you can dynamically switch the dialect for each +statement you execute. There's no need to restart a session to use a different dialect. + +{{< hint warning >}} +**Note:** Review Comment: Add a note about the batch execution mode as well. The Hive dialect is mainly used in batch mode. Some (or many?) of the Hive syntax doesn't support executing in streaming mode. ########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/Queries/transform.md: ########## @@ -0,0 +1,130 @@ +--- +title: "Transform Clause" +weight: 10 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Transform Clause + +## Description + +The `TRANSFORM` clause allows users to transform inputs using a user-specified command or script. + +## Syntax + +```sql +query: + SELECT TRANSFORM ( expression [ , ... 
] ) + [ inRowFormat ] + [ inRecordWriter ] + USING command_or_script + [ AS colName [ colType ] [ , ... ] ] + [ outRowFormat ] + [ outRecordReader ] + +rowFormat + : ROW FORMAT + (DELIMITED [FIELDS TERMINATED BY char] + [COLLECTION ITEMS TERMINATED BY char] + [MAP KEYS TERMINATED BY char] + [ESCAPED BY char] + [LINES SEPARATED BY char] + | + SERDE serde_name [WITH SERDEPROPERTIES + property_name=property_value, + property_name=property_value, ...]) + +outRowFormat : rowFormat +inRowFormat : rowFormat +outRecordReader : RECORDREADER className +inRecordWriter: RECORDWRITER record_write_class +``` + +{{< hint warning >}} +**Note:** + +- `MAP ..` and `REDUCE ..` are syntactic transformations of `SELECT TRANSFORM ( ... )` in Hive dialect for such queries. + So you can use `MAP` / `REDUCE` to replace `SELECT TRANSFORM`. + {{< /hint >}} + +## Parameters + +- inRowFormat + + Specifies the row format used to feed input data into the running script. + By default, columns will be transformed to `STRING` and delimited by `TAB` before feeding to the user script; + similarly, all `NULL` values will be converted to the literal string `\N` in order to differentiate `NULL` values from empty strings. + +- outRowFormat + + Specifies the row format used to read the output of the running script. + By default, the standard output of the user script will be treated as TAB-separated `STRING` columns, + any cell containing only `\N` will be re-interpreted as a `NULL`, + and then the resulting `STRING` column will be cast to the data type specified in the table declaration in the usual way. + +- inRecordWriter + Specifies the writer (a fully-qualified class name) used to write the input data. The default is `org.apache.hadoop.hive.ql.exec.TextRecordWriter`. + +- outRecordReader + Specifies the reader (a fully-qualified class name) used to read the output data. 
The default is `org.apache.hadoop.hive.ql.exec.TextRecordReader` + +- command_or_script + Specifies a command or a path to a script to process data. + + {{< hint warning >}} + **Note:** + + Adding a script file and then transforming input using that script is not supported yet. + {{< /hint >}} + +- colType + Specifies the data type that the output of the command/script should be cast to. By default, it will be the `STRING` data type. + + +For the clause `( AS colName ( colType )? [, ... ] )?`, please be aware of the following behavior: +- If the actual number of output columns is less than the user-specified output columns, the additional user-specified output columns will be filled with NULL. +- If the actual number of output columns is more than the user-specified output columns, the actual output will be truncated, keeping the corresponding columns. +- If the user doesn't specify the clause `( AS colName ( colType )? [, ... ] )?`, the default output schema is `(key: STRING, value: STRING)`. + The key column contains all the characters before the first tab and the value column contains the remaining characters after the first tab. + If there is no tab, it will return the NULL value for the second column `value`. + Note that this is different from specifying AS `key, value` because in that case, `value` will only contain the portion between the first tab and the second tab if there are multiple tabs. 
+ + +## Examples + +```sql +CREATE TABLE src(key string, value string); +-- transform using +SELECT TRANSFORM(key, value) using 'script' from src; + +-- transform using with specific record writer and record reader +SELECT TRANSFORM(key, value) ROW FORMAT SERDE 'MySerDe' + WITH SERDEPROPERTIES ('p1'='v1','p2'='v2') + RECORDWRITER 'MyRecordWriter' + using 'script' + ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' + RECORDREADER 'MyRecordReader' from src; + +-- use keyword MAP instead of TRANSFORM +FROM src INSERT OVERWRITE TABLE dest1 MAP src.key, CAST(src.key / 10 AS INT) using 'script' as (c1, c2); + +-- specify the output of transform +SELECT TRANSFORM(column) USING 'script' AS c1, c2; +SELECT TRANSFORM(column) USING 'script' AS (c1 INT, c2 INT); Review Comment: Maybe we can show a dedicated script in the example to better understand the meaning of "script", e.g. `'/bin/cat'`. ########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/add.md: ########## @@ -0,0 +1,54 @@ +--- +title: "ADD Statements" +weight: 7 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/create.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. 
+--> + +# ADD Statements + +With Hive dialect, the following `ADD` statements are supported for now: +- ADD JAR + +## ADD JAR + +### Description + +The `ADD JAR` statement is used to add user jars into the classpath. +Adding multiple jar files in a single `ADD JAR` statement is not supported. + + +### Syntax + +```sql +ADD JAR <jar_path>; +``` + +### Parameters + +- jar_path + + The path of the JAR file to be added. It could be either a local file or a file on a distributed file system. + +### Examples + +```sql +ADD JAR t.jar; Review Comment: Use a remote jar as the example which is more widely used. ########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/Queries/transform.md: ########## @@ -0,0 +1,130 @@ +--- +title: "Transform Clause" +weight: 10 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Transform Clause + +## Description + +The `TRANSFORM` clause allows users to transform inputs using a user-specified command or script. + +## Syntax + +```sql +query: + SELECT TRANSFORM ( expression [ , ... ] ) + [ inRowFormat ] + [ inRecordWriter ] + USING command_or_script + [ AS colName [ colType ] [ , ... 
] ] + [ outRowFormat ] + [ outRecordReader ] + +rowFormat + : ROW FORMAT + (DELIMITED [FIELDS TERMINATED BY char] + [COLLECTION ITEMS TERMINATED BY char] + [MAP KEYS TERMINATED BY char] + [ESCAPED BY char] + [LINES SEPARATED BY char] + | + SERDE serde_name [WITH SERDEPROPERTIES + property_name=property_value, + property_name=property_value, ...]) + +outRowFormat : rowFormat +inRowFormat : rowFormat +outRecordReader : RECORDREADER className +inRecordWriter: RECORDWRITER record_write_class +``` + +{{< hint warning >}} +**Note:** + +- `MAP ..` and `REDUCE ..` are syntactic transformations of `SELECT TRANSFORM ( ... )` in Hive dialect for such queries. + So you can use `MAP` / `REDUCE` to replace `SELECT TRANSFORM`. + {{< /hint >}} + +## Parameters + +- inRowFormat + + Specifies the row format used to feed input data into the running script. + By default, columns will be transformed to `STRING` and delimited by `TAB` before feeding to the user script; + similarly, all `NULL` values will be converted to the literal string `\N` in order to differentiate `NULL` values from empty strings. + +- outRowFormat + + Specifies the row format used to read the output of the running script. + By default, the standard output of the user script will be treated as TAB-separated `STRING` columns, + any cell containing only `\N` will be re-interpreted as a `NULL`, + and then the resulting `STRING` column will be cast to the data type specified in the table declaration in the usual way. + +- inRecordWriter + Specifies the writer (a fully-qualified class name) used to write the input data. The default is `org.apache.hadoop.hive.ql.exec.TextRecordWriter`. + +- outRecordReader + Specifies the reader (a fully-qualified class name) used to read the output data. The default is `org.apache.hadoop.hive.ql.exec.TextRecordReader`. + +- command_or_script + Specifies a command or a path to a script to process data. 
Review Comment: Shall we mention that the script must be a local script and should be accessible on all hosts in the cluster? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
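Picking up the reviewer's `'/bin/cat'` suggestion above: the default `TRANSFORM` wire format described in the transform.md hunk (TAB-separated columns, `NULL` encoded as the literal string `\N`) can be sketched outside Flink with a plain pipe. This is an illustration only, not Flink's implementation; the rows and helper names are made up, and a portable Python one-liner stands in for `/bin/cat` (it just echoes stdin to stdout):

```python
import subprocess
import sys

# Standalone sketch (not Flink): emulate TRANSFORM's default wire format.
# Before rows reach the user script, columns are joined with TABs and NULLs
# become the literal string "\N"; the script's TAB-separated output is then
# decoded the same way.
rows = [("k1", "v1"), ("k2", None)]

def to_wire(row):
    # NULL -> literal backslash-N, everything else -> string, joined by TABs
    return "\t".join("\\N" if col is None else str(col) for col in row)

def from_wire(line):
    # Reverse mapping: a cell containing only "\N" is re-interpreted as NULL
    return tuple(None if col == "\\N" else col for col in line.split("\t"))

# Portable stand-in for '/bin/cat': copy stdin to stdout unchanged.
cat = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
stdin_data = "".join(to_wire(r) + "\n" for r in rows)
result = subprocess.run(cat, input=stdin_data, capture_output=True,
                        text=True, check=True)

decoded = [from_wire(line) for line in result.stdout.splitlines()]
print(decoded)  # round-trips to the original rows, including the NULL
```

Because the "script" is an identity transform, encoding and then decoding reproduces the input rows, which is exactly why `'/bin/cat'` makes a good minimal documentation example.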

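On the cte.md hunks above: the `WITH cteClause` shape (with the reviewer's missing-`WITH` fix applied) behaves the same way in most SQL engines, so the single-statement scope of a CTE can be demonstrated with SQLite. This is a hedged sketch against SQLite, not Flink's Hive dialect, and the table and data are invented:

```python
import sqlite3

# Illustration of CTE scoping: the CTE exists only within the single
# statement whose WITH clause defines it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key TEXT, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("a", 1), ("b", 2), ("c", 3)])

# `big` is defined in the WITH clause and can be referred to anywhere
# inside this statement, but not in any later statement.
rows = conn.execute("""
    WITH big AS (SELECT key, value FROM t WHERE value >= 2)
    SELECT key FROM big ORDER BY key
""").fetchall()
print(rows)  # [('b',), ('c',)]
```

A follow-up `SELECT * FROM big` outside that statement would fail with "no such table", which is the scoping rule the cte.md description is getting at.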