[GitHub] [hudi] pratyakshsharma commented on a diff in pull request #9622: [HUDI-6851] Fixing Spark quick start guide

via GitHub Tue, 12 Sep 2023 19:48:29 -0700


pratyakshsharma commented on code in PR #9622:
URL: https://github.com/apache/hudi/pull/9622#discussion_r1323851562



##########
website/docs/sql_ddl.md:
##########
@@ -69,56 +74,110 @@ create table if not exists hudi_table1 (
   price double,
   ts bigint
 ) using hudi
-options (
+tblproperties (
   type = 'mor',
   primaryKey = 'id,name',
   preCombineField = 'ts' 
 );
 ```
 
 ### Partitioned Table
-Here is an example of creating a COW partitioned table.
+Here is an example of creating a COW partitioned key less table.
 ```sql
 create table if not exists hudi_table_p0 (
 id bigint,
 name string,
 dt string,
 hh string  
 ) using hudi
-options (
+tblproperties (
   type = 'cow',
   primaryKey = 'id'
  ) 
-partitioned by (dt, hh);
+partitioned by dt;
+```
+
+Here is an example of creating a MOR partitioned table with preCombine field.
+```sql
+create table if not exists hudi_table_p0 (
+id bigint,
+name string,
+dt string,
+hh string  
+) using hudi
+tblproperties (
+  type = 'mor',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ ) 
+partitioned by dt;
+```
+
+### Un-Partitioned Table

Review Comment:
   nit: I guess we are using non-partitioned instead of un-partitioned at other places in docs. Let us change it to non-partitioned to keep it uniform?
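
Two details in the partitioned examples quoted above are likely to make them fail as written. Spark SQL's `PARTITIONED BY` clause expects a parenthesized column list (the removed line used `partitioned by (dt, hh)`), and the MOR example sets `preCombineField = 'ts'` without declaring a `ts` column. Both examples also reuse the name `hudi_table_p0`, so with `if not exists` the second statement would silently do nothing. The "key less" COW example also sets `primaryKey`, which makes it a keyed table. A minimal sketch of corrected versions follows; the `ts bigint` column and the table name `hudi_table_p1` are assumptions, not part of the PR.

```sql
-- COW partitioned key less table: primaryKey is omitted, so Hudi treats it as key less
create table if not exists hudi_table_p0 (
  id bigint,
  name string,
  hh string,
  dt string          -- partition column, declared last
) using hudi
tblproperties (
  type = 'cow'
)
partitioned by (dt);

-- MOR partitioned table with a preCombine field
-- ts is an assumed column: preCombineField must name a column declared in the schema
create table if not exists hudi_table_p1 (
  id bigint,
  name string,
  ts bigint,
  hh string,
  dt string          -- partition column, declared last
) using hudi
tblproperties (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts'
)
partitioned by (dt);
```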



##########
website/docs/sql_ddl.md:
##########
@@ -69,56 +74,110 @@ create table if not exists hudi_table1 (
+### Un-Partitioned Table
+Here is an example of creating a COW un-partitioned table.
+
+```sql
+-- create a cow table, with primaryKey 'uuid' and unpartitioned. 

Review Comment:
   nit: -- create a non-partitioned cow table with primaryKey 'uuid'.



##########
website/docs/sql_ddl.md:
##########
@@ -69,56 +74,110 @@ create table if not exists hudi_table1 (
+### Un-Partitioned Table
+Here is an example of creating a COW un-partitioned table.
+
+```sql
+-- create a cow table, with primaryKey 'uuid' and unpartitioned. 
+create table hudi_cow_nonpcf_tbl (
+  uuid int,
+  name string,
+  price double
+) using hudi
+tblproperties (
+  primaryKey = 'uuid'
+);
+```
+
+Here is an example of creating a MOR un-partitioned table.

Review Comment:
   ditto: unpartitioned -> non-partitioned



##########
website/docs/sql_ddl.md:
##########
@@ -69,56 +74,110 @@ create table if not exists hudi_table1 (
+Here is an example of creating a MOR un-partitioned table.
+
+```sql
+-- create a mor non-partitioned table with preCombineField provided
+create table hudi_mor_tbl (
+    id int,
+    name string,
+    price double,
+    ts bigint
+) using hudi
+tblproperties (
+    type = 'mor',
+    primaryKey = 'id',
+    preCombineField = 'ts'
+);
 ```
 
 ### Create Table for an External Hudi Table
-You can create an External table using the `location` statement. If an external location is not specified it is considered a managed table. You can read more about external vs managed tables [here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+You can create an External table using the `location` statement. If an external location is not specified it is considered a managed table. 
+You can read more about external vs managed tables [here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
 An external table is useful if you need to read/write to/from a pre-existing hudi table.
 
 ```sql
  create table h_p1 using hudi
  location '/path/to/hudi';
 ```
 
+:::tip
+You don't need to specify schema and any properties except the partitioned columns if existed. Hudi can automatically recognize the schema and configurations.

Review Comment:
   nit: if existed -> if they exist.



##########
website/docs/sql_dml.md:
##########
@@ -0,0 +1,347 @@
+---
+title: SQL DDL
+summary: "In this page, we introduce how to create tables with Hudi."
+toc: true
+last_modified_at: 
+---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+The following are SparkSQL DDL actions available:
+
+## Spark Create Table
+:::note
+Only SparkSQL needs an explicit Create Table command. No Create Table command is required in Spark when using Scala or 
+Python. The first batch of a [Write](/docs/writing_data) to a table will create the table if it does not exist.
+:::
+
+### Options
+
+Users can set table options while creating a hudi table.
+
+| Parameter Name | Description | (Optional/Required) : Default Value |
+|------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|
+| primaryKey | The primary key names of the table, multiple fields separated by commas. When set, hudi will ensure uniqueness during updates and deletes. When this config is skipped, hudi treats the table as a key less table. | (Optional) : `id`|
+| type       | The type of table to create ([read more](/docs/table_types)). <br></br> `cow` = COPY-ON-WRITE, `mor` = MERGE-ON-READ. | (Optional) : `cow` |
+| preCombineField | The Pre-Combine field of the table. This field will be used in resolving the final version of the record when two versions are combined with merges or updates. | (Optional) : `ts`|
+
+To set any custom hudi config(like index type, max parquet size, etc), see the "Set hudi config section" .
+
+### Table Type
+Here is an example of creating a COW table.
+
+```sql
+-- create a non-primary key (or key less) table
+create table if not exists hudi_table2(
+  id int, 
+  name string, 
+  price double
+) using hudi
+tblproperties (
+  type = 'cow'
+);
+```
+
+There could be datasets where primary key may not be feasible. For such use-cases, user don't need to elect any primary key for 
+the table and hudi will treat them as key less table. Users can still perform Merge Into, updates and deletes based on any random data column. 
+
+### Primary Key
+Here is an example of creating COW table with a primary key 'id'. For mutable datasets, it is recommended to set appropriate primary key. 
+
+```sql
+-- create a managed cow table
+create table if not exists hudi_table0 (
+  id int, 
+  name string, 
+  price double
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id'
+);
+```
+
+### PreCombineField
+Here is an example of creating an MOR external table. The **preCombineField** option
+is used to specify the preCombine field for merge. Generally 'event time' or some other similar column will be used for 
+ordering purpose. Hudi will be able to handle out of order data using the precombine field value.
+
+```sql
+-- create an external mor table

Review Comment:
   nit: the SQL does not have location specified. This is not external table.
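
Following the comment above: in Spark SQL it is the `location` clause that makes a table external, so one way to make this example match its `-- create an external mor table` comment is sketched below. The path `/tmp/hudi/hudi_table1` is a placeholder, not something taken from the PR; the other option is to keep the SQL unchanged and describe it as a managed table.

```sql
-- create an external mor table: the location clause is what makes it external
-- '/tmp/hudi/hudi_table1' is a placeholder path
create table if not exists hudi_table1 (
  id int,
  name string,
  price double,
  ts bigint
) using hudi
location '/tmp/hudi/hudi_table1'
tblproperties (
  type = 'mor',
  primaryKey = 'id,name',
  preCombineField = 'ts'
);
```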



##########
website/docs/sql_dml.md:
##########
@@ -0,0 +1,347 @@
+### Options

Review Comment:
   Should we change it to Table properties, since the SQL now has tblproperties instead?



##########
website/docs/sql_dml.md:
##########
@@ -0,0 +1,347 @@
+To set any custom hudi config(like index type, max parquet size, etc), see the "Set hudi config section" .

Review Comment:
   nit: "Set hudi config options" section. 
   
   Also better to add a link to this section here?
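
For context on the section being linked: the draft's "Set hoodie config options" section says a config given in `tblproperties` applies to that table only and overrides a value set with the `SET` command. A minimal sketch of the two scopes, reusing the `hoodie.cleaner.fileversions.retained` key from the draft's own example; the table name `h4` is hypothetical.

```sql
-- session scope: picked up by subsequent Hudi statements in this Spark SQL session
set hoodie.cleaner.fileversions.retained = 10;

-- table scope: applies only to h4 and, per the draft, overrides the SET value above
create table if not exists h4 (
  id bigint,
  name string,
  price double
) using hudi
tblproperties (
  primaryKey = 'id',
  type = 'mor',
  hoodie.cleaner.fileversions.retained = '20'
);
```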



##########
website/docs/sql_dml.md:
##########
@@ -0,0 +1,347 @@
+```sql
+-- create an external mor table
+create table if not exists hudi_table1 (
+  id int, 
+  name string, 
+  price double,
+  ts bigint
+) using hudi
+tblproperties (
+  type = 'mor',
+  primaryKey = 'id,name',
+  preCombineField = 'ts' 
+);
+```
+
+### Partitioned Table
+Here is an example of creating a COW partitioned key less table.
+```sql
+create table if not exists hudi_table_p0 (
+id bigint,
+name string,
+dt string,
+hh string  
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id'
+ ) 
+partitioned by dt;
+```
+
+Here is an example of creating a MOR partitioned table with preCombine field.
+```sql
+create table if not exists hudi_table_p0 (
+id bigint,
+name string,
+dt string,
+hh string  
+) using hudi
+tblproperties (
+  type = 'mor',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ ) 
+partitioned by dt;
+```
+
+### Un-Partitioned Table

Review Comment:
   nit: Un-Partitioned -> Non-Partitioned?



##########
website/docs/sql_dml.md:
##########
@@ -0,0 +1,347 @@
+### Un-Partitioned Table
+Here is an example of creating a COW un-partitioned table.
+
+```sql
+-- create a cow table, with primaryKey 'uuid' and unpartitioned. 

Review Comment:
   -- create a non-partitioned cow table with primaryKey 'uuid'.



##########
website/docs/sql_dml.md:
##########
@@ -0,0 +1,347 @@
+---
+title: SQL DDL

Review Comment:
   SQL DDL -> SQL DML



##########
website/docs/sql_dml.md:
##########
@@ -0,0 +1,347 @@
+### Un-Partitioned Table
+Here is an example of creating a COW un-partitioned table.
+
+```sql
+-- create a cow table, with primaryKey 'uuid' and unpartitioned. 
+create table hudi_cow_nonpcf_tbl (
+  uuid int,
+  name string,
+  price double
+) using hudi
+tblproperties (
+  primaryKey = 'uuid'
+);
+```
+
+Here is an example of creating a MOR un-partitioned table.
+
+```sql
+-- create a mor non-partitioned table with preCombineField provided
+create table hudi_mor_tbl (
+    id int,
+    name string,
+    price double,
+    ts bigint
+) using hudi
+tblproperties (
+    type = 'mor',
+    primaryKey = 'id',
+    preCombineField = 'ts'
+);
+```
+
+### Create Table for an External Hudi Table
+You can create an External table using the `location` statement. If an external location is not specified it is considered a managed table. 
+You can read more about external vs managed tables [here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+An external table is useful if you need to read/write to/from a pre-existing hudi table.
+
+```sql
+ create table h_p1 using hudi
+ location '/path/to/hudi';
+```
+
+:::tip
+You don't need to specify schema and any properties except the partitioned columns if existed. Hudi can automatically recognize the schema and configurations.
+:::
+
+### Create Table AS SELECT
+
+Hudi supports CTAS(Create table as select) on spark sql. <br/>
+**Note:** For better performance to load data to hudi table, CTAS uses **bulk insert** as the write operation.
+
+**Example CTAS command to create a non-partitioned COW key less table.**
+
+```sql 
+create table h3 using hudi
+tblproperties (type = 'cow')
+as
+select 1 as id, 'a1' as name, 10 as price;
+```
+
+**Example CTAS command to create a partitioned, primary keyed COW table.**
+
+```sql
+create table h2 using hudi
+tblproperties (type = 'cow', primaryKey = 'id')
+partitioned by (dt)
+as
+select 1 as id, 'a1' as name, 10 as price, 1000 as dt;
+```
+
+**Example CTAS command to load data from another table.**
+
+```sql
+# create managed parquet table 
+create table parquet_mngd using parquet location 'file:///tmp/parquet_dataset/*.parquet';
+
+# CTAS by loading data into hudi table
+create table hudi_tbl using hudi location 'file:/tmp/hudi/hudi_tbl/' tblproperties ( 
+  type = 'cow', 
+  primaryKey = 'id', 
+  preCombineField = 'ts' 
+ ) 
+partitioned by (datestr) as select * from parquet_mngd;
+```
+
+### Set hoodie config options
+You can also set the config with table options when creating table which will work for
+the table scope only and override the config set by the SET command.
+```sql
+create table if not exists h3(
+  id bigint, 
+  name string, 
+  price double
+) using hudi
+tblproperties (
+  primaryKey = 'id',
+  type = 'mor',
+  ${hoodie.config.key1} = '${hoodie.config.value2}',
+  ${hoodie.config.key2} = '${hoodie.config.value2}',
+  ....
+);
+
+e.g.
+create table if not exists h3(
+  id bigint, 
+  name string, 
+  price double
+) using hudi
+tblproperties (
+  primaryKey = 'id',
+  type = 'mor',
+  hoodie.cleaner.fileversions.retained = '20',
+  hoodie.keep.max.commits = '20'
+);
+```
+
+## Spark Alter Table
+### Syntax
+```sql
+-- Alter table name
+ALTER TABLE oldTableName RENAME TO newTableName
+
+-- Alter table add columns
+ALTER TABLE tableIdentifier ADD COLUMNS(colAndType (,colAndType)*)
+
+-- Alter table column type
+ALTER TABLE tableIdentifier CHANGE COLUMN colName colName colType
+```
+
+:::note
+`ALTER TABLE ... RENAME TO ...` is not supported when using AWS Glue Data Catalog as hive metastore as Glue itself does 

Review Comment:
   nit: as hive metastore -> as metastore.
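
To make the syntax block in the draft's `Spark Alter Table` section concrete, here is a minimal sketch against the `hudi_cow_nonpcf_tbl` table defined earlier in the same draft (`uuid int`, primary key `uuid`). The new names `hudi_cow_nonpcf_tbl2` and `remark` are illustrative. Per the draft's note, the rename step is not supported when AWS Glue Data Catalog is the metastore.

```sql
-- rename the table
ALTER TABLE hudi_cow_nonpcf_tbl RENAME TO hudi_cow_nonpcf_tbl2;

-- add a new column
ALTER TABLE hudi_cow_nonpcf_tbl2 ADD COLUMNS(remark string);

-- change a column type; int -> bigint is assumed here to be a supported widening change
ALTER TABLE hudi_cow_nonpcf_tbl2 CHANGE COLUMN uuid uuid bigint;
```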



##########
website/docs/quick-start-guide.md:
##########
@@ -246,67 +246,86 @@ Spark SQL needs an explicit create table command.
 
 **Table Concepts**
 
-- Table types
+- **Table types**
 
   Both Hudi's table types, Copy-On-Write (COW) and Merge-On-Read (MOR), can be created using Spark SQL.
   While creating the table, table type can be specified using **type** option: **type = 'cow'** or **type = 'mor'**.
 
-- Partitioned & Non-Partitioned tables
+- **Partitioned & Non-Partitioned tables**
 
   Users can create a partitioned table or a non-partitioned table in Spark SQL. To create a partitioned table, one needs
   to use **partitioned by** statement to specify the partition columns to create a partitioned table. When there is
   no **partitioned by** statement with create table command, table is considered to be a non-partitioned table.
 
-- Managed & External tables
+- **Primary keyed table**
 
-  In general, Spark SQL supports two kinds of tables, namely managed and external. If one specifies a location using **
-  location** statement or use `create external table` to create table explicitly, it is an external table, else its
-  considered a managed table. You can read more about external vs managed
-  tables [here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+  Optionally users can choose to create a Primary keyed table. When primary key is set for a given table, 
+  Hudi ensures uniqueness during updates and deletes. Each record is uniquely identified by the primary key configuration. 
+  If primary key is not set, Hudi treats it as key less table and every record ingested is treated as a new record even 
+  if contents match. 
 
-*Read more in the [table management](/docs/table_management) guide.*
+:::note
+1. Since Hudi 0.14.0, users can create key less table or primary keyed table as per necessity. If 'primaryKey' 
+option is ignored while creating the table, hudi will treat the table as a key less table. If user prefer to elect 
+primary keys for a given hudi table, they can do so by using 'primaryKey' option while creating the table in spark-sql. 
+4. `primaryKey`, `preCombineField`, and `type` are case-sensitive.
+5. `preCombineField` is required for MOR tables. Generally 'event time' or some other similar column will be used for
+   ordering purpose. Hudi will be able to handle out of order data using the preCombine field value.
+6. While setting `primaryKey`, `preCombineField`, `type` or other Hudi configs, `tblproperties` is preferred over `options`. 
+7. A new Hudi table created by Spark SQL will by default set `hoodie.datasource.write.hive_style_partitioning=true`.
+:::
 
 :::note
-1. Since Hudi 0.10.0, `primaryKey` is required. It aligns with Hudi DataSource writer’s and resolves behavioural
-   discrepancies reported in previous versions. Non-primary-key tables are no longer supported. Any Hudi table created
-   pre-0.10.0 without a `primaryKey` needs to be re-created with a `primaryKey` field with 0.10.0.
-2. `primaryKey`, `preCombineField`, and `type` are case-sensitive.
-3. `preCombineField` is required for MOR tables. 
-4. When set `primaryKey`, `preCombineField`, `type` or other Hudi configs, `tblproperties` is preferred over `options`.
-5. A new Hudi table created by Spark SQL will by default set `hoodie.datasource.write.hive_style_partitioning=true`.
+For the purpose of quick start guide, we will go with one table type (cow), partitioned table and external tables. For more
+options, please refer to [SQL DDL](/docs/sql_ddl) and DML reference guide.

Review Comment:
   Let's add link to DML as well?



##########
website/docs/quick-start-guide.md:
##########
@@ -246,67 +246,86 @@ Spark SQL needs an explicit create table command.
 
 **Table Concepts**
 
-- Table types
+- **Table types**
 
   Both Hudi's table types, Copy-On-Write (COW) and Merge-On-Read (MOR), can be created using Spark SQL.
   While creating the table, table type can be specified using **type** option: **type = 'cow'** or **type = 'mor'**.
 
-- Partitioned & Non-Partitioned tables
+- **Partitioned & Non-Partitioned tables**
 
   Users can create a partitioned table or a non-partitioned table in Spark SQL. To create a partitioned table, one needs
   to use **partitioned by** statement to specify the partition columns to create a partitioned table. When there is
   no **partitioned by** statement with create table command, table is considered to be a non-partitioned table.
 
-- Managed & External tables
+- **Primary keyed table**
 
-  In general, Spark SQL supports two kinds of tables, namely managed and external. If one specifies a location using **
-  location** statement or use `create external table` to create table explicitly, it is an external table, else its
-  considered a managed table. You can read more about external vs managed
-  tables [here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+  Optionally users can choose to create a Primary keyed table. When primary key is set for a given table,
+  Hudi ensures uniqueness during updates and deletes. Each record is uniquely identified by the primary key configuration.
+  If primary key is not set, Hudi treats it as key less table and every record ingested is treated as a new record even
+  if contents match. 
 
-*Read more in the [table management](/docs/table_management) guide.*
+:::note
+1. Since Hudi 0.14.0, users can create key less table or primary keyed table as per necessity. If 'primaryKey'
+option is ignored while creating the table, hudi will treat the table as a key less table. If user prefer to elect
+primary keys for a given hudi table, they can do so by using 'primaryKey' option while creating the table in spark-sql.
+4. `primaryKey`, `preCombineField`, and `type` are case-sensitive.

Review Comment:
   Numbering needs to be corrected.



##########
website/docs/sql_dml.md:
##########
@@ -0,0 +1,347 @@
+---
+title: SQL DDL
+summary: "In this page, we introduce how to create tables with Hudi."
+toc: true
+last_modified_at: 
+---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+The following are SparkSQL DDL actions available:
+
+## Spark Create Table
+:::note
+Only SparkSQL needs an explicit Create Table command. No Create Table command is required in Spark when using Scala or
+Python. The first batch of a [Write](/docs/writing_data) to a table will create the table if it does not exist.
+:::
+
+### Options
+
+Users can set table options while creating a hudi table.
+
+| Parameter Name | Description | (Optional/Required) : Default Value |
+|----------------|-------------|-------------------------------------|
+| primaryKey | The primary key names of the table, multiple fields separated by commas. When set, hudi will ensure uniqueness during updates and deletes. When this config is skipped, hudi treats the table as a key less table. | (Optional) : `id` |
+| type | The type of table to create ([read more](/docs/table_types)). <br></br> `cow` = COPY-ON-WRITE, `mor` = MERGE-ON-READ. | (Optional) : `cow` |
+| preCombineField | The Pre-Combine field of the table. This field will be used in resolving the final version of the record when two versions are combined with merges or updates. | (Optional) : `ts` |
+
+To set any custom hudi config(like index type, max parquet size, etc), see the "Set hudi config section" .
+
+### Table Type
+Here is an example of creating a COW table.
+
+```sql
+-- create a non-primary key (or key less) table
+create table if not exists hudi_table2(
+  id int, 
+  name string, 
+  price double
+) using hudi
+tblproperties (
+  type = 'cow'
+);
+```
+
+There could be datasets where primary key may not be feasible. For such use-cases, user don't need to elect any primary key for
+the table and hudi will treat them as key less table. Users can still perform Merge Into, updates and deletes based on any random data column.
+
+### Primary Key
+Here is an example of creating COW table with a primary key 'id'. For mutable datasets, it is recommended to set appropriate primary key.
+
+```sql
+-- create a managed cow table
+create table if not exists hudi_table0 (
+  id int, 
+  name string, 
+  price double
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id'
+);
+```
+
+### PreCombineField
+Here is an example of creating an MOR external table. The **preCombineField** option
+is used to specify the preCombine field for merge. Generally 'event time' or some other similar column will be used for
+ordering purpose. Hudi will be able to handle out of order data using the precombine field value.
+
+```sql
+-- create an external mor table
+create table if not exists hudi_table1 (
+  id int, 
+  name string, 
+  price double,
+  ts bigint
+) using hudi
+tblproperties (
+  type = 'mor',
+  primaryKey = 'id,name',
+  preCombineField = 'ts' 
+);
+```
+
+### Partitioned Table
+Here is an example of creating a COW partitioned key less table.
+```sql
+create table if not exists hudi_table_p0 (
+id bigint,
+name string,
+dt string,
+hh string  
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id'
+ ) 
+partitioned by dt;
+```
+
+Here is an example of creating a MOR partitioned table with preCombine field.
+```sql
+create table if not exists hudi_table_p0 (
+id bigint,
+name string,
+dt string,
+hh string  
+) using hudi
+tblproperties (
+  type = 'mor',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ ) 
+partitioned by dt;
+```
+
+### Un-Partitioned Table
+Here is an example of creating a COW un-partitioned table.
+
+```sql
+-- create a cow table, with primaryKey 'uuid' and unpartitioned. 
+create table hudi_cow_nonpcf_tbl (
+  uuid int,
+  name string,
+  price double
+) using hudi
+tblproperties (
+  primaryKey = 'uuid'
+);
+```
+
+Here is an example of creating a MOR un-partitioned table.
+
+```sql
+-- create a mor non-partitioned table with preCombineField provided
+create table hudi_mor_tbl (
+    id int,
+    name string,
+    price double,
+    ts bigint
+) using hudi
+tblproperties (
+    type = 'mor',
+    primaryKey = 'id',
+    preCombineField = 'ts'
+);
+```
+
+### Create Table for an External Hudi Table
+You can create an External table using the `location` statement. If an external location is not specified it is considered a managed table.
+You can read more about external vs managed tables [here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+An external table is useful if you need to read/write to/from a pre-existing hudi table.
+
+```sql
+ create table h_p1 using hudi
+ location '/path/to/hudi';
+```
+
+:::tip
+You don't need to specify schema and any properties except the partitioned columns if existed. Hudi can automatically recognize the schema and configurations.

Review Comment:
   nit: if existed -> if they exist
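   For context, the case this tip covers looks roughly like the sketch below. It assumes Spark SQL with the Hudi
   extension enabled and a pre-existing Hudi table partitioned on a column named `dt`; the table name and path are
   placeholders, not taken from the PR. Only the partition column is declared, and the schema and table configs are
   read from the existing table.

   ```sql
   -- Hypothetical external table over an existing Hudi table partitioned by dt.
   -- No column list or tblproperties: Hudi picks up schema and configs from the path.
   create table h_p1_partitioned using hudi
   partitioned by (dt)
   location '/path/to/partitioned/hudi';
   ```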



##########
website/docs/quick-start-guide.md:
##########
@@ -246,67 +246,86 @@ Spark SQL needs an explicit create table command.
 
 **Table Concepts**
 
-- Table types
+- **Table types**
 
   Both Hudi's table types, Copy-On-Write (COW) and Merge-On-Read (MOR), can be created using Spark SQL.
   While creating the table, table type can be specified using **type** option: **type = 'cow'** or **type = 'mor'**.
 
-- Partitioned & Non-Partitioned tables
+- **Partitioned & Non-Partitioned tables**
 
   Users can create a partitioned table or a non-partitioned table in Spark SQL. To create a partitioned table, one needs
   to use **partitioned by** statement to specify the partition columns to create a partitioned table. When there is
   no **partitioned by** statement with create table command, table is considered to be a non-partitioned table.
 
-- Managed & External tables
+- **Primary keyed table**
 
-  In general, Spark SQL supports two kinds of tables, namely managed and external. If one specifies a location using **
-  location** statement or use `create external table` to create table explicitly, it is an external table, else its
-  considered a managed table. You can read more about external vs managed
-  tables [here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+  Optionally users can choose to create a Primary keyed table. When primary key is set for a given table,
+  Hudi ensures uniqueness during updates and deletes. Each record is uniquely identified by the primary key configuration.
+  If primary key is not set, Hudi treats it as key less table and every record ingested is treated as a new record even
+  if contents match. 
 
-*Read more in the [table management](/docs/table_management) guide.*
+:::note
+1. Since Hudi 0.14.0, users can create key less table or primary keyed table as per necessity. If 'primaryKey'

Review Comment:
   Is this keyless option available for tables created via DeltaStreamer or the Spark datasource as well?



##########
website/docs/sql_dml.md:
##########
@@ -0,0 +1,347 @@
+---
+title: SQL DDL
+summary: "In this page, we introduce how to create tables with Hudi."
+toc: true
+last_modified_at: 
+---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+The following are SparkSQL DDL actions available:
+
+## Spark Create Table
+:::note
+Only SparkSQL needs an explicit Create Table command. No Create Table command is required in Spark when using Scala or
+Python. The first batch of a [Write](/docs/writing_data) to a table will create the table if it does not exist.
+:::
+
+### Options
+
+Users can set table options while creating a hudi table.
+
+| Parameter Name | Description | (Optional/Required) : Default Value |
+|----------------|-------------|-------------------------------------|
+| primaryKey | The primary key names of the table, multiple fields separated by commas. When set, hudi will ensure uniqueness during updates and deletes. When this config is skipped, hudi treats the table as a key less table. | (Optional) : `id` |
+| type | The type of table to create ([read more](/docs/table_types)). <br></br> `cow` = COPY-ON-WRITE, `mor` = MERGE-ON-READ. | (Optional) : `cow` |
+| preCombineField | The Pre-Combine field of the table. This field will be used in resolving the final version of the record when two versions are combined with merges or updates. | (Optional) : `ts` |
+
+To set any custom hudi config(like index type, max parquet size, etc), see the "Set hudi config section" .
+
+### Table Type
+Here is an example of creating a COW table.
+
+```sql
+-- create a non-primary key (or key less) table
+create table if not exists hudi_table2(
+  id int, 
+  name string, 
+  price double
+) using hudi
+tblproperties (
+  type = 'cow'
+);
+```
+
+There could be datasets where primary key may not be feasible. For such use-cases, user don't need to elect any primary key for
+the table and hudi will treat them as key less table. Users can still perform Merge Into, updates and deletes based on any random data column.
+
+### Primary Key
+Here is an example of creating COW table with a primary key 'id'. For mutable datasets, it is recommended to set appropriate primary key.
+
+```sql
+-- create a managed cow table
+create table if not exists hudi_table0 (
+  id int, 
+  name string, 
+  price double
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id'
+);
+```
+
+### PreCombineField
+Here is an example of creating an MOR external table. The **preCombineField** option
+is used to specify the preCombine field for merge. Generally 'event time' or some other similar column will be used for
+ordering purpose. Hudi will be able to handle out of order data using the precombine field value.
+
+```sql
+-- create an external mor table
+create table if not exists hudi_table1 (
+  id int, 
+  name string, 
+  price double,
+  ts bigint
+) using hudi
+tblproperties (
+  type = 'mor',
+  primaryKey = 'id,name',
+  preCombineField = 'ts' 
+);
+```
+
+### Partitioned Table
+Here is an example of creating a COW partitioned key less table.
+```sql
+create table if not exists hudi_table_p0 (
+id bigint,
+name string,
+dt string,
+hh string  
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id'
+ ) 
+partitioned by dt;
+```
+
+Here is an example of creating a MOR partitioned table with preCombine field.
+```sql
+create table if not exists hudi_table_p0 (
+id bigint,
+name string,
+dt string,
+hh string  
+) using hudi
+tblproperties (
+  type = 'mor',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ ) 
+partitioned by dt;
+```
+
+### Un-Partitioned Table
+Here is an example of creating a COW un-partitioned table.
+
+```sql
+-- create a cow table, with primaryKey 'uuid' and unpartitioned. 
+create table hudi_cow_nonpcf_tbl (
+  uuid int,
+  name string,
+  price double
+) using hudi
+tblproperties (
+  primaryKey = 'uuid'
+);
+```
+
+Here is an example of creating a MOR un-partitioned table.
+
+```sql
+-- create a mor non-partitioned table with preCombineField provided
+create table hudi_mor_tbl (
+    id int,
+    name string,
+    price double,
+    ts bigint
+) using hudi
+tblproperties (
+    type = 'mor',
+    primaryKey = 'id',
+    preCombineField = 'ts'
+);
+```
+
+### Create Table for an External Hudi Table
+You can create an External table using the `location` statement. If an external location is not specified it is considered a managed table.
+You can read more about external vs managed tables [here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+An external table is useful if you need to read/write to/from a pre-existing hudi table.
+
+```sql
+ create table h_p1 using hudi
+ location '/path/to/hudi';
+```
+
+:::tip
+You don't need to specify schema and any properties except the partitioned columns if existed. Hudi can automatically recognize the schema and configurations.
+:::
+
+### Create Table AS SELECT
+
+Hudi supports CTAS(Create table as select) on spark sql. <br/>
+**Note:** For better performance to load data to hudi table, CTAS uses **bulk insert** as the write operation.
+
+**Example CTAS command to create a non-partitioned COW key less table.**
+
+```sql 
+create table h3 using hudi
+tblproperties (type = 'cow')
+as
+select 1 as id, 'a1' as name, 10 as price;
+```
+
+**Example CTAS command to create a partitioned, primary keyed COW table.**
+
+```sql
+create table h2 using hudi
+tblproperties (type = 'cow', primaryKey = 'id')
+partitioned by (dt)
+as
+select 1 as id, 'a1' as name, 10 as price, 1000 as dt;
+```
+
+**Example CTAS command to load data from another table.**
+
+```sql
+# create managed parquet table 
+create table parquet_mngd using parquet location 'file:///tmp/parquet_dataset/*.parquet';
+
+# CTAS by loading data into hudi table
+create table hudi_tbl using hudi location 'file:/tmp/hudi/hudi_tbl/' tblproperties (
+  type = 'cow', 
+  primaryKey = 'id', 
+  preCombineField = 'ts' 
+ ) 
+partitioned by (datestr) as select * from parquet_mngd;
+```
+
+### Set hoodie config options
+You can also set the config with table options when creating table which will work for
+the table scope only and override the config set by the SET command.
+```sql
+create table if not exists h3(
+  id bigint, 
+  name string, 
+  price double
+) using hudi
+tblproperties (
+  primaryKey = 'id',
+  type = 'mor',
+  ${hoodie.config.key1} = '${hoodie.config.value2}',
+  ${hoodie.config.key2} = '${hoodie.config.value2}',
+  ....
+);
+
+e.g.
+create table if not exists h3(
+  id bigint, 
+  name string, 
+  price double
+) using hudi
+tblproperties (
+  primaryKey = 'id',
+  type = 'mor',
+  hoodie.cleaner.fileversions.retained = '20',
+  hoodie.keep.max.commits = '20'
+);
+```
+
+## Spark Alter Table
+### Syntax
+```sql
+-- Alter table name
+ALTER TABLE oldTableName RENAME TO newTableName
+
+-- Alter table add columns
+ALTER TABLE tableIdentifier ADD COLUMNS(colAndType (,colAndType)*)
+
+-- Alter table column type
+ALTER TABLE tableIdentifier CHANGE COLUMN colName colName colType
+```
+
+:::note
+`ALTER TABLE ... RENAME TO ...` is not supported when using AWS Glue Data Catalog as hive metastore as Glue itself does
+not support table renames.
+:::
+
+### Examples
+```sql
+alter table h0 rename to h0_1;
+
+alter table h0_1 add columns(ext0 string);
+
+alter table h0_1 change column id id bigint;
+```
+### Alter hoodie config options
+You can also alter the write config for a table by the **ALTER SERDEPROPERTIES**

Review Comment:
   nit: ALTER SERDEPROPERTIES -> ALTER TABLE
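   For illustration, a minimal sketch of the statement in question, assuming the `h3` table created earlier in this
   doc and Spark SQL's `ALTER TABLE ... SET SERDEPROPERTIES` form. The config key is the one already used in the
   "Set hoodie config options" example; the value `'10'` is only a placeholder.

   ```sql
   -- Override a Hudi write config for table h3 only (table-scoped, not session-wide).
   alter table h3 set serdeproperties (hoodie.keep.max.commits = '10');
   ```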



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.