This is an automated email from the ASF dual-hosted git repository.
xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 3378f4f699 [HUDI-3997] update 0.11.0 docs (#5480)
3378f4f699 is described below
commit 3378f4f699ff68ce9e7cd8a9bffc42d04e8db4d5
Author: Raymond Xu <[email protected]>
AuthorDate: Sun May 1 03:22:42 2022 -0700
[HUDI-3997] update 0.11.0 docs (#5480)
---
website/docs/flink-quick-start-guide.md | 15 +-
website/docs/gcp_bigquery.md | 2 +-
website/docs/quick-start-guide.md | 319 +++++++++++----------
website/docs/table_management.md | 27 +-
website/releases/release-0.11.0.md | 24 +-
.../version-0.11.0/flink-quick-start-guide.md | 15 +-
.../versioned_docs/version-0.11.0/gcp_bigquery.md | 2 +-
.../version-0.11.0/quick-start-guide.md | 319 +++++++++++----------
.../version-0.11.0/table_management.md | 27 +-
9 files changed, 416 insertions(+), 334 deletions(-)
diff --git a/website/docs/flink-quick-start-guide.md
b/website/docs/flink-quick-start-guide.md
index daec4ba0b5..e9ca0c3df5 100644
--- a/website/docs/flink-quick-start-guide.md
+++ b/website/docs/flink-quick-start-guide.md
@@ -4,8 +4,8 @@ toc: true
last_modified_at: 2020-08-12T15:19:57+08:00
---
-This guide provides an instruction for Flink Hudi integration. We can feel the
unique charm of how Flink brings in the power of streaming into Hudi.
-Reading this guide, you can quickly start using Flink on Hudi, learn different
modes for reading/writing Hudi by Flink:
+This page introduces the Flink-Hudi integration, showing how Flink brings the
power of streaming into Hudi.
+This guide helps you quickly start using Flink on Hudi and learn the different
modes for reading from and writing to Hudi with Flink:
- **Quick Start** : Read [Quick Start](#quick-start) to get started quickly with
the Flink SQL client, writing to (and reading from) Hudi.
- **Configuration** : [Global
Configuration](flink_configuration#global-configurations) is set up through
`$FLINK_HOME/conf/flink-conf.yaml`; per-job configuration is set up through
[Table Option](flink_configuration#table-options).
@@ -23,8 +23,15 @@ We use the [Flink Sql
Client](https://ci.apache.org/projects/flink/flink-docs-re
quick start tool for SQL users.
#### Step.1 download Flink jar
-Hudi works with Flink-1.13.x version. You can follow instructions
[here](https://flink.apache.org/downloads) for setting up Flink.
-The hudi-flink-bundle jar is archived with scala 2.11, so it’s recommended to
use flink 1.13.x bundled with scala 2.11.
+
+Hudi works with both Flink 1.13 and Flink 1.14. You can follow the
+instructions [here](https://flink.apache.org/downloads) for setting up Flink.
Then choose the desired Hudi-Flink bundle
+jar to work with different Flink and Scala versions:
+
+- `hudi-flink1.13-bundle_2.11`
+- `hudi-flink1.13-bundle_2.12`
+- `hudi-flink1.14-bundle_2.11`
+- `hudi-flink1.14-bundle_2.12`
#### Step.2 start Flink cluster
Start a standalone Flink cluster within a Hadoop environment.
diff --git a/website/docs/gcp_bigquery.md b/website/docs/gcp_bigquery.md
index 93e4505f76..8583182042 100644
--- a/website/docs/gcp_bigquery.md
+++ b/website/docs/gcp_bigquery.md
@@ -1,5 +1,5 @@
---
-title: Google Cloud BigQuery
+title: Google BigQuery
keywords: [ hudi, gcp, bigquery ]
summary: Introduce BigQuery integration in Hudi.
---
diff --git a/website/docs/quick-start-guide.md
b/website/docs/quick-start-guide.md
index 8f77a34dd7..9b92093880 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -7,135 +7,122 @@ last_modified_at: 2019-12-30T15:59:57-04:00
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
-This guide provides a quick peek at Hudi's capabilities using spark-shell.
Using Spark datasources, we will walk through
-code snippets that allows you to insert and update a Hudi table of default
table type:
-[Copy on Write](/docs/concepts#copy-on-write-table).
-After each write operation we will also show how to read the data both
snapshot and incrementally.
+This guide provides a quick peek at Hudi's capabilities using spark-shell.
Using Spark datasources, we will walk through
+code snippets that allow you to insert and update a Hudi table of the default
table type:
+[Copy on Write](/docs/table_types#copy-on-write-table). After each write
operation we will also show how to read the
+data both as a snapshot and incrementally.
## Setup
-Hudi works with Spark-2.4.3+ & Spark 3.x versions. You can follow instructions
[here](https://spark.apache.org/downloads) for setting up spark.
+Hudi works with Spark 2.4.3+ and Spark 3.x versions. You can follow the
instructions [here](https://spark.apache.org/downloads) for setting up Spark.
**Spark 3 Support Matrix**
-| Hudi | Supported Spark 3 version |
-|:----------------|:------------------------------|
-| 0.11.0 | 3.2.x (default build, Spark bundle only), 3.1.x |
-| 0.10.0 | 3.1.x (default build), 3.0.x |
-| 0.7.0 - 0.9.0 | 3.0.x |
-| 0.6.0 and prior | not supported |
+| Hudi | Supported Spark 3 version |
+|:----------------|:------------------------------------------------|
+| 0.11.0 | 3.2.x (default build, Spark bundle only), 3.1.x |
+| 0.10.0 | 3.1.x (default build), 3.0.x |
+| 0.7.0 - 0.9.0 | 3.0.x |
+| 0.6.0 and prior | not supported |
-*The "default build" Spark version indicates that it is used to build the
`hudi-spark3-bundle`.*
+The *default build* Spark version is the one used to build the
`hudi-spark3-bundle`.
-As of 0.9.0 release, Spark SQL DML support has been added and is experimental.
-
-In 0.11.0 release, we add support for Spark 3.2.x and continue the support for
Spark 3.1.x and Spark 2.4.x. We officially
-do not provide the support for Spark 3.0.x any more. To make it easier for
the users to pick the right Hudi Spark bundle
-in their deployment, we make the following adjustment to the naming of the
bundles:
-
-- For each supported Spark minor version, there is a corresponding Hudi Spark
bundle with the major and minor version
-in the naming, i.e., `hudi-spark3.2-bundle`, `hudi-spark3.1-bundle`, and
`hudi-spark2.4-bundle`.
-- We encourage users to migrate to using the new bundles above. We keep the
bundles with the legacy naming in this
-release, i.e., `hudi-spark3-bundle` targeting at Spark 3.2.x, the latest Spark
3 version, and `hudi-spark-bundle` for
-Spark 2.4.x.
+:::note
+In 0.11.0, there are changes to how Spark bundles are used. Please refer
+to the [0.11.0 release
notes](https://hudi.apache.org/releases/release-0.11.0/#spark-versions-and-bundles)
for detailed
+instructions.
+:::
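Per the support matrix and the bundle naming above, the `--packages` coordinate used in the launch commands can be derived mechanically. A hedged Python sketch (the helper is ours; only the artifact names and the 0.11.0 version come from this page):

```python
# Pick the Maven coordinate of the Hudi Spark bundle for a given Spark
# minor version, per the 0.11.0 bundle naming. Illustrative only.
BUNDLES = {
    "3.2": "hudi-spark3.2-bundle_2.12",
    "3.1": "hudi-spark3.1-bundle_2.12",
    "2.4": "hudi-spark2.4-bundle_2.11",
}

def hudi_packages_arg(spark_version: str, hudi_version: str = "0.11.0") -> str:
    """Return the value for spark-shell's --packages flag."""
    bundle = BUNDLES.get(spark_version)
    if bundle is None:
        raise ValueError(f"no 0.11.0 bundle for Spark {spark_version}")
    return f"org.apache.hudi:{bundle}:{hudi_version}"

print(hudi_packages_arg("3.2"))
# org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0
```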
<Tabs
defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
<TabItem value="scala">
-From the extracted directory run spark-shell with Hudi as:
+From the extracted directory run spark-shell with Hudi:
-```scala
-// spark-shell for spark 3.2
+```shell
+# Spark 3.2
spark-shell \
--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
-
-// spark-shell for spark 3.1
+```
+```shell
+# Spark 3.1
spark-shell \
--packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
-
-// spark-shell for spark 2.4 with scala 2.12
-spark-shell \
- --packages org.apache.hudi:hudi-spark2.4-bundle_2.12:0.11.0 \
- --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
-
-// spark-shell for spark 2.4 with scala 2.11
+```
+```shell
+# Spark 2.4
spark-shell \
--packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```
+</TabItem>
+
+<TabItem value="python">
+From the extracted directory run pyspark with Hudi:
+
+```shell
+# Spark 3.2
+export PYSPARK_PYTHON=$(which python3)
+pyspark \
+--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+--conf
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
+```
+```shell
+# Spark 3.1
+export PYSPARK_PYTHON=$(which python3)
+pyspark \
+--packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0 \
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+```
+```shell
+# Spark 2.4
+export PYSPARK_PYTHON=$(which python3)
+pyspark \
+--packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.0 \
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+```
</TabItem>
+
<TabItem value="sparksql">
Hudi supports using Spark SQL to write and read data with the
**HoodieSparkSessionExtension** SQL extension.
-From the extracted directory run Spark SQL with Hudi as:
+From the extracted directory run Spark SQL with Hudi:
```shell
-# Spark SQL for spark 3.2
+# Spark 3.2
spark-sql --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
---conf
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
\
---conf
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
-
-# Spark SQL for spark 3.1
+--conf
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
+--conf
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
+```
+```shell
+# Spark 3.1
spark-sql --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
-
-# Spark SQL for spark 2.4 with scala 2.11
+```
+```shell
+# Spark 2.4
spark-sql --packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
-
-# Spark SQL for spark 2.4 with scala 2.12
-spark-sql \
- --packages org.apache.hudi:hudi-spark2.4-bundle_2.12:0.11.0 \
- --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
- --conf
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
```
</TabItem>
-<TabItem value="python">
-From the extracted directory run pyspark with Hudi as:
-
-```python
-# pyspark
-export PYSPARK_PYTHON=$(which python3)
-
-# for spark3.2
-pyspark
---packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0
---conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
---conf
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
-
-# for spark3.1
-pyspark
---packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0
---conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
-
-# for spark2.4 with scala 2.12
-pyspark
---packages org.apache.hudi:hudi-spark2.4-bundle_2.12:0.11.0
---conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
-
-# for spark2.4 with scala 2.11
-pyspark
---packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.0
---conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
-```
-
-</TabItem>
-</Tabs>
+</Tabs
+>
:::note Please note the following
<ul>
@@ -152,7 +139,9 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-]}>
+]}
+>
+
<TabItem value="scala">
```scala
@@ -180,38 +169,15 @@ dataGen =
sc._jvm.org.apache.hudi.QuickstartUtils.DataGenerator()
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
:::tip
The
[DataGenerator](https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L51)
can generate sample inserts and updates based on the sample trip schema
[here](https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58)
:::
-## Spark SQL Type Support
-
-| Spark | Hudi | Notes |
-|-----------------|--------------|---------------|
-| boolean | boolean | |
-| byte | int | |
-| short | int | |
-| integer | int | |
-| long | long | |
-| date | date | |
-| timestamp | timestamp | |
-| float | float | |
-| double | double | |
-| string | string | |
-| decimal | decimal | |
-| binary | bytes | |
-| array | array | |
-| map | map | |
-| struct | struct | |
-| char | | not supported |
-| varchar | | not supported |
-| numeric | | not supported |
-| null | | not supported |
-| object | | not supported |
-
## Create Table
<Tabs
@@ -219,8 +185,9 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
<TabItem value="scala">
```scala
@@ -242,33 +209,41 @@ values={[
Spark SQL needs an explicit create table command.
**Table Concepts**
-- Table types:
- Both of Hudi's table types (Copy-On-Write (COW) and Merge-On-Read (MOR)) can
be created using Spark SQL.
- While creating the table, table type can be specified using **type** option.
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR
table.
+- Table types
+
+ Both of Hudi's table types, Copy-On-Write (COW) and Merge-On-Read (MOR), can be
created using Spark SQL.
+ When creating a table, the table type can be specified using the **type** option:
**type = 'cow'** or **type = 'mor'**.
+
+- Partitioned & Non-Partitioned tables
-- Partitioned & Non-Partitioned table:
- Users can create a partitioned table or a non-partitioned table in Spark SQL.
- To create a partitioned table, one needs to use **partitioned by** statement
to specify the partition columns to create a partitioned table.
- When there is no **partitioned by** statement with create table command,
table is considered to be a non-partitioned table.
+ Users can create a partitioned or a non-partitioned table in Spark
SQL. To create a partitioned table, use
+ the **partitioned by** statement to specify the partition columns. When there is
+ no **partitioned by** statement in the create table command, the table is
considered non-partitioned.
-- Managed & External table:
- In general, Spark SQL supports two kinds of tables, namely managed and
external.
- If one specifies a location using **location** statement or use `create
external table` to create table explicitly, it is an external table, else its
considered a managed table.
- You can read more about external vs managed tables
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+- Managed & External tables
+
+ In general, Spark SQL supports two kinds of tables, namely managed and
external. If one specifies a location using the
+ **location** statement or uses `create external table` to create the table
explicitly, it is an external table; otherwise it is
+ considered a managed table. You can read more about external vs managed
+ tables
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+*Read more in the [table management](/docs/table_management) guide.*
:::note
-1. Since hudi 0.10.0, `primaryKey` is required to specify. It can align with
Hudi datasource writer’s and resolve many behavioural discrepancies reported in
previous versions.
- Non-primaryKey tables are no longer supported. Any hudi table created pre
0.10.0 without a `primaryKey` needs to be recreated with a `primaryKey` field
with 0.10.0.
- Same as `hoodie.datasource.write.recordkey.field`, hudi use `uuid` as the
default primaryKey. So if you want to use `uuid` as your table's `primaryKey`,
you can omit the `primaryKey` config in `tblproperties`.
-2. `primaryKey`, `preCombineField`, `type` is case sensitive.
-3. To specify `primaryKey`, `preCombineField`, `type` or other hudi configs,
`tblproperties` is the preferred way than `options`. Spark SQL syntax is
detailed here.
-4. A new hudi table created by Spark SQL will set
`hoodie.table.keygenerator.class` as
`org.apache.hudi.keygen.ComplexKeyGenerator`, and
-`hoodie.datasource.write.hive_style_partitioning` as `true` by default.
+1. Since Hudi 0.10.0, `primaryKey` is required. It aligns with the Hudi DataSource
writer and resolves behavioural
+   discrepancies reported in previous versions. Non-primary-key tables are no
longer supported. Any Hudi table created
+   pre-0.10.0 without a `primaryKey` needs to be re-created with a
`primaryKey` field with 0.10.0.
+2. Similar to `hoodie.datasource.write.recordkey.field`, `uuid` is used as
primary key by default; if that's the case
+ for your table, you can skip setting `primaryKey` in `tblproperties`.
+3. `primaryKey`, `preCombineField`, and `type` are case-sensitive.
+4. `preCombineField` is required for MOR tables.
+5. When setting `primaryKey`, `preCombineField`, `type` or other Hudi configs,
`tblproperties` is preferred over `options`.
+6. A new Hudi table created by Spark SQL will by default
+ set
`hoodie.table.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator` and
+ `hoodie.datasource.write.hive_style_partitioning=true`.
:::
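The defaults listed in this note can be pictured as a small overlay: user-supplied `tblproperties` on top of what Spark SQL sets for a new Hudi table. A hedged sketch (the helper is hypothetical; the keys and defaults are the ones named above):

```python
# Illustrative only: the effective properties a Spark SQL-created Hudi table
# ends up with, per the note above. The merging helper itself is hypothetical.
DEFAULTS = {
    "primaryKey": "uuid",  # default record key, as with hoodie.datasource.write.recordkey.field
    "hoodie.table.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.hive_style_partitioning": "true",
}

def effective_tblproperties(user_props: dict) -> dict:
    """Overlay user-supplied tblproperties on the Spark SQL defaults."""
    props = dict(DEFAULTS)
    props.update(user_props)
    if props.get("type") == "mor" and "preCombineField" not in props:
        raise ValueError("preCombineField is required for MOR tables")
    return props

print(effective_tblproperties({"type": "cow"})["primaryKey"])  # uuid
```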
-Let's go over some of the create table commands.
-
**Create a Non-Partitioned Table**
```sql
@@ -395,7 +370,9 @@ Users can set table properties while creating a hudi table.
Critical options are
To set any custom Hudi config (like index type, max parquet size, etc.), see the
"Set hudi config section".
</TabItem>
-</Tabs>
+
+</Tabs
+>
## Insert data
@@ -405,8 +382,10 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
+
<TabItem value="scala">
Generate some new trips, load them into a DataFrame and write the DataFrame
into the Hudi table as below.
@@ -508,7 +487,9 @@ select id, name, price, ts from hudi_mor_tbl;
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
Check out https://hudi.apache.org/blog/2021/02/13/hudi-key-generators for
various key generator options, like Timestamp based,
@@ -523,8 +504,10 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
+
<TabItem value="scala">
```scala
@@ -561,7 +544,9 @@ spark.sql("select _hoodie_commit_time, _hoodie_record_key,
_hoodie_partition_pat
select fare, begin_lon, begin_lat, ts from hudi_trips_snapshot where fare >
20.0
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
:::info
Since 0.9.0, Hudi has supported a built-in FileIndex, **HoodieFileIndex**, to
query Hudi tables,
@@ -581,8 +566,10 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
+
<TabItem value="scala">
```scala
@@ -664,7 +651,9 @@ select * from hudi_cow_pt_tbl timestamp as of '2022-03-08'
where id = 1;
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
## Update data
@@ -676,8 +665,10 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
+
<TabItem value="scala">
```scala
@@ -794,7 +785,9 @@ denoted by the timestamp. Look for changes in
`_hoodie_commit_time`, `rider`, `d
:::
</TabItem>
-</Tabs>
+
+</Tabs
+>
## Incremental query
@@ -808,7 +801,9 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-]}>
+]}
+>
+
<TabItem value="scala">
```scala
@@ -863,7 +858,9 @@ spark.sql("select `_hoodie_commit_time`, fare, begin_lon,
begin_lat, ts from hu
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
:::info
This will give all changes that happened after the beginTime commit with the
filter of fare > 20.0. The unique thing about this
@@ -880,7 +877,9 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-]}>
+]}
+>
+
<TabItem value="scala">
```scala
@@ -922,7 +921,9 @@ spark.sql("select `_hoodie_commit_time`, fare, begin_lon,
begin_lat, ts from hud
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
## Delete data {#deletes}
@@ -931,8 +932,10 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
+
<TabItem value="scala">
Delete records for the HoodieKeys passed in.<br/>
@@ -1031,7 +1034,9 @@ spark.sql("select uuid, partitionpath from
hudi_trips_snapshot").count()
Only `Append` mode is supported for delete operation.
:::
</TabItem>
-</Tabs>
+
+</Tabs
+>
See the [deletion section](/docs/writing_data#deletes) of the writing data
page for more details.
@@ -1047,8 +1052,10 @@ steps in the upsert write path completely.
defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
+
<TabItem value="scala">
```scala
@@ -1100,7 +1107,9 @@ insert overwrite table hudi_cow_pt_tbl select 10, 'a10',
1100, '2021-12-09', '10
insert overwrite hudi_cow_pt_tbl partition(dt = '2021-12-09', hh='12') select
13, 'a13', 1100;
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
## More Spark SQL Commands
@@ -1196,4 +1205,4 @@ Hudi tables can be queried from query engines like Hive,
Spark, Presto and much
[demo video](https://www.youtube.com/watch?v=VhNgUsxdrD0) that show cases all
of this on a docker based setup with all
dependent systems running locally. We recommend you replicate the same setup
and run the demo yourself, by following
steps [here](/docs/docker_demo) to get a taste for it. Also, if you are
looking for ways to migrate your existing data
-to Hudi, refer to [migration guide](/docs/migration_guide).
+to Hudi, refer to [migration guide](/docs/migration_guide).
diff --git a/website/docs/table_management.md b/website/docs/table_management.md
index 76c02edc6d..92cb6092aa 100644
--- a/website/docs/table_management.md
+++ b/website/docs/table_management.md
@@ -234,4 +234,29 @@ WITH (
### Alter Table
```sql
alter table h0 rename to h0_1;
-```
\ No newline at end of file
+```
+
+## Supported Types
+
+| Spark | Hudi | Notes |
+|-----------------|--------------|---------------|
+| boolean | boolean | |
+| byte | int | |
+| short | int | |
+| integer | int | |
+| long | long | |
+| date | date | |
+| timestamp | timestamp | |
+| float | float | |
+| double | double | |
+| string | string | |
+| decimal | decimal | |
+| binary | bytes | |
+| array | array | |
+| map | map | |
+| struct | struct | |
+| char | | not supported |
+| varchar | | not supported |
+| numeric | | not supported |
+| null | | not supported |
+| object | | not supported |
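The mapping above can be read as a simple lookup. As an illustrative sketch (the dict and helper are ours, not a Hudi API):

```python
# The Spark-to-Hudi type mapping from the table above, as a lookup.
SPARK_TO_HUDI = {
    "boolean": "boolean", "byte": "int", "short": "int", "integer": "int",
    "long": "long", "date": "date", "timestamp": "timestamp",
    "float": "float", "double": "double", "string": "string",
    "decimal": "decimal", "binary": "bytes", "array": "array",
    "map": "map", "struct": "struct",
}
UNSUPPORTED = {"char", "varchar", "numeric", "null", "object"}

def hudi_type(spark_type: str) -> str:
    """Map a Spark SQL type name to its Hudi type, per the table above."""
    if spark_type in UNSUPPORTED:
        raise TypeError(f"Spark type '{spark_type}' is not supported by Hudi")
    return SPARK_TO_HUDI[spark_type]

print(hudi_type("byte"))  # int
```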
diff --git a/website/releases/release-0.11.0.md
b/website/releases/release-0.11.0.md
index 0662eeddee..6f35c99ded 100644
--- a/website/releases/release-0.11.0.md
+++ b/website/releases/release-0.11.0.md
@@ -92,14 +92,11 @@ time. Spark SQL DDL support (experimental) was added for
Spark 3.1.x and Spark 3
### Spark Versions and Bundles
-In 0.11.0,
-
-- Spark 3.2 support is added; users can use `hudi-spark3.2-bundle` or
`hudi-spark3-bundle` with Spark 3.2.
+- Spark 3.2 support is added; users who are on Spark 3.2 can use
`hudi-spark3.2-bundle` or `hudi-spark3-bundle` (legacy bundle name).
- Spark 3.1 will continue to be supported via `hudi-spark3.1-bundle`.
-- Spark 2.4 will continue to be supported via `hudi-spark2.4-bundle` or
`hudi-spark-bundle`.
-- Users are encouraged to use bundles with specific Spark version in the name:
`hudi-sparkX.Y-bundle`.
-- Spark bundle for 3.0.x is no longer officially supported. Users are
encouraged to upgrade to Spark 3.2 or 3.1.
-- `spark-avro` package is no longer required to work with Spark bundles.
+- Spark 2.4 will continue to be supported via `hudi-spark2.4-bundle` or
`hudi-spark-bundle` (legacy bundle name).
+
+*See the [migration guide](#migration-guide) for usage updates.*
### Slim Utilities Bundle
@@ -120,7 +117,7 @@ compatibility issues with other frameworks such as Spark.
default Flink state-based index, bucket index is in constant number of
buckets. Specify SQL option `index.type`
as `BUCKET` to enable it.
-### BigQuery Integration
+### Google BigQuery Integration
In 0.11.0, Hudi tables can be queried from BigQuery as external tables. Users
can
set `org.apache.hudi.gcp.bigquery.BigQuerySyncTool` as the sync tool
implementation for `HoodieDeltaStreamer` and make
@@ -170,7 +167,7 @@ added support for MOR tables.
*More info about this feature can be found in [Disaster
Recovery](/docs/disaster_recovery).*
-### Write Commit Callback for Pulsar
+### Pulsar Write Commit Callback
Hudi users can use `org.apache.hudi.callback.HoodieWriteCommitCallback` to
invoke callback function upon successful
commits. In 0.11.0, we add `HoodieWriteCommitPulsarCallback` in addition to the
existing HTTP callback and Kafka
@@ -184,10 +181,13 @@ tables. This is useful when tailing Hive tables in
`HoodieDeltaStreamer` instead
## Migration Guide
-### Bundle usage
+### Bundle usage updates
-As we relax the requirement of adding `spark-avro` package in 0.11.0 to work
with Spark and Utilities bundle,
-the option `--package org.apache.spark:spark-avro_2.1*:*` can be dropped.
+- Spark bundle for 3.0.x is no longer officially supported. Users are
encouraged to upgrade to Spark 3.2 or 3.1.
+- Users are encouraged to use bundles with a specific Spark version in the name
(`hudi-sparkX.Y-bundle`) and move away
+  from the legacy bundles (`hudi-spark-bundle` and `hudi-spark3-bundle`).
+- The Spark and Utilities bundles no longer require the additional `spark-avro`
package at runtime; the
+  option `--package org.apache.spark:spark-avro_2.1*:*` can be dropped.
### Configuration updates
diff --git a/website/versioned_docs/version-0.11.0/flink-quick-start-guide.md
b/website/versioned_docs/version-0.11.0/flink-quick-start-guide.md
index daec4ba0b5..e9ca0c3df5 100644
--- a/website/versioned_docs/version-0.11.0/flink-quick-start-guide.md
+++ b/website/versioned_docs/version-0.11.0/flink-quick-start-guide.md
@@ -4,8 +4,8 @@ toc: true
last_modified_at: 2020-08-12T15:19:57+08:00
---
-This guide provides an instruction for Flink Hudi integration. We can feel the
unique charm of how Flink brings in the power of streaming into Hudi.
-Reading this guide, you can quickly start using Flink on Hudi, learn different
modes for reading/writing Hudi by Flink:
+This page introduces the Flink-Hudi integration, showing how Flink brings the
power of streaming into Hudi.
+This guide helps you quickly start using Flink on Hudi and learn the different
modes for reading from and writing to Hudi with Flink:
- **Quick Start** : Read [Quick Start](#quick-start) to get started quickly with
the Flink SQL client, writing to (and reading from) Hudi.
- **Configuration** : [Global
Configuration](flink_configuration#global-configurations) is set up through
`$FLINK_HOME/conf/flink-conf.yaml`; per-job configuration is set up through
[Table Option](flink_configuration#table-options).
@@ -23,8 +23,15 @@ We use the [Flink Sql
Client](https://ci.apache.org/projects/flink/flink-docs-re
quick start tool for SQL users.
#### Step.1 download Flink jar
-Hudi works with Flink-1.13.x version. You can follow instructions
[here](https://flink.apache.org/downloads) for setting up Flink.
-The hudi-flink-bundle jar is archived with scala 2.11, so it’s recommended to
use flink 1.13.x bundled with scala 2.11.
+
+Hudi works with both Flink 1.13 and Flink 1.14. You can follow the
+instructions [here](https://flink.apache.org/downloads) for setting up Flink.
Then choose the desired Hudi-Flink bundle
+jar to work with different Flink and Scala versions:
+
+- `hudi-flink1.13-bundle_2.11`
+- `hudi-flink1.13-bundle_2.12`
+- `hudi-flink1.14-bundle_2.11`
+- `hudi-flink1.14-bundle_2.12`
#### Step.2 start Flink cluster
Start a standalone Flink cluster within a Hadoop environment.
diff --git a/website/versioned_docs/version-0.11.0/gcp_bigquery.md
b/website/versioned_docs/version-0.11.0/gcp_bigquery.md
index 93e4505f76..8583182042 100644
--- a/website/versioned_docs/version-0.11.0/gcp_bigquery.md
+++ b/website/versioned_docs/version-0.11.0/gcp_bigquery.md
@@ -1,5 +1,5 @@
---
-title: Google Cloud BigQuery
+title: Google BigQuery
keywords: [ hudi, gcp, bigquery ]
summary: Introduce BigQuery integration in Hudi.
---
diff --git a/website/versioned_docs/version-0.11.0/quick-start-guide.md
b/website/versioned_docs/version-0.11.0/quick-start-guide.md
index 8f77a34dd7..9b92093880 100644
--- a/website/versioned_docs/version-0.11.0/quick-start-guide.md
+++ b/website/versioned_docs/version-0.11.0/quick-start-guide.md
@@ -7,135 +7,122 @@ last_modified_at: 2019-12-30T15:59:57-04:00
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
-This guide provides a quick peek at Hudi's capabilities using spark-shell.
Using Spark datasources, we will walk through
-code snippets that allows you to insert and update a Hudi table of default
table type:
-[Copy on Write](/docs/concepts#copy-on-write-table).
-After each write operation we will also show how to read the data both
snapshot and incrementally.
+This guide provides a quick peek at Hudi's capabilities using spark-shell.
Using Spark datasources, we will walk through
+code snippets that allow you to insert and update a Hudi table of the default
table type:
+[Copy on Write](/docs/table_types#copy-on-write-table). After each write
operation we will also show how to read the
+data both as a snapshot and incrementally.
## Setup
-Hudi works with Spark-2.4.3+ & Spark 3.x versions. You can follow instructions
[here](https://spark.apache.org/downloads) for setting up spark.
+Hudi works with Spark 2.4.3+ and Spark 3.x versions. You can follow the
instructions [here](https://spark.apache.org/downloads) for setting up Spark.
**Spark 3 Support Matrix**
-| Hudi | Supported Spark 3 version |
-|:----------------|:------------------------------|
-| 0.11.0 | 3.2.x (default build, Spark bundle only), 3.1.x |
-| 0.10.0 | 3.1.x (default build), 3.0.x |
-| 0.7.0 - 0.9.0 | 3.0.x |
-| 0.6.0 and prior | not supported |
+| Hudi | Supported Spark 3 version |
+|:----------------|:------------------------------------------------|
+| 0.11.0 | 3.2.x (default build, Spark bundle only), 3.1.x |
+| 0.10.0 | 3.1.x (default build), 3.0.x |
+| 0.7.0 - 0.9.0 | 3.0.x |
+| 0.6.0 and prior | not supported |
-*The "default build" Spark version indicates that it is used to build the
`hudi-spark3-bundle`.*
+The *default build* Spark version is the one used to build the
`hudi-spark3-bundle`.
-As of 0.9.0 release, Spark SQL DML support has been added and is experimental.
-
-In 0.11.0 release, we add support for Spark 3.2.x and continue the support for
Spark 3.1.x and Spark 2.4.x. We officially
-do not provide the support for Spark 3.0.x any more. To make it easier for
the users to pick the right Hudi Spark bundle
-in their deployment, we make the following adjustment to the naming of the
bundles:
-
-- For each supported Spark minor version, there is a corresponding Hudi Spark
bundle with the major and minor version
-in the naming, i.e., `hudi-spark3.2-bundle`, `hudi-spark3.1-bundle`, and
`hudi-spark2.4-bundle`.
-- We encourage users to migrate to using the new bundles above. We keep the
bundles with the legacy naming in this
-release, i.e., `hudi-spark3-bundle` targeting at Spark 3.2.x, the latest Spark
3 version, and `hudi-spark-bundle` for
-Spark 2.4.x.
+:::note
+In 0.11.0, there are changes to how Spark bundles are used. Please refer
+to the [0.11.0 release
notes](https://hudi.apache.org/releases/release-0.11.0/#spark-versions-and-bundles)
for detailed
+instructions.
+:::
<Tabs
defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
<TabItem value="scala">
-From the extracted directory run spark-shell with Hudi as:
+From the extracted directory run spark-shell with Hudi:
-```scala
-// spark-shell for spark 3.2
+```shell
+# Spark 3.2
spark-shell \
--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
-
-// spark-shell for spark 3.1
+```
+```shell
+# Spark 3.1
spark-shell \
--packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
-
-// spark-shell for spark 2.4 with scala 2.12
-spark-shell \
- --packages org.apache.hudi:hudi-spark2.4-bundle_2.12:0.11.0 \
- --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
-
-// spark-shell for spark 2.4 with scala 2.11
+```
+```shell
+# Spark 2.4
spark-shell \
--packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```
+</TabItem>
+
+<TabItem value="python">
+From the extracted directory run pyspark with Hudi:
+
+```shell
+# Spark 3.2
+export PYSPARK_PYTHON=$(which python3)
+pyspark \
+--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+--conf
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
+```
+```shell
+# Spark 3.1
+export PYSPARK_PYTHON=$(which python3)
+pyspark \
+--packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0 \
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+```
+```shell
+# Spark 2.4
+export PYSPARK_PYTHON=$(which python3)
+pyspark \
+--packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.0 \
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+```
</TabItem>
+
<TabItem value="sparksql">
Hudi supports using Spark SQL to write and read data with the
**HoodieSparkSessionExtension** SQL extension.
-From the extracted directory run Spark SQL with Hudi as:
+From the extracted directory run Spark SQL with Hudi:
```shell
-# Spark SQL for spark 3.2
+# Spark 3.2
spark-sql --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
---conf
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
\
---conf
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
-
-# Spark SQL for spark 3.1
+--conf
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
+--conf
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
+```
+```shell
+# Spark 3.1
spark-sql --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
-
-# Spark SQL for spark 2.4 with scala 2.11
+```
+```shell
+# Spark 2.4
spark-sql --packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
-
-# Spark SQL for spark 2.4 with scala 2.12
-spark-sql \
- --packages org.apache.hudi:hudi-spark2.4-bundle_2.12:0.11.0 \
- --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
- --conf
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
```
</TabItem>
-<TabItem value="python">
-From the extracted directory run pyspark with Hudi as:
-
-```python
-# pyspark
-export PYSPARK_PYTHON=$(which python3)
-
-# for spark3.2
-pyspark
---packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0
---conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
---conf
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
-
-# for spark3.1
-pyspark
---packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0
---conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
-
-# for spark2.4 with scala 2.12
-pyspark
---packages org.apache.hudi:hudi-spark2.4-bundle_2.12:0.11.0
---conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
-
-# for spark2.4 with scala 2.11
-pyspark
---packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.0
---conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
-```
-
-</TabItem>
-</Tabs>
+</Tabs
+>
:::note Please note the following
<ul>
@@ -152,7 +139,9 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-]}>
+]}
+>
+
<TabItem value="scala">
```scala
@@ -180,38 +169,15 @@ dataGen =
sc._jvm.org.apache.hudi.QuickstartUtils.DataGenerator()
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
:::tip
The
[DataGenerator](https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L51)
can generate sample inserts and updates based on the sample trip schema
[here](https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58)
:::
-## Spark SQL Type Support
-
-| Spark | Hudi | Notes |
-|-----------------|--------------|---------------|
-| boolean | boolean | |
-| byte | int | |
-| short | int | |
-| integer | int | |
-| long | long | |
-| date | date | |
-| timestamp | timestamp | |
-| float | float | |
-| double | double | |
-| string | string | |
-| decimal | decimal | |
-| binary | bytes | |
-| array | array | |
-| map | map | |
-| struct | struct | |
-| char | | not supported |
-| varchar | | not supported |
-| numeric | | not supported |
-| null | | not supported |
-| object | | not supported |
-
## Create Table
<Tabs
@@ -219,8 +185,9 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
<TabItem value="scala">
```scala
@@ -242,33 +209,41 @@ values={[
Spark SQL needs an explicit create table command.
**Table Concepts**
-- Table types:
- Both of Hudi's table types (Copy-On-Write (COW) and Merge-On-Read (MOR)) can
be created using Spark SQL.
- While creating the table, table type can be specified using **type** option.
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR
table.
+- Table types
+
+ Both of Hudi's table types, Copy-On-Write (COW) and Merge-On-Read (MOR), can be
created using Spark SQL.
+ While creating the table, the table type can be specified using the **type**
option: **type = 'cow'** or **type = 'mor'**.
+
+- Partitioned & Non-Partitioned tables
-- Partitioned & Non-Partitioned table:
- Users can create a partitioned table or a non-partitioned table in Spark SQL.
- To create a partitioned table, one needs to use **partitioned by** statement
to specify the partition columns to create a partitioned table.
- When there is no **partitioned by** statement with create table command,
table is considered to be a non-partitioned table.
+ Users can create a partitioned or a non-partitioned table in Spark SQL. To
create a partitioned table, use the
+ **partitioned by** statement to specify the partition columns. When there is
+ no **partitioned by** statement in the create table command, the table is
considered non-partitioned.
-- Managed & External table:
- In general, Spark SQL supports two kinds of tables, namely managed and
external.
- If one specifies a location using **location** statement or use `create
external table` to create table explicitly, it is an external table, else its
considered a managed table.
- You can read more about external vs managed tables
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+- Managed & External tables
+
+ In general, Spark SQL supports two kinds of tables, namely managed and
external. If one specifies a location using the
+ **location** statement or uses `create external table` to create the table
explicitly, it is an external table; else it is
+ considered a managed table. You can read more about external vs managed
+ tables
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+*Read more in the [table management](/docs/table_management) guide.*
:::note
-1. Since hudi 0.10.0, `primaryKey` is required to specify. It can align with
Hudi datasource writer’s and resolve many behavioural discrepancies reported in
previous versions.
- Non-primaryKey tables are no longer supported. Any hudi table created pre
0.10.0 without a `primaryKey` needs to be recreated with a `primaryKey` field
with 0.10.0.
- Same as `hoodie.datasource.write.recordkey.field`, hudi use `uuid` as the
default primaryKey. So if you want to use `uuid` as your table's `primaryKey`,
you can omit the `primaryKey` config in `tblproperties`.
-2. `primaryKey`, `preCombineField`, `type` is case sensitive.
-3. To specify `primaryKey`, `preCombineField`, `type` or other hudi configs,
`tblproperties` is the preferred way than `options`. Spark SQL syntax is
detailed here.
-4. A new hudi table created by Spark SQL will set
`hoodie.table.keygenerator.class` as
`org.apache.hudi.keygen.ComplexKeyGenerator`, and
-`hoodie.datasource.write.hive_style_partitioning` as `true` by default.
+1. Since Hudi 0.10.0, `primaryKey` is required. It aligns with the Hudi
DataSource writer’s behaviour and resolves
+ behavioural discrepancies reported in previous versions. Non-primary-key
tables are no longer supported. Any Hudi table
+ created pre-0.10.0 without a `primaryKey` needs to be re-created with a
`primaryKey` field on 0.10.0.
+2. Similar to `hoodie.datasource.write.recordkey.field`, `uuid` is used as
primary key by default; if that's the case
+ for your table, you can skip setting `primaryKey` in `tblproperties`.
+3. `primaryKey`, `preCombineField`, and `type` are case-sensitive.
+4. `preCombineField` is required for MOR tables.
+5. When setting `primaryKey`, `preCombineField`, `type`, or other Hudi configs,
`tblproperties` is preferred over `options`.
+6. A new Hudi table created by Spark SQL will by default
+ set
`hoodie.table.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator` and
+ `hoodie.datasource.write.hive_style_partitioning=true`.
:::
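As an illustration of the notes above, a MOR table definition carries `type`, `primaryKey`, and `preCombineField` together in `tblproperties` (a minimal sketch; the table and column names are illustrative, not from the commit):

```sql
-- minimal MOR table sketch: type, primaryKey, and preCombineField
-- are all set via tblproperties (names here are illustrative)
create table hudi_mor_sketch (
  id int,
  name string,
  price double,
  ts bigint
) using hudi
tblproperties (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts'
);
```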
-Let's go over some of the create table commands.
-
**Create a Non-Partitioned Table**
```sql
@@ -395,7 +370,9 @@ Users can set table properties while creating a hudi table.
Critical options are
To set any custom Hudi config (like index type, max parquet size, etc.), see the
"Set hudi config" section.
</TabItem>
-</Tabs>
+
+</Tabs
+>
## Insert data
@@ -405,8 +382,10 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
+
<TabItem value="scala">
Generate some new trips, load them into a DataFrame and write the DataFrame
into the Hudi table as below.
@@ -508,7 +487,9 @@ select id, name, price, ts from hudi_mor_tbl;
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
Check out https://hudi.apache.org/blog/2021/02/13/hudi-key-generators for
various key generator options, like Timestamp based,
@@ -523,8 +504,10 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
+
<TabItem value="scala">
```scala
@@ -561,7 +544,9 @@ spark.sql("select _hoodie_commit_time, _hoodie_record_key,
_hoodie_partition_pat
select fare, begin_lon, begin_lat, ts from hudi_trips_snapshot where fare >
20.0
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
:::info
Since 0.9.0, Hudi has supported a built-in FileIndex, **HoodieFileIndex**, to
query Hudi tables,
@@ -581,8 +566,10 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
+
<TabItem value="scala">
```scala
@@ -664,7 +651,9 @@ select * from hudi_cow_pt_tbl timestamp as of '2022-03-08'
where id = 1;
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
## Update data
@@ -676,8 +665,10 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
+
<TabItem value="scala">
```scala
@@ -794,7 +785,9 @@ denoted by the timestamp. Look for changes in
`_hoodie_commit_time`, `rider`, `d
:::
</TabItem>
-</Tabs>
+
+</Tabs
+>
## Incremental query
@@ -808,7 +801,9 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-]}>
+]}
+>
+
<TabItem value="scala">
```scala
@@ -863,7 +858,9 @@ spark.sql("select `_hoodie_commit_time`, fare, begin_lon,
begin_lat, ts from hu
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
:::info
This will give all changes that happened after the beginTime commit with the
filter of fare > 20.0. The unique thing about this
@@ -880,7 +877,9 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-]}>
+]}
+>
+
<TabItem value="scala">
```scala
@@ -922,7 +921,9 @@ spark.sql("select `_hoodie_commit_time`, fare, begin_lon,
begin_lat, ts from hud
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
## Delete data {#deletes}
@@ -931,8 +932,10 @@ defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
{ label: 'Python', value: 'python', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
+
<TabItem value="scala">
Delete records for the HoodieKeys passed in.<br/>
@@ -1031,7 +1034,9 @@ spark.sql("select uuid, partitionpath from
hudi_trips_snapshot").count()
Only `Append` mode is supported for delete operation.
:::
</TabItem>
-</Tabs>
+
+</Tabs
+>
See the [deletion section](/docs/writing_data#deletes) of the writing data
page for more details.
@@ -1047,8 +1052,10 @@ steps in the upsert write path completely.
defaultValue="scala"
values={[
{ label: 'Scala', value: 'scala', },
-{ label: 'SparkSQL', value: 'sparksql', },
-]}>
+{ label: 'Spark SQL', value: 'sparksql', },
+]}
+>
+
<TabItem value="scala">
```scala
@@ -1100,7 +1107,9 @@ insert overwrite table hudi_cow_pt_tbl select 10, 'a10',
1100, '2021-12-09', '10
insert overwrite hudi_cow_pt_tbl partition(dt = '2021-12-09', hh='12') select
13, 'a13', 1100;
```
</TabItem>
-</Tabs>
+
+</Tabs
+>
## More Spark SQL Commands
@@ -1196,4 +1205,4 @@ Hudi tables can be queried from query engines like Hive,
Spark, Presto and much
[demo video](https://www.youtube.com/watch?v=VhNgUsxdrD0) that showcases all
of this on a Docker-based setup with all
dependent systems running locally. We recommend you replicate the same setup
and run the demo yourself, by following
steps [here](/docs/docker_demo) to get a taste for it. Also, if you are
looking for ways to migrate your existing data
-to Hudi, refer to [migration guide](/docs/migration_guide).
+to Hudi, refer to [migration guide](/docs/migration_guide).
diff --git a/website/versioned_docs/version-0.11.0/table_management.md
b/website/versioned_docs/version-0.11.0/table_management.md
index 76c02edc6d..92cb6092aa 100644
--- a/website/versioned_docs/version-0.11.0/table_management.md
+++ b/website/versioned_docs/version-0.11.0/table_management.md
@@ -234,4 +234,29 @@ WITH (
### Alter Table
```sql
alter table h0 rename to h0_1;
-```
\ No newline at end of file
+```
+
+## Supported Types
+
+| Spark | Hudi | Notes |
+|-----------------|--------------|---------------|
+| boolean | boolean | |
+| byte | int | |
+| short | int | |
+| integer | int | |
+| long | long | |
+| date | date | |
+| timestamp | timestamp | |
+| float | float | |
+| double | double | |
+| string | string | |
+| decimal | decimal | |
+| binary | bytes | |
+| array | array | |
+| map | map | |
+| struct | struct | |
+| char | | not supported |
+| varchar | | not supported |
+| numeric | | not supported |
+| null | | not supported |
+| object | | not supported |
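Since `char`, `varchar`, and the other types marked "not supported" cannot be written as-is, such columns need converting first; for example (hypothetical table and column names, not part of the commit):

```sql
-- illustrative: cast an unsupported char/varchar column to string
-- in the select feeding a Hudi table
insert into hudi_tbl
select id, cast(code as string) as code, ts from src_tbl;
```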