This is an automated email from the ASF dual-hosted git repository.
yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new fce7283372 [HUDI-4564] Update docs for Spark 3.3 support (#6359)
fce7283372 is described below
commit fce72833727b6e594657ff9a3fac432dffdad324
Author: Y Ethan Guo <[email protected]>
AuthorDate: Fri Aug 12 10:49:04 2022 -0700
[HUDI-4564] Update docs for Spark 3.3 support (#6359)
---
website/docs/quick-start-guide.md | 48 ++++++++++++++++++++++++++++++---------
1 file changed, 37 insertions(+), 11 deletions(-)
diff --git a/website/docs/quick-start-guide.md b/website/docs/quick-start-guide.md
index acd51ff8cc..b33aeb59f9 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -20,6 +20,7 @@ Hudi works with Spark-2.4.3+ & Spark 3.x versions. You can follow instructions [
| Hudi | Supported Spark 3 version |
|:----------------|:------------------------------------------------|
+| 0.12.x | 3.3.x (default build), 3.2.x, 3.1.x |
| 0.11.x | 3.2.x (default build, Spark bundle only), 3.1.x |
| 0.10.x | 3.1.x (default build), 3.0.x |
| 0.7.0 - 0.9.0 | 3.0.x |
@@ -28,6 +29,7 @@ Hudi works with Spark-2.4.3+ & Spark 3.x versions. You can follow instructions [
The *default build* Spark version indicates that it is used to build the `hudi-spark3-bundle`.
:::note
+In 0.12.0, we introduced experimental support for Spark 3.3.0.
In 0.11.0, there are changes to how Spark bundles are used; please refer to the [0.11.0 release notes](https://hudi.apache.org/releases/release-0.11.0/#spark-versions-and-bundles) for detailed instructions.
@@ -45,10 +47,18 @@ values={[
From the extracted directory run spark-shell with Hudi:
+```shell
+# Spark 3.3
+spark-shell \
+ --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.12.0 \
+ --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
+  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
+```
```shell
# Spark 3.2
spark-shell \
- --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.1 \
+ --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.12.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
@@ -56,13 +66,13 @@ spark-shell \
```shell
# Spark 3.1
spark-shell \
- --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1 \
+ --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.12.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```
```shell
# Spark 2.4
spark-shell \
- --packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.1 \
+ --packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.12.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```
</TabItem>
@@ -71,11 +81,20 @@ spark-shell \
From the extracted directory run pyspark with Hudi:
+```shell
+# Spark 3.3
+export PYSPARK_PYTHON=$(which python3)
+pyspark \
+--packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.12.0 \
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
+--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
+```
```shell
# Spark 3.2
export PYSPARK_PYTHON=$(which python3)
pyspark \
---packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.1 \
+--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.12.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
@@ -84,14 +103,14 @@ pyspark \
# Spark 3.1
export PYSPARK_PYTHON=$(which python3)
pyspark \
---packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1 \
+--packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.12.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```
```shell
# Spark 2.4
export PYSPARK_PYTHON=$(which python3)
pyspark \
---packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.1 \
+--packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.12.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```
</TabItem>
@@ -101,22 +120,29 @@ pyspark \
Hudi supports using Spark SQL to write and read data with the **HoodieSparkSessionExtension** SQL extension.
From the extracted directory run Spark SQL with Hudi:
+```shell
+# Spark 3.3
+spark-sql --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.12.0 \
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
+--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
+```
```shell
# Spark 3.2
-spark-sql --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.1 \
+spark-sql --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.12.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
```
```shell
# Spark 3.1
-spark-sql --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1 \
+spark-sql --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.12.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
```
```shell
# Spark 2.4
-spark-sql --packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.1 \
+spark-sql --packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.12.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
```
@@ -128,7 +154,7 @@ spark-sql --packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.1 \
:::note Please note the following
<ul>
- <li> For Spark 3.2, the additional spark_catalog config is required:
+  <li> For Spark 3.2 and above, the additional spark_catalog config is required:
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
</li>
  <li> We have used hudi-spark-bundle built for scala 2.12 since the spark-avro module used can also depend on 2.12. </li>
</ul>
@@ -1206,7 +1232,7 @@ more details please refer to [procedures](procedures).
You can also do the quickstart by [building hudi yourself](https://github.com/apache/hudi#building-apache-hudi-from-source),
and using `--jars <path to hudi_code>/packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.1?-*.*.*-SNAPSHOT.jar` in the spark-shell command above
-instead of `--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.1`. Hudi also supports scala 2.12. Refer [build with scala 2.12](https://github.com/apache/hudi#build-with-different-spark-versions)
+instead of `--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.12.0`. Hudi also supports scala 2.12. Refer [build with scala 2.12](https://github.com/apache/hudi#build-with-different-spark-versions)
for more info.
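The `--jars` variant described above can be sketched as follows. This is a hypothetical invocation, assuming a local checkout at `~/hudi` built with the Spark 3.2 profile; the exact jar name and version suffix depend on your build.

```shell
# Launch spark-shell with a locally built Hudi bundle instead of --packages.
# The path and jar name below are illustrative; substitute the jar your own
# build produces under packaging/hudi-spark-bundle/target/.
spark-shell \
  --jars ~/hudi/packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.12-0.12.0-SNAPSHOT.jar \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
```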
Also, we used Spark here to showcase the capabilities of Hudi. However, Hudi can support multiple table types/query types and