This is an automated email from the ASF dual-hosted git repository.
sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 2a9a03a43d5 [DOCS] improve spark quickstart, info about MT and async
services (#7549)
2a9a03a43d5 is described below
commit 2a9a03a43d5dd506b12879f5d1b4953d4961aed0
Author: kazdy <[email protected]>
AuthorDate: Wed Feb 8 01:14:33 2023 +0100
[DOCS] improve spark quickstart, info about MT and async services (#7549)
- improve spark quickstart, add info about table maintenance and async
services when metadata table is enabled
---
website/docs/quick-start-guide.md | 13 ++++++++-----
website/versioned_docs/version-0.10.0/quick-start-guide.md | 13 +++++++++++++
website/versioned_docs/version-0.10.1/quick-start-guide.md | 13 +++++++++++++
website/versioned_docs/version-0.11.0/quick-start-guide.md | 13 +++++++++++++
website/versioned_docs/version-0.11.1/quick-start-guide.md | 13 +++++++++++++
website/versioned_docs/version-0.12.0/quick-start-guide.md | 13 +++++++++++++
website/versioned_docs/version-0.12.1/quick-start-guide.md | 13 +++++++++++++
website/versioned_docs/version-0.12.2/quick-start-guide.md | 13 ++++++++-----
8 files changed, 94 insertions(+), 10 deletions(-)
diff --git a/website/docs/quick-start-guide.md
b/website/docs/quick-start-guide.md
index f456468898a..1d8f688a03d 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -1086,15 +1086,18 @@ Target table must exist before write.
:::
### Table maintenance
-Hudi can run async or inline table services while running Strucrured Streaming
query and takes care of cleaning and compaction. There's no operational
overhead for the user.
+Hudi can run async or inline table services while running Structured Streaming
query and takes care of cleaning, compaction and clustering. There's no
operational overhead for the user.
+For CoW tables, table services work in inline mode by default.
+For MoR tables, some async services are enabled by default.
-Hive Sync works with Structured Streaming, it will create table if not exists
and synchronize table in metastore aftear each streaming write.
+:::note
+Since Hudi 0.11 Metadata Table is enabled by default. When using async table
services with Metadata Table enabled you must use Optimistic Concurrency
Control to avoid the risk of data loss (even in single writer scenario). See
[Metadata Table deployment
considerations](/docs/next/metadata#deployment-considerations) for detailed
instructions.
-:::info
-If you're using Foreach or ForeachBatch streaming sink you must explicitly use
inline table services.
-Async table services are not supported.
+If you're using Foreach or ForeachBatch streaming sink you must use inline
table services; async table services are not supported.
:::
+Hive Sync works with Structured Streaming: it creates the table if it does not exist
and synchronizes the table to the metastore after each streaming write.
+
## Point in time query
Lets look at how to query data as of a specific time. The specific time can be
represented by pointing endTime to a
diff --git a/website/versioned_docs/version-0.10.0/quick-start-guide.md
b/website/versioned_docs/version-0.10.0/quick-start-guide.md
index e98d34d9d3e..b014b2ffb07 100644
--- a/website/versioned_docs/version-0.10.0/quick-start-guide.md
+++ b/website/versioned_docs/version-0.10.0/quick-start-guide.md
@@ -982,6 +982,19 @@ Spark SQL can be used within ForeachBatch sink to do
INSERT, UPDATE, DELETE and
Target table must exist before write.
:::
+### Table maintenance
+Hudi can run async or inline table services while running Structured Streaming
query and takes care of cleaning, compaction and clustering. There's no
operational overhead for the user.
+For CoW tables, table services work in inline mode by default.
+For MoR tables, some async services are enabled by default.
+
+:::note
+In Hudi 0.10 Metadata Table was introduced. When using async table services
with Metadata Table enabled you must use Optimistic Concurrency Control to
avoid the risk of data loss (even in single writer scenario). See [Metadata
Table deployment
considerations](/docs/0.10.0/metadata#deployment-considerations) for detailed
instructions.
+
+If you're using Foreach or ForeachBatch streaming sink you must use inline
table services; async table services are not supported.
+:::
+
+Hive Sync works with Structured Streaming: it creates the table if it does not exist
and synchronizes the table to the metastore after each streaming write.
+
## Point in time query
Lets look at how to query data as of a specific time. The specific time can be
represented by pointing endTime to a
diff --git a/website/versioned_docs/version-0.10.1/quick-start-guide.md
b/website/versioned_docs/version-0.10.1/quick-start-guide.md
index e96400c9329..f61e920d4e6 100644
--- a/website/versioned_docs/version-0.10.1/quick-start-guide.md
+++ b/website/versioned_docs/version-0.10.1/quick-start-guide.md
@@ -999,6 +999,19 @@ Spark SQL can be used within ForeachBatch sink to do
INSERT, UPDATE, DELETE and
Target table must exist before write.
:::
+### Table maintenance
+Hudi can run async or inline table services while running Structured Streaming
query and takes care of cleaning, compaction and clustering. There's no
operational overhead for the user.
+For CoW tables, table services work in inline mode by default.
+For MoR tables, some async services are enabled by default.
+
+:::note
+In Hudi 0.10 Metadata Table was introduced. When using async table services
with Metadata Table enabled you must use Optimistic Concurrency Control to
avoid the risk of data loss (even in single writer scenario). See [Metadata
Table deployment
considerations](/docs/0.10.1/metadata#deployment-considerations) for detailed
instructions.
+
+If you're using Foreach or ForeachBatch streaming sink you must use inline
table services; async table services are not supported.
+:::
+
+Hive Sync works with Structured Streaming: it creates the table if it does not exist
and synchronizes the table to the metastore after each streaming write.
+
## Point in time query
Lets look at how to query data as of a specific time. The specific time can be
represented by pointing endTime to a
diff --git a/website/versioned_docs/version-0.11.0/quick-start-guide.md
b/website/versioned_docs/version-0.11.0/quick-start-guide.md
index 97305e0e0bf..7448fa67f5c 100644
--- a/website/versioned_docs/version-0.11.0/quick-start-guide.md
+++ b/website/versioned_docs/version-0.11.0/quick-start-guide.md
@@ -1043,6 +1043,19 @@ Spark SQL can be used within ForeachBatch sink to do
INSERT, UPDATE, DELETE and
Target table must exist before write.
:::
+### Table maintenance
+Hudi can run async or inline table services while running Structured Streaming
query and takes care of cleaning, compaction and clustering. There's no
operational overhead for the user.
+For CoW tables, table services work in inline mode by default.
+For MoR tables, some async services are enabled by default.
+
+:::note
+Since Hudi 0.11 Metadata Table is enabled by default. When using async table
services with Metadata Table enabled you must use Optimistic Concurrency
Control to avoid the risk of data loss (even in single writer scenario). See
[Metadata Table deployment
considerations](/docs/0.11.0/metadata#deployment-considerations) for detailed
instructions.
+
+If you're using Foreach or ForeachBatch streaming sink you must use inline
table services; async table services are not supported.
+:::
+
+Hive Sync works with Structured Streaming: it creates the table if it does not exist
and synchronizes the table to the metastore after each streaming write.
+
## Point in time query
Lets look at how to query data as of a specific time. The specific time can be
represented by pointing endTime to a
diff --git a/website/versioned_docs/version-0.11.1/quick-start-guide.md
b/website/versioned_docs/version-0.11.1/quick-start-guide.md
index 14708099898..4101157b5c8 100644
--- a/website/versioned_docs/version-0.11.1/quick-start-guide.md
+++ b/website/versioned_docs/version-0.11.1/quick-start-guide.md
@@ -1041,6 +1041,19 @@ Spark SQL can be used within ForeachBatch sink to do
INSERT, UPDATE, DELETE and
Target table must exist before write.
:::
+### Table maintenance
+Hudi can run async or inline table services while running Structured Streaming
query and takes care of cleaning, compaction and clustering. There's no
operational overhead for the user.
+For CoW tables, table services work in inline mode by default.
+For MoR tables, some async services are enabled by default.
+
+:::note
+Since Hudi 0.11 Metadata Table is enabled by default. When using async table
services with Metadata Table enabled you must use Optimistic Concurrency
Control to avoid the risk of data loss (even in single writer scenario). See
[Metadata Table deployment
considerations](/docs/0.11.1/metadata#deployment-considerations) for detailed
instructions.
+
+If you're using Foreach or ForeachBatch streaming sink you must use inline
table services; async table services are not supported.
+:::
+
+Hive Sync works with Structured Streaming: it creates the table if it does not exist
and synchronizes the table to the metastore after each streaming write.
+
## Point in time query
Lets look at how to query data as of a specific time. The specific time can be
represented by pointing endTime to a
diff --git a/website/versioned_docs/version-0.12.0/quick-start-guide.md
b/website/versioned_docs/version-0.12.0/quick-start-guide.md
index 35f6f52d382..0661e470899 100644
--- a/website/versioned_docs/version-0.12.0/quick-start-guide.md
+++ b/website/versioned_docs/version-0.12.0/quick-start-guide.md
@@ -1074,6 +1074,19 @@ Spark SQL can be used within ForeachBatch sink to do
INSERT, UPDATE, DELETE and
Target table must exist before write.
:::
+### Table maintenance
+Hudi can run async or inline table services while running Structured Streaming
query and takes care of cleaning, compaction and clustering. There's no
operational overhead for the user.
+For CoW tables, table services work in inline mode by default.
+For MoR tables, some async services are enabled by default.
+
+:::note
+Since Hudi 0.11 Metadata Table is enabled by default. When using async table
services with Metadata Table enabled you must use Optimistic Concurrency
Control to avoid the risk of data loss (even in single writer scenario). See
[Metadata Table deployment
considerations](/docs/0.12.0/metadata#deployment-considerations) for detailed
instructions.
+
+If you're using Foreach or ForeachBatch streaming sink you must use inline
table services; async table services are not supported.
+:::
+
+Hive Sync works with Structured Streaming: it creates the table if it does not exist
and synchronizes the table to the metastore after each streaming write.
+
## Point in time query
Lets look at how to query data as of a specific time. The specific time can be
represented by pointing endTime to a
diff --git a/website/versioned_docs/version-0.12.1/quick-start-guide.md
b/website/versioned_docs/version-0.12.1/quick-start-guide.md
index 4b2795b0d7c..44a7dbb9ed5 100644
--- a/website/versioned_docs/version-0.12.1/quick-start-guide.md
+++ b/website/versioned_docs/version-0.12.1/quick-start-guide.md
@@ -1074,6 +1074,19 @@ Spark SQL can be used within ForeachBatch sink to do
INSERT, UPDATE, DELETE and
Target table must exist before write.
:::
+### Table maintenance
+Hudi can run async or inline table services while running Structured Streaming
query and takes care of cleaning, compaction and clustering. There's no
operational overhead for the user.
+For CoW tables, table services work in inline mode by default.
+For MoR tables, some async services are enabled by default.
+
+:::note
+Since Hudi 0.11 Metadata Table is enabled by default. When using async table
services with Metadata Table enabled you must use Optimistic Concurrency
Control to avoid the risk of data loss (even in single writer scenario). See
[Metadata Table deployment
considerations](/docs/0.12.1/metadata#deployment-considerations) for detailed
instructions.
+
+If you're using Foreach or ForeachBatch streaming sink you must use inline
table services; async table services are not supported.
+:::
+
+Hive Sync works with Structured Streaming: it creates the table if it does not exist
and synchronizes the table to the metastore after each streaming write.
+
## Point in time query
Lets look at how to query data as of a specific time. The specific time can be
represented by pointing endTime to a
diff --git a/website/versioned_docs/version-0.12.2/quick-start-guide.md
b/website/versioned_docs/version-0.12.2/quick-start-guide.md
index 2b980b22719..956ac8c978f 100644
--- a/website/versioned_docs/version-0.12.2/quick-start-guide.md
+++ b/website/versioned_docs/version-0.12.2/quick-start-guide.md
@@ -1088,15 +1088,18 @@ Target table must exist before write.
:::
### Table maintenance
-Hudi can run async or inline table services while running Strucrured Streaming
query and takes care of cleaning and compaction. There's no operational
overhead for the user.
+Hudi can run async or inline table services while running Structured Streaming
query and takes care of cleaning, compaction and clustering. There's no
operational overhead for the user.
+For CoW tables, table services work in inline mode by default.
+For MoR tables, some async services are enabled by default.
-Hive Sync works with Structured Streaming, it will create table if not exists
and synchronize table in metastore aftear each streaming write.
+:::note
+Since Hudi 0.11 Metadata Table is enabled by default. When using async table
services with Metadata Table enabled you must use Optimistic Concurrency
Control to avoid the risk of data loss (even in single writer scenario). See
[Metadata Table deployment
considerations](/docs/metadata#deployment-considerations) for detailed
instructions.
-:::info
-If you're using Foreach or ForeachBatch streaming sink you must explicitly use
inline table services.
-Async table services are not supported.
+If you're using Foreach or ForeachBatch streaming sink you must use inline
table services; async table services are not supported.
:::
+Hive Sync works with Structured Streaming: it creates the table if it does not exist
and synchronizes the table to the metastore after each streaming write.
+
## Point in time query
Lets look at how to query data as of a specific time. The specific time can be
represented by pointing endTime to a