[hudi] branch asf-site updated: [DOCS] improve spark quickstart, info about MT and async services (#7549)

sivabalan Tue, 07 Feb 2023 16:14:47 -0800

This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 2a9a03a43d5 [DOCS] improve spark quickstart, info about MT and async services (#7549)
2a9a03a43d5 is described below

commit 2a9a03a43d5dd506b12879f5d1b4953d4961aed0
Author: kazdy <[email protected]>
AuthorDate: Wed Feb 8 01:14:33 2023 +0100

    [DOCS] improve spark quickstart, info about MT and async services (#7549)
    
    - improve spark quickstart, add info about table maintenance and async services when metadata table is enabled
---
 website/docs/quick-start-guide.md                          | 13 ++++++++-----
 website/versioned_docs/version-0.10.0/quick-start-guide.md | 13 +++++++++++++
 website/versioned_docs/version-0.10.1/quick-start-guide.md | 13 +++++++++++++
 website/versioned_docs/version-0.11.0/quick-start-guide.md | 13 +++++++++++++
 website/versioned_docs/version-0.11.1/quick-start-guide.md | 13 +++++++++++++
 website/versioned_docs/version-0.12.0/quick-start-guide.md | 13 +++++++++++++
 website/versioned_docs/version-0.12.1/quick-start-guide.md | 13 +++++++++++++
 website/versioned_docs/version-0.12.2/quick-start-guide.md | 13 ++++++++-----
 8 files changed, 94 insertions(+), 10 deletions(-)

diff --git a/website/docs/quick-start-guide.md b/website/docs/quick-start-guide.md
index f456468898a..1d8f688a03d 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -1086,15 +1086,18 @@ Target table must exist before write.
 :::
 
 ### Table maintenance
-Hudi can run async or inline table services while running Strucrured Streaming query and takes care of cleaning and compaction. There's no operational overhead for the user.
+Hudi can run async or inline table services while running Strucrured Streaming query and takes care of cleaning, compaction and clustering. There's no operational overhead for the user.  
+For CoW tables, table services work in inline mode by default.  
+For MoR tables, some async services are enabled by default.
 
-Hive Sync works with Structured Streaming, it will create table if not exists and synchronize table in metastore aftear each streaming write.
+:::note
+Since Hudi 0.11 Metadata Table is enabled by default. When using async table services with Metadata Table enabled you must use Optimistic Concurrency Control to avoid the risk of data loss (even in single writer scenario). See [Metadata Table deployment considerations](/docs/next/metadata#deployment-considerations) for detailed instructions.
 
-:::info
-If you're using Foreach or ForeachBatch streaming sink you must explicitly use inline table services. 
-Async table services are not supported.
+If you're using Foreach or ForeachBatch streaming sink you must use inline table services, async table services are not supported.
 :::
 
+Hive Sync works with Structured Streaming, it will create table if not exists and synchronize table to metastore aftear each streaming write.
+
 ## Point in time query
 
 Lets look at how to query data as of a specific time. The specific time can be represented by pointing endTime to a 
diff --git a/website/versioned_docs/version-0.10.0/quick-start-guide.md b/website/versioned_docs/version-0.10.0/quick-start-guide.md
index e98d34d9d3e..b014b2ffb07 100644
--- a/website/versioned_docs/version-0.10.0/quick-start-guide.md
+++ b/website/versioned_docs/version-0.10.0/quick-start-guide.md
@@ -982,6 +982,19 @@ Spark SQL can be used within ForeachBatch sink to do INSERT, UPDATE, DELETE and
 Target table must exist before write.
 :::
 
+### Table maintenance
+Hudi can run async or inline table services while running Strucrured Streaming query and takes care of cleaning, compaction and clustering. There's no operational overhead for the user.  
+For CoW tables, table services work in inline mode by default.  
+For MoR tables, some async services are enabled by default.
+
+:::note
+In Hudi 0.10 Metadata Table was introduced. When using async table services with Metadata Table enabled you must use Optimistic Concurrency Control to avoid the risk of data loss (even in single writer scenario). See [Metadata Table deployment considerations](/docs/0.10.0/metadata#deployment-considerations) for detailed instructions.
+
+If you're using Foreach or ForeachBatch streaming sink you must use inline table services, async table services are not supported.
+:::
+
+Hive Sync works with Structured Streaming, it will create table if not exists and synchronize table to metastore aftear each streaming write.
+
 ## Point in time query
 
 Lets look at how to query data as of a specific time. The specific time can be represented by pointing endTime to a
diff --git a/website/versioned_docs/version-0.10.1/quick-start-guide.md b/website/versioned_docs/version-0.10.1/quick-start-guide.md
index e96400c9329..f61e920d4e6 100644
--- a/website/versioned_docs/version-0.10.1/quick-start-guide.md
+++ b/website/versioned_docs/version-0.10.1/quick-start-guide.md
@@ -999,6 +999,19 @@ Spark SQL can be used within ForeachBatch sink to do INSERT, UPDATE, DELETE and
 Target table must exist before write.
 :::
 
+### Table maintenance
+Hudi can run async or inline table services while running Strucrured Streaming query and takes care of cleaning, compaction and clustering. There's no operational overhead for the user.  
+For CoW tables, table services work in inline mode by default.  
+For MoR tables, some async services are enabled by default.
+
+:::note
+In Hudi 0.10 Metadata Table was introduced. When using async table services with Metadata Table enabled you must use Optimistic Concurrency Control to avoid the risk of data loss (even in single writer scenario). See [Metadata Table deployment considerations](/docs/0.10.1/metadata#deployment-considerations) for detailed instructions.
+
+If you're using Foreach or ForeachBatch streaming sink you must use inline table services, async table services are not supported.
+:::
+
+Hive Sync works with Structured Streaming, it will create table if not exists and synchronize table to metastore aftear each streaming write.
+
 ## Point in time query
 
 Lets look at how to query data as of a specific time. The specific time can be represented by pointing endTime to a 
diff --git a/website/versioned_docs/version-0.11.0/quick-start-guide.md b/website/versioned_docs/version-0.11.0/quick-start-guide.md
index 97305e0e0bf..7448fa67f5c 100644
--- a/website/versioned_docs/version-0.11.0/quick-start-guide.md
+++ b/website/versioned_docs/version-0.11.0/quick-start-guide.md
@@ -1043,6 +1043,19 @@ Spark SQL can be used within ForeachBatch sink to do INSERT, UPDATE, DELETE and
 Target table must exist before write.
 :::
 
+### Table maintenance
+Hudi can run async or inline table services while running Strucrured Streaming query and takes care of cleaning, compaction and clustering. There's no operational overhead for the user.  
+For CoW tables, table services work in inline mode by default.  
+For MoR tables, some async services are enabled by default.
+
+:::note
+Since Hudi 0.11 Metadata Table is enabled by default. When using async table services with Metadata Table enabled you must use Optimistic Concurrency Control to avoid the risk of data loss (even in single writer scenario). See [Metadata Table deployment considerations](/docs/0.11.0/metadata#deployment-considerations) for detailed instructions.
+
+If you're using Foreach or ForeachBatch streaming sink you must use inline table services, async table services are not supported.
+:::
+
+Hive Sync works with Structured Streaming, it will create table if not exists and synchronize table to metastore aftear each streaming write.
+
 ## Point in time query
 
 Lets look at how to query data as of a specific time. The specific time can be represented by pointing endTime to a 
diff --git a/website/versioned_docs/version-0.11.1/quick-start-guide.md b/website/versioned_docs/version-0.11.1/quick-start-guide.md
index 14708099898..4101157b5c8 100644
--- a/website/versioned_docs/version-0.11.1/quick-start-guide.md
+++ b/website/versioned_docs/version-0.11.1/quick-start-guide.md
@@ -1041,6 +1041,19 @@ Spark SQL can be used within ForeachBatch sink to do INSERT, UPDATE, DELETE and
 Target table must exist before write.
 :::
 
+### Table maintenance
+Hudi can run async or inline table services while running Strucrured Streaming query and takes care of cleaning, compaction and clustering. There's no operational overhead for the user.  
+For CoW tables, table services work in inline mode by default.  
+For MoR tables, some async services are enabled by default.
+
+:::note
+Since Hudi 0.11 Metadata Table is enabled by default. When using async table services with Metadata Table enabled you must use Optimistic Concurrency Control to avoid the risk of data loss (even in single writer scenario). See [Metadata Table deployment considerations](/docs/0.11.1/metadata#deployment-considerations) for detailed instructions.
+
+If you're using Foreach or ForeachBatch streaming sink you must use inline table services, async table services are not supported.
+:::
+
+Hive Sync works with Structured Streaming, it will create table if not exists and synchronize table to metastore aftear each streaming write.
+
 ## Point in time query
 
 Lets look at how to query data as of a specific time. The specific time can be represented by pointing endTime to a 
diff --git a/website/versioned_docs/version-0.12.0/quick-start-guide.md b/website/versioned_docs/version-0.12.0/quick-start-guide.md
index 35f6f52d382..0661e470899 100644
--- a/website/versioned_docs/version-0.12.0/quick-start-guide.md
+++ b/website/versioned_docs/version-0.12.0/quick-start-guide.md
@@ -1074,6 +1074,19 @@ Spark SQL can be used within ForeachBatch sink to do INSERT, UPDATE, DELETE and
 Target table must exist before write.
 :::
 
+### Table maintenance
+Hudi can run async or inline table services while running Strucrured Streaming query and takes care of cleaning, compaction and clustering. There's no operational overhead for the user.  
+For CoW tables, table services work in inline mode by default.  
+For MoR tables, some async services are enabled by default.
+
+:::note
+Since Hudi 0.11 Metadata Table is enabled by default. When using async table services with Metadata Table enabled you must use Optimistic Concurrency Control to avoid the risk of data loss (even in single writer scenario). See [Metadata Table deployment considerations](/docs/0.12.0/metadata#deployment-considerations) for detailed instructions.
+
+If you're using Foreach or ForeachBatch streaming sink you must use inline table services, async table services are not supported.
+:::
+
+Hive Sync works with Structured Streaming, it will create table if not exists and synchronize table to metastore aftear each streaming write.
+
 ## Point in time query
 
 Lets look at how to query data as of a specific time. The specific time can be represented by pointing endTime to a 
diff --git a/website/versioned_docs/version-0.12.1/quick-start-guide.md b/website/versioned_docs/version-0.12.1/quick-start-guide.md
index 4b2795b0d7c..44a7dbb9ed5 100644
--- a/website/versioned_docs/version-0.12.1/quick-start-guide.md
+++ b/website/versioned_docs/version-0.12.1/quick-start-guide.md
@@ -1074,6 +1074,19 @@ Spark SQL can be used within ForeachBatch sink to do INSERT, UPDATE, DELETE and
 Target table must exist before write.
 :::
 
+### Table maintenance
+Hudi can run async or inline table services while running Strucrured Streaming query and takes care of cleaning, compaction and clustering. There's no operational overhead for the user.  
+For CoW tables, table services work in inline mode by default.  
+For MoR tables, some async services are enabled by default.
+
+:::note
+Since Hudi 0.11 Metadata Table is enabled by default. When using async table services with Metadata Table enabled you must use Optimistic Concurrency Control to avoid the risk of data loss (even in single writer scenario). See [Metadata Table deployment considerations](/docs/0.12.0/metadata#deployment-considerations) for detailed instructions.
+
+If you're using Foreach or ForeachBatch streaming sink you must use inline table services, async table services are not supported.
+:::
+
+Hive Sync works with Structured Streaming, it will create table if not exists and synchronize table to metastore aftear each streaming write.
+
 ## Point in time query
 
 Lets look at how to query data as of a specific time. The specific time can be represented by pointing endTime to a 
diff --git a/website/versioned_docs/version-0.12.2/quick-start-guide.md b/website/versioned_docs/version-0.12.2/quick-start-guide.md
index 2b980b22719..956ac8c978f 100644
--- a/website/versioned_docs/version-0.12.2/quick-start-guide.md
+++ b/website/versioned_docs/version-0.12.2/quick-start-guide.md
@@ -1088,15 +1088,18 @@ Target table must exist before write.
 :::
 
 ### Table maintenance
-Hudi can run async or inline table services while running Strucrured Streaming query and takes care of cleaning and compaction. There's no operational overhead for the user.
+Hudi can run async or inline table services while running Strucrured Streaming query and takes care of cleaning, compaction and clustering. There's no operational overhead for the user.  
+For CoW tables, table services work in inline mode by default.  
+For MoR tables, some async services are enabled by default.
 
-Hive Sync works with Structured Streaming, it will create table if not exists and synchronize table in metastore aftear each streaming write.
+:::note
+Since Hudi 0.11 Metadata Table is enabled by default. When using async table services with Metadata Table enabled you must use Optimistic Concurrency Control to avoid the risk of data loss (even in single writer scenario). See [Metadata Table deployment considerations](/docs/metadata#deployment-considerations) for detailed instructions.
 
-:::info
-If you're using Foreach or ForeachBatch streaming sink you must explicitly use inline table services. 
-Async table services are not supported.
+If you're using Foreach or ForeachBatch streaming sink you must use inline table services, async table services are not supported.
 :::
 
+Hive Sync works with Structured Streaming, it will create table if not exists and synchronize table to metastore aftear each streaming write.
+
 ## Point in time query
 
 Lets look at how to query data as of a specific time. The specific time can be represented by pointing endTime to a
