This is an automated email from the ASF dual-hosted git repository.
bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 7dad2ccfbd2 [DOCS]Update Concurrency page (#9372)
7dad2ccfbd2 is described below
commit 7dad2ccfbd2afe5ffa88e3f6b2af5db71a79375d
Author: Bhavani Sudha Saktheeswaran <[email protected]>
AuthorDate: Tue Sep 5 15:28:33 2023 -0700
[DOCS]Update Concurrency page (#9372)
* [DOCS]Update Concurrency page
Summary:
- Add inline configs
- Add high level context
- Add section on Early conflict detection
* Address review feedback
---
website/docs/concurrency_control.md | 184 +++++++++++++++++++++++-------------
website/docs/table_types.md | 8 +-
website/src/theme/DocPage/index.js | 2 +-
3 files changed, 124 insertions(+), 70 deletions(-)
diff --git a/website/docs/concurrency_control.md
b/website/docs/concurrency_control.md
index 7a014d16140..750a9fee1ff 100644
--- a/website/docs/concurrency_control.md
+++ b/website/docs/concurrency_control.md
@@ -2,105 +2,131 @@
title: "Concurrency Control"
summary: In this page, we will discuss how to perform concurrent writes to
Hudi Tables.
toc: true
+toc_min_heading_level: 2
+toc_max_heading_level: 4
last_modified_at: 2021-03-19T15:59:57-04:00
---
+Concurrency control defines how different writers/readers coordinate access to the table. Hudi ensures atomic writes by publishing commits atomically to the timeline, stamped with an instant time that denotes the time at which the action is deemed to have occurred. Unlike general purpose file version control, Hudi draws a clear distinction between writer processes (that issue user’s upserts/deletes), table services (that write data/metadata to optimize/perform bookkeeping) and read [...]
-In this section, we will cover Hudi's concurrency model and describe ways to
ingest data into a Hudi Table from multiple writers; using the [Hudi
Streamer](#hudi-streamer) tool as well as
-using the [Hudi datasource](#datasource-writer).
+In this section, we will discuss the different concurrency controls supported by Hudi and how they are leveraged to provide flexible deployment models. We will cover multi-writing, a popular deployment model, and finally describe ways to ingest data into a Hudi table from multiple writers using different tools such as Hudi Streamer, Hudi datasource, Spark Structured Streaming and Spark SQL.
-## Supported Concurrency Controls
-- **MVCC** : Hudi table services such as compaction, cleaning, clustering
leverage Multi Version Concurrency Control to provide snapshot isolation
-between multiple table service writers and readers. Additionally, using MVCC,
Hudi provides snapshot isolation between an ingestion writer and multiple
concurrent readers.
- With this model, Hudi supports running any number of table service jobs
concurrently, without any concurrency conflict.
- This is made possible by ensuring that scheduling plans of such table
services always happens in a single writer mode to ensure no conflict and
avoids race conditions.
+## Deployment models with supported concurrency controls
-- **[NEW] OPTIMISTIC CONCURRENCY** : Write operations such as the ones
described above (UPSERT, INSERT) etc, leverage optimistic concurrency control
to enable multiple ingestion writers to
-the same Hudi Table. Hudi supports `file level OCC`, i.e., for any 2 commits
(or writers) happening to the same table, if they do not have writes to
overlapping files being changed, both writers are allowed to succeed.
- This feature is currently *experimental* and requires either Zookeeper or
HiveMetastore to acquire locks.
+### Model A: Single writer with inline table services
-It may be helpful to understand the different guarantees provided by [write
operations](/docs/write_operations/) via Hudi datasource or the Hudi Streamer.
+This is the simplest form of concurrency, meaning there is no concurrency at all in the write processes. In this model, Hudi eliminates the need for concurrency control and maximizes throughput by supporting table services out-of-the-box and running them inline after every write to the table. Execution plans are idempotent, persisted to the timeline and auto-recover from failures. For most simple use-cases, this means just writing is sufficient to get a well-managed table that needs no conc [...]
-## Single Writer Guarantees
+There is no actual concurrent writing in this model. **MVCC** is leveraged to
provide snapshot isolation guarantees between ingestion writer and multiple
readers and also between multiple table service writers and readers. Writes to
the table either from ingestion or from table services produce versioned data
that are available to readers only after the writes are committed. Until then,
readers can access only the previous version of the data.
- - *UPSERT Guarantee*: The target table will NEVER show duplicates.
- - *INSERT Guarantee*: The target table wilL NEVER have duplicates if
[dedup](/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) is
enabled.
- - *BULK_INSERT Guarantee*: The target table will NEVER have duplicates if
[dedup](/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) is
enabled.
- - *INCREMENTAL PULL Guarantee*: Data consumption and checkpoints are NEVER
out of order.
+A single writer with all table services such as cleaning, clustering, compaction, etc. running inline (such as Hudi Streamer sync-once mode and Spark Datasource with default configs) falls under this model and requires no additional configs.
-## Multi Writer Guarantees
+#### Single Writer Guarantees
-With multiple writers using OCC, some of the above guarantees change as follows
+In this model, these are the guarantees to expect on [write operations](https://hudi.apache.org/docs/write_operations/):
- *UPSERT Guarantee*: The target table will NEVER show duplicates.
-- *INSERT Guarantee*: The target table MIGHT have duplicates even if
[dedup](/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) is
enabled.
-- *BULK_INSERT Guarantee*: The target table MIGHT have duplicates even if
[dedup](/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) is
enabled.
-- *INCREMENTAL PULL Guarantee*: Data consumption and checkpoints MIGHT be out
of order due to multiple writer jobs finishing at different times.
+- *INSERT Guarantee*: The target table will NEVER have duplicates if dedup is enabled via [`hoodie.datasource.write.insert.drop.duplicates`](https://hudi.apache.org/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) & [`hoodie.combine.before.insert`](https://hudi.apache.org/docs/configurations/#hoodiecombinebeforeinsert).
+- *BULK_INSERT Guarantee*: The target table will NEVER have duplicates if dedup is enabled via [`hoodie.datasource.write.insert.drop.duplicates`](https://hudi.apache.org/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) & [`hoodie.combine.before.insert`](https://hudi.apache.org/docs/configurations/#hoodiecombinebeforeinsert).
+- *INCREMENTAL PULL Guarantee*: Data consumption and checkpoints are NEVER out
of order.
+
+
+### Model B: Single writer with async table services
+
+Hudi provides the option of running the table services in an async fashion, where most of the heavy lifting (e.g., actually rewriting the columnar data by the compaction service) is done asynchronously. In this model, the async deployment eliminates any repeated wasteful retries and optimizes the table using clustering techniques while a single writer consumes the writes to the table without being blocked by such table services. This model avoids the need for taking an [external lock](# [...]
+
+A single writer along with async table services runs in the same process. For
example, you can have a Hudi Streamer in continuous mode write to a MOR table
using async compaction; you can use Spark Streaming (where
[compaction](https://hudi.apache.org/docs/compaction) is async by default), and
you can use Flink streaming or your own job setup and enable async table
services inside the same writer.
+
+Hudi leverages **MVCC** in this model to support running any number of table service jobs concurrently, without any concurrency conflict. This is made possible by ensuring that Hudi's ingestion writer and async table services coordinate among themselves to avoid conflicts and race conditions. The same single writer guarantees described in Model A above can be achieved in this model as well.
+With this model, users don't need to spin up different Spark jobs and manage the orchestration among them. For larger deployments, this model can ease the operational burden significantly while getting the table services running without blocking the writers.
+
+### Model C: Multi-writer
+
+It is not always possible to serialize all write operations to a table (such as UPSERT, INSERT or DELETE) into the same write process and therefore, multi-writing capability may be required. In multi-writing, disparate distributed processes run in parallel or in overlapping time windows to write to the same table. In such cases, an external locking mechanism becomes necessary to coordinate concurrent accesses. Here are a few different scenarios that would all fall under multi-writing:
+
+- Multiple ingestion writers to the same table: For instance, two Spark Datasource writers working on different sets of partitions from a source Kafka topic.
+- Multiple ingestion writers to the same table, including one writer with
async table services: For example, a Hudi Streamer with async compaction for
regular ingestion & a Spark Datasource writer for backfilling.
+- A single ingestion writer and a separate compaction (HoodieCompactor) or clustering (HoodieClusteringJob) job: This is considered multi-writing since the jobs are not running in the same process.
+
+Hudi's concurrency model intelligently differentiates actual writing to the
table from table services that manage or optimize the table. Hudi offers
similar **optimistic concurrency control across multiple writers**, but **table
services can still execute completely lock-free and async** as long as they run
in the same process as one of the writers.
+For multi-writing, Hudi leverages file level optimistic concurrency control (OCC). For example, when two writers write to non-overlapping files, both writes are allowed to succeed. However, when the writes from different writers overlap (touch the same set of files), only one of them will succeed. Please note that this feature is currently experimental and requires external lock providers to acquire locks briefly at critical sections during the write. More on lock providers below.
+
+#### Multi Writer Guarantees
+
+With multiple writers using OCC, these are the write guarantees to expect:
+
+- *UPSERT Guarantee*: The target table will NEVER show duplicates.
+- *INSERT Guarantee*: The target table MIGHT have duplicates even if dedup is
enabled.
+- *BULK_INSERT Guarantee*: The target table MIGHT have duplicates even if
dedup is enabled.
+- *INCREMENTAL PULL Guarantee*: Data consumption and checkpoints are NEVER out
of order. If there are inflight commits
+ (due to multi-writing), incremental queries will not expose the completed
commits following the inflight commits.
+
## Enabling Multi Writing
-The following properties are needed to be set properly to turn on optimistic
concurrency control.
+The following properties need to be set appropriately to turn on optimistic concurrency control and achieve multi-writing.
```
hoodie.write.concurrency.mode=optimistic_concurrency_control
-hoodie.cleaner.policy.failed.writes=LAZY
hoodie.write.lock.provider=<lock-provider-classname>
+hoodie.cleaner.policy.failed.writes=LAZY
```
-There are 4 different lock providers that require different configurations to
be set.
-
-**`FileSystem`** based lock provider
-
-FileSystem based lock provider supports multiple writers cross different
jobs/applications based on atomic create/delete operations of the underlying
filesystem.
+| Config Name | Default
| Description
[...]
+|-------------------------------------|-------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[...]
+| hoodie.write.concurrency.mode | SINGLE_WRITER (Optional)
| <u>[Concurrency
modes](https://github.com/apache/hudi/blob/c387f2a6dd3dc9db2cd22ec550a289d3a122e487/hudi-common/src/main/java/org/apache/hudi/common/model/WriteConcurrencyMode.java)</u>
for write operations.<br />Possible values:<br /><ul><li>`SINGLE_WRITER`: Only
one active writer to the table. Maximizes
throughput.</li><li>`OPTIMISTIC_CONCURRENCY_CONTROL`: Multiple wr [...]
+| hoodie.write.lock.provider |
org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider (Optional) |
Lock provider class name, user can provide their own implementation of
LockProvider which should be subclass of
org.apache.hudi.common.lock.LockProvider<br /><br />`Config Param:
LOCK_PROVIDER_CLASS_NAME`<br />`Since Version: 0.8.0`
[...]
+| hoodie.cleaner.policy.failed.writes | EAGER (Optional)
|
org.apache.hudi.common.model.HoodieFailedWritesCleaningPolicy: Policy that
controls how to clean up failed writes. Hudi will delete any files written by
failed writes to re-claim space. EAGER(default): Clean failed writes inline
after every write operation. LAZY: Clean failed writes lazily after
heartbeat timeout when the cleaning service runs. This policy is re [...]
-:::note
-FileSystem based lock provider is not supported with cloud storage like S3 or
GCS.
-:::
-```
-hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider
-hoodie.write.lock.filesystem.path (optional)
-hoodie.write.lock.filesystem.expire (optional)
-```
+### External Locking and lock providers
-When using the FileSystem based lock provider, by default, the lock file will
store into `hoodie.base.path`+`/.hoodie/lock`. You may use a custom folder to
store the lock file by specifying `hoodie.write.lock.filesystem.path`.
+As can be seen above, a lock provider needs to be configured in multi-writing scenarios. External locking is typically used in conjunction with optimistic concurrency control because it provides a way to prevent conflicts that might occur when two or more transactions (commits in our case) attempt to modify the same resource concurrently. When a transaction attempts to modify a resource that is currently locked by another transaction, it must wait until the lock is released before proceeding.
-In case the lock cannot release during job crash, you can set
`hoodie.write.lock.filesystem.expire` (lock will never expire by default). You
may also delete lock file manually in such situation.
+In case of multi-writing in Hudi, locks are acquired on the Hudi table for a very short duration during specific phases (such as just before committing the writes or before scheduling table services) instead of locking for the entire span of the write. This approach allows multiple writers to work on the same table simultaneously, increasing concurrency while avoiding conflicts.
-**`Zookeeper`** based lock provider
+There are 4 different lock providers that require different configurations to
be set. Please refer to comprehensive locking configs
[here](https://hudi.apache.org/docs/next/configurations#LOCK).
+#### Zookeeper based lock provider
```
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
-hoodie.write.lock.zookeeper.url
-hoodie.write.lock.zookeeper.port
-hoodie.write.lock.zookeeper.lock_key
-hoodie.write.lock.zookeeper.base_path
```
+Following are the basic configs required to set up this lock provider:
-**`HiveMetastore`** based lock provider
+| Config Name | Default | Description |
+| ----------- | ------- | ----------- |
+| hoodie.write.lock.zookeeper.base_path | N/A **(Required)** | The base path on Zookeeper under which to create lock related ZNodes. This should be same for all concurrent writers to the same table<br /><br />`Config Param: ZK_BASE_PATH`<br />`Since Version: 0.8.0` |
+| hoodie.write.lock.zookeeper.port | N/A **(Required)** | Zookeeper port to connect to.<br /><br />`Config Param: ZK_PORT`<br />`Since Version: 0.8.0` |
+| hoodie.write.lock.zookeeper.url | N/A **(Required)** | Zookeeper URL to connect to.<br /><br />`Config Param: ZK_CONNECT_URL`<br />`Since Version: 0.8.0` |
+
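As an illustration, a minimal set of these properties might look as follows (the host, port and base path values here are placeholders, not recommendations):

```
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=zk-host
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.base_path=/hudi/locks
```

Note that all concurrent writers to the same table must share the same base path so they contend on the same ZNodes.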
+#### HiveMetastore based lock provider
```
hoodie.write.lock.provider=org.apache.hudi.hive.transaction.lock.HiveMetastoreBasedLockProvider
-hoodie.write.lock.hivemetastore.database
-hoodie.write.lock.hivemetastore.table
```
+Following are the basic configs required to set up this lock provider:
-`The HiveMetastore URI's are picked up from the hadoop configuration file
loaded during runtime.`
-
-**`Amazon DynamoDB`** based lock provider
+| Config Name | Default | Description |
+| ----------- | ------- | ----------- |
+| hoodie.write.lock.hivemetastore.database | N/A **(Required)** | For Hive based lock provider, the Hive database to acquire lock against<br /><br />`Config Param: HIVE_DATABASE_NAME`<br />`Since Version: 0.8.0` |
+| hoodie.write.lock.hivemetastore.table | N/A **(Required)** | For Hive based lock provider, the Hive table to acquire lock against<br /><br />`Config Param: HIVE_TABLE_NAME`<br />`Since Version: 0.8.0` |
-Amazon DynamoDB based lock provides a simple way to support multi writing
across different clusters. You can refer to the
-[DynamoDB based Locks
Configurations](https://hudi.apache.org/docs/configurations#DynamoDB-based-Locks-Configurations)
-section for the details of each related configuration knob.
+The HiveMetastore URIs are picked up from the Hadoop configuration file loaded at runtime.
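For illustration, a minimal configuration for this provider might look as follows (the database and table names here are placeholders):

```
hoodie.write.lock.provider=org.apache.hudi.hive.transaction.lock.HiveMetastoreBasedLockProvider
hoodie.write.lock.hivemetastore.database=my_db
hoodie.write.lock.hivemetastore.table=my_table
```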
+#### Amazon DynamoDB based lock provider
```
hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
-hoodie.write.lock.dynamodb.table (required)
-hoodie.write.lock.dynamodb.partition_key (optional)
-hoodie.write.lock.dynamodb.region (optional)
-hoodie.write.lock.dynamodb.endpoint_url (optional)
-hoodie.write.lock.dynamodb.billing_mode (optional)
```
+The Amazon DynamoDB based lock provider provides a simple way to support multi-writing across different clusters. You can refer to the
+[DynamoDB based Locks Configurations](https://hudi.apache.org/docs/configurations#DynamoDB-based-Locks-Configurations)
section for the details of each related configuration knob. Following are the basic configs required to set up this lock provider:
+
+| Config Name | Default | Description |
+| ----------- | ------- | ----------- |
+| hoodie.write.lock.dynamodb.endpoint_url | N/A **(Required)** | For DynamoDB based lock provider, the url endpoint used for Amazon DynamoDB service. Useful for development with a local dynamodb instance.<br /><br />`Config Param: DYNAMODB_ENDPOINT_URL`<br />`Since Version: 0.10.1` |
+
+For advanced configs refer
[here](https://hudi.apache.org/docs/next/configurations#DynamoDB-based-Locks-Configurations)
+
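As an illustration, a minimal set of these properties might look as follows (the lock table name, partition key and region values here are placeholders):

```
hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
hoodie.write.lock.dynamodb.table=hudi-locks
hoodie.write.lock.dynamodb.partition_key=my_table
hoodie.write.lock.dynamodb.region=us-east-1
```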
When using the DynamoDB-based lock provider, the name of the DynamoDB table
acting as the lock table for Hudi is
specified by the config `hoodie.write.lock.dynamodb.table`. This DynamoDB
table is automatically created by Hudi, so you
@@ -140,7 +166,7 @@ IAM policy for your service instance will need to add the
following permissions:
- `TableName` : same as `hoodie.write.lock.dynamodb.table`
- `Region`: same as `hoodie.write.lock.dynamodb.region`
-AWS SDK dependencies are not bundled with Hudi from v0.10.x and will need to
be added to your classpath.
+AWS SDK dependencies are not bundled with Hudi from v0.10.x and will need to
be added to your classpath.
Add the following Maven packages (check the latest versions at time of
install):
```
com.amazonaws:dynamodb-lock-client
@@ -148,7 +174,22 @@ com.amazonaws:aws-java-sdk-dynamodb
com.amazonaws:aws-java-sdk-core
```
-## Datasource Writer
+#### FileSystem based lock provider (Experimental)
+
+FileSystem based lock provider supports multiple writers across different jobs/applications based on atomic create/delete operations of the underlying filesystem.
+
+```
+hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider
+```
+
+When using the FileSystem based lock provider, by default, the lock file is stored under `hoodie.base.path`+`/.hoodie/lock`. You may use a custom folder to store the lock file by specifying `hoodie.write.lock.filesystem.path`.
+
+If the lock cannot be released after a job crash, you can set `hoodie.write.lock.filesystem.expire` (the lock never expires by default) to a desired expiration time in minutes. You may also delete the lock file manually in such a situation.
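For illustration, a minimal set of these properties might look as follows (the custom lock path and the 10-minute expiry here are placeholder values):

```
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider
hoodie.write.lock.filesystem.path=/tmp/hudi-locks
hoodie.write.lock.filesystem.expire=10
```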
+:::note
+FileSystem based lock provider is not supported with cloud storage like S3 or
GCS.
+:::
+
+## Multi Writing via Spark Datasource Writer
The `hudi-spark` module offers the DataSource API to write (and read) a Spark
DataFrame into a Hudi table.
@@ -162,7 +203,6 @@ inputDF.write.format("hudi")
.option("hoodie.write.concurrency.mode",
"optimistic_concurrency_control")
.option("hoodie.write.lock.zookeeper.url", "zookeeper")
.option("hoodie.write.lock.zookeeper.port", "2181")
- .option("hoodie.write.lock.zookeeper.lock_key", "test_table")
.option("hoodie.write.lock.zookeeper.base_path", "/test")
.option(RECORDKEY_FIELD_OPT_KEY, "uuid")
.option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath")
@@ -171,7 +211,7 @@ inputDF.write.format("hudi")
.save(basePath)
```
-## Hudi Streamer
+## Multi Writing via Hudi Streamer
The `HoodieStreamer` utility (part of hudi-utilities-bundle) provides ways to
ingest from different sources such as DFS or Kafka, with the following
capabilities.
@@ -186,18 +226,32 @@ A Hudi Streamer job can then be triggered as follows:
--source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
--source-ordering-field impressiontime \
--target-base-path file:\/\/\/tmp/hudi-streamer-op \
- --target-table uber.impressions \
+ --target-table tableName \
--op BULK_INSERT
```
+## Early Conflict Detection
+
+Multi-writing using OCC allows multiple writers to concurrently write and atomically commit to the Hudi table if there is no overlapping data file to be written, guaranteeing data consistency, integrity and correctness. Prior to the 0.13.0 release, as the OCC (optimistic concurrency control) name suggests, each writer would optimistically proceed with ingestion and, towards the end just before committing, go through the conflict resolution flow to deduce overlapping writes and abort one if need [...]
+
+To improve the concurrency control, the [0.13.0
release](https://hudi.apache.org/releases/release-0.13.0#early-conflict-detection-for-multi-writer)
introduced a new feature, early conflict detection in OCC, to detect the
conflict during the data writing phase and abort the writing early on once a
conflict is detected, using Hudi's marker mechanism. Hudi can now stop a
conflicting writer much earlier because of the early conflict detection and
release computing resources necessary to clus [...]
+
+By default, this feature is turned off. To try this out, a user needs to set `hoodie.write.concurrency.early.conflict.detection.enable` to true when using OCC for concurrency control (refer to the [configs](https://hudi.apache.org/docs/next/configurations#Write-Configurations-advanced-configs) page for all relevant configs).
+:::note
+Early Conflict Detection in OCC is an **EXPERIMENTAL** feature.
+:::
+
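Putting it together, enabling early conflict detection on top of OCC might look as follows (the Zookeeper lock provider below is only one example choice):

```
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.cleaner.policy.failed.writes=LAZY
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.concurrency.early.conflict.detection.enable=true
```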
## Best Practices when using Optimistic Concurrency Control
-Concurrent Writing to Hudi tables requires acquiring a lock with either
Zookeeper or HiveMetastore. Due to several reasons you might want to configure
retries to allow your application to acquire the lock.
+Concurrent writing to Hudi tables requires acquiring a lock with one of the lock providers mentioned above. For several reasons, you might want to configure retries to allow your application to acquire the lock:
1. Network connectivity or excessive load on servers increasing time for lock
acquisition resulting in timeouts
2. Running a large number of concurrent jobs that write to the same Hudi table can result in contention during lock acquisition, which can cause timeouts
3. In some scenarios of conflict resolution, Hudi commit operations might take up to tens of seconds while the lock is being held. This can result in timeouts for other jobs waiting to acquire a lock.
-Set the correct native lock provider client retries. NOTE that sometimes these
settings are set on the server once and all clients inherit the same configs.
Please check your settings before enabling optimistic concurrency.
+Set the correct native lock provider client retries.
+:::note
+Sometimes these settings are set on the server once, and all clients inherit the same configs. Please check your settings before enabling optimistic concurrency.
+:::
```
hoodie.write.lock.wait_time_ms
@@ -225,4 +279,4 @@ hoodie.cleaner.policy.failed.writes=EAGER
## Caveats
If you are using the `WriteClient` API, please note that multiple writes to
the table need to be initiated from 2 different instances of the write client.
-It is NOT recommended to use the same instance of the write client to perform
multi writing.
\ No newline at end of file
+It is **NOT** recommended to use the same instance of the write client to
perform multi writing.
\ No newline at end of file
diff --git a/website/docs/table_types.md b/website/docs/table_types.md
index 868f265e00a..76e50a1bf87 100644
--- a/website/docs/table_types.md
+++ b/website/docs/table_types.md
@@ -136,11 +136,11 @@ Refer
[here](https://hudi.apache.org/docs/next/configurations#Read-Options) for
### Flink Configs
-| Config Name
| Default | Description
|
-|------------------------------------------------------------------------------------------|---------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Config Name
| Default | Description
|
+|------------------------------------------------------------------------------------------|---------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| hoodie.datasource.query.type | snapshot
(Optional) | Decides how data files need to be read, in 1)
Snapshot mode (obtain latest view, based on row & columnar data); 2)
incremental mode (new data since an instantTime). If `cdc.enabled` is set
incremental queries on cdc data are possible; 3) Read Optimized mode (obtain
latest view, based on columnar data) .Default: snapshot<br /><br /> `Config
Param: QUERY_TYPE` |
-| read.start-commit | N/A
**(Required)** | Required in case of incremental queries.
Start commit instant for reading, the commit time format should be
'yyyyMMddHHmmss', by default reading from the latest instant for streaming
read<br /><br /> `Config Param: READ_START_COMMIT`
|
-| read.end-commit | N/A
**(Required)** | Used int he context of incremental queries.
End commit instant for reading, the commit time format should be
'yyyyMMddHHmmss'<br /><br /> `Config Param: READ_END_COMMIT`
|
+| read.start-commit | N/A
**(Required)** | Required in case of incremental queries.
Start commit instant for reading, the commit time format should be
'yyyyMMddHHmmss', by default reading from the latest instant for streaming
read<br /><br /> `Config Param: READ_START_COMMIT`
|
+| read.end-commit | N/A
**(Required)** | Used in the context of incremental queries.
End commit instant for reading, the commit time format should be
'yyyyMMddHHmmss'<br /><br /> `Config Param: READ_END_COMMIT`
|
Refer [here](https://hudi.apache.org/docs/next/configurations#Flink-Options)
for more details.
diff --git a/website/src/theme/DocPage/index.js
b/website/src/theme/DocPage/index.js
index 2c5dc031611..817f8474215 100644
--- a/website/src/theme/DocPage/index.js
+++ b/website/src/theme/DocPage/index.js
@@ -128,7 +128,7 @@ function DocPageContent({
);
}
-const arrayOfPages = (matchPath) => [`${matchPath}/configurations`,
`${matchPath}/basic_configurations`, `${matchPath}/timeline`,
`${matchPath}/table_types`, `${matchPath}/migration_guide`,
`${matchPath}/compaction`, `${matchPath}/clustering`, `${matchPath}/indexing`,
`${matchPath}/metadata`, `${matchPath}/metadata_indexing`,
`${matchPath}/record_payload`, `${matchPath}/file_sizing`,
`${matchPath}/hoodie_cleaner`];
+const arrayOfPages = (matchPath) => [`${matchPath}/configurations`,
`${matchPath}/basic_configurations`, `${matchPath}/timeline`,
`${matchPath}/table_types`, `${matchPath}/migration_guide`,
`${matchPath}/compaction`, `${matchPath}/clustering`, `${matchPath}/indexing`,
`${matchPath}/metadata`, `${matchPath}/metadata_indexing`,
`${matchPath}/record_payload`, `${matchPath}/file_sizing`,
`${matchPath}/hoodie_cleaner`, `${matchPath}/concurrency_control`];
const showCustomStylesForDocs = (matchPath, pathname) =>
arrayOfPages(matchPath).includes(pathname);
function DocPage(props) {
const {