This is an automated email from the ASF dual-hosted git repository.
danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new d2522d0c0743 docs: add flink RLI related configurations (#18869)
d2522d0c0743 is described below
commit d2522d0c07432509e225e028fdb54d67c0b9fb7c
Author: Danny Chan <[email protected]>
AuthorDate: Thu May 28 21:47:20 2026 +0800
docs: add flink RLI related configurations (#18869)
---
website/docs/indexes.md | 35 ++++++++++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 5 deletions(-)
diff --git a/website/docs/indexes.md b/website/docs/indexes.md
index 67d398853a4c..0b294d34b72f 100644
--- a/website/docs/indexes.md
+++ b/website/docs/indexes.md
@@ -211,13 +211,38 @@ for more details. All these, support the index types mentioned [above](#addition
#### Flink based configs
-For Flink DataStream and Flink SQL, Bucket index and Flink state index are supported.
+For Flink DataStream and Flink SQL, Bucket index, Flink state index, and record-level index are supported.
Following are the basic configs that control the indexing behavior. Please refer [the Flink configurations](configurations.md#Flink-Options-advanced-configs) for advanced configs.
-| Config Name | Default | Description |
-|----------------------------|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| index.type | FLINK_STATE (Optional) | Index type of Flink write job, default is using state backed index. Possible values:<br /> <ul><li>FLINK_STATE</li><li>BUCKET</li></ul><br /> `Config Param: INDEX_TYPE` |
-| hoodie.index.bucket.engine | SIMPLE (Optional) | org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: Determines the type of bucketing or hashing to use when `hoodie.index.type` is set to `BUCKET`. Possible Values: <br /> <ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul> |
+| Config Name | Default | Description |
+|-------------------------------------------------------------|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| index.type | FLINK_STATE (Optional) | Index type of Flink write job, default is using state backed index. Possible values:<br /> <ul><li>FLINK_STATE</li><li>BUCKET</li><li>GLOBAL_RECORD_LEVEL_INDEX</li><li>RECORD_LEVEL_INDEX</li></ul><br /> `Config Param: INDEX_TYPE` |
+| hoodie.index.bucket.engine | SIMPLE (Optional) | org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: Determines the type of bucketing or hashing to use when `hoodie.index.type` is set to `BUCKET`. Possible Values: <br /> <ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul> |
+| metadata.enabled | true (Optional) | Enables the metadata table. Required for Flink record-level index lookups. |
+| index.global.enabled | true (Optional) | Whether to update the old partition path when the same record key arrives with a different partition path. This must be `true` for `GLOBAL_RECORD_LEVEL_INDEX` and is set to `false` for `RECORD_LEVEL_INDEX`. |
+| index.bootstrap.enabled | false (Optional) | When `index.type=GLOBAL_RECORD_LEVEL_INDEX`, controls whether Flink bootstraps the global index into a local RocksDB backend. If not explicitly set for global RLI, Flink enables bootstrap by default. Set to `false` to force native metadata-table RLI access. |
+| index.bootstrap.rocksdb.path | (Optional) | Local directory path for the RocksDB backend used when `index.bootstrap.enabled=true`. Each task manager creates a unique subdirectory under this path. |
+| index.rli.cache.size | 256 (Optional) | Maximum memory, in MB, allocated for the record-level index cache per bucket-assign task. Applies to native metadata-table RLI access and partitioned RLI caches. |
+| index.rli.cache.concurrent.partitions.num | 2 (Optional) | Expected number of partitions whose partitioned RLI caches are updated concurrently. Used to size each partition cache when historical cache usage is unavailable. |
+| index.rli.lookup.minibatch.size | 1000 (Optional) | Maximum number of input records buffered for mini-batch record-index lookup. Mini-batching reduces individual metadata-table lookup calls for native global RLI access. |
+| index.rli.write.buffer.size | 100 (Optional) | Maximum memory, in MB, for the index record writer buffer. When the threshold is reached, Flink flushes index records to avoid OOM. |
+| index.write.tasks | (N/A) | Parallelism for tasks that write record-level index records. Defaults to the execution environment parallelism when not set. |
+| metadata.compaction.schedule.enabled | true (Optional) | Schedules metadata table compaction plans. |
+| metadata.compaction.async.enabled | true (Optional) | Runs metadata table compaction in the Flink compaction pipeline when record-level index streaming writes are enabled. |
+| metadata.compaction.delta_commits | 10 (Optional) | Maximum metadata-table delta commits before metadata compaction is triggered. |
+| hoodie.metadata.record.level.index.defer.init | false (Optional) | Defers RLI initialization for fresh tables. Flink ingestion does not support deferred RLI initialization, so keep this set to `false` for Flink RLI writes. |
+| hoodie.metadata.global.record.level.index.min.filegroup.count | 10 (Optional) | Minimum number of file groups to use for Global Record Index. |
+| hoodie.metadata.global.record.level.index.max.filegroup.count | 10000 (Optional) | Maximum number of file groups to use for Global Record Index. |
+| hoodie.metadata.record.level.index.min.filegroup.count | 1 (Optional) | Minimum number of file groups to use for Partitioned Record Index. New data partitions use this value for their initial partitioned RLI file group count, which is also used by dynamic bucket assignment before the partition appears in the metadata table. |
+| hoodie.metadata.record.level.index.max.filegroup.count | 10 (Optional) | Maximum number of file groups to use for Partitioned Record Index. |
+
+Common Flink RLI configurations:
+
+| Use case | Required settings | Notes [...]
+|----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------
[...]
+| Flink global RLI with native MDT access | `index.type=GLOBAL_RECORD_LEVEL_INDEX`<br />`metadata.enabled=true`<br />`index.global.enabled=true`<br />`index.bootstrap.enabled=false`<br />`hoodie.metadata.record.level.index.defer.init=false` | Flink rea [...]
+| Flink global RLI with local RocksDB cache | `index.type=GLOBAL_RECORD_LEVEL_INDEX`<br />`metadata.enabled=true`<br />`index.global.enabled=true`<br />`index.bootstrap.enabled=true`<br />`index.bootstrap.rocksdb.path=<local-path>`<br />`hoodie.metadata.record.level.index.defer.init=false` | Flink boot [...]
+| Dynamic bucket scaling with partitioned RLI | `index.type=RECORD_LEVEL_INDEX`<br />`metadata.enabled=true`<br />`index.global.enabled=false`<br />`hoodie.metadata.record.level.index.min.filegroup.count=<initial-file-groups-per-partition>`<br />`hoodie.metadata.record.level.index.max.filegroup.count=<max-file-groups-per-partition>`<br />Optionally tune `index.rli.cache.size` and `index.rli.cache.concurrent.partitions.num` for the partition cache. | Flink uses partition-scoped RLI [...]
### Picking Indexing Strategies