(fluss) 14/19: [docs] add introduction for auto-increment column (#2650)

jark Thu, 12 Feb 2026 01:53:10 -0800

This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch release-0.9
in repository https://gitbox.apache.org/repos/asf/fluss.git


commit 4454a5a002dafc1aa5809f3e44dfd6ef8a6f5fe6
Author: xx789 <[email protected]>
AuthorDate: Thu Feb 12 14:02:26 2026 +0800

    [docs] add introduction for auto-increment column (#2650)
---
 website/docs/table-design/table-types/pk-table.md | 73 +++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/website/docs/table-design/table-types/pk-table.md b/website/docs/table-design/table-types/pk-table.md
index 7331d7619..5791452d2 100644
--- a/website/docs/table-design/table-types/pk-table.md
+++ b/website/docs/table-design/table-types/pk-table.md
@@ -138,6 +138,79 @@ Generate the following output in the Flink SQL CLI:
 4 rows in set
 ```
 
+## Auto-Increment Column
+
+In Fluss, an auto-increment column automatically generates a unique numeric value for each row and is commonly used to create unique row identifiers.
+Each time a new record is inserted, Fluss assigns a new value to the auto-increment column, so you do not need to specify it manually.
+
+One application of the auto-increment column is to accelerate counting distinct values in a high-cardinality column:
+the auto-increment column serves as the integer ID in a dictionary that maps each unique value to a number.
+Compared with directly counting distinct STRING values, counting distinct integer values from the auto-increment column can sometimes improve query speed by several times, or even tens of times.
+
+Furthermore, Fluss provides native support for RoaringBitmap-based aggregations through the built-in `rbm32` and `rbm64` aggregation functions, available in the [Aggregation Merge Engine](/docs/table-design/merge-engines/aggregation.md).
+These functions are optimized to work seamlessly with auto-incremented integer columns. A typical usage pattern involves creating a dictionary table that maps raw identifiers (e.g., strings or sparse IDs) to compact, dense integer IDs via an auto-increment column.
+These dense IDs are then aggregated into a RoaringBitmap using `rbm32` (for 32-bit IDs) or `rbm64` (for 64-bit IDs), enabling highly efficient count-distinct computations both in storage and during query execution.
+
+### Basic features
+
+#### Uniqueness
+
+Fluss guarantees table-wide uniqueness for values it generates in the auto-increment column.
+
+#### Monotonicity
+To improve the performance of allocating auto-incremented IDs, each table bucket on the TabletServers caches a range of auto-incremented IDs locally.
+As a result, Fluss cannot guarantee that the values in the auto-increment column are strictly monotonic across the table.
+It only ensures that the values roughly increase in chronological order.
+
+:::note
+The number of auto-incremented IDs cached by the TabletServers is controlled by the table property `table.auto-increment.cache-size`,
+which defaults to 100,000. A larger cache size can enhance the performance of auto-incremented ID allocation but may result in
+less monotonic values in the auto-increment column. You can configure different cache sizes for different tables based on your specific requirements.
+However, this property cannot be modified after the table has been created.
+:::
+
+For example, create a table named `uid_mapping` with 2 buckets and insert five rows of data as follows:
+
+```sql
+CREATE TABLE uid_mapping (
+  user_id STRING,
+  uid BIGINT,
+  PRIMARY KEY (user_id) NOT ENFORCED
+) WITH (
+  'table.auto-increment.fields' = 'uid',
+  'bucket.num' = '2'
+);
+
+INSERT INTO uid_mapping (user_id) VALUES ('user1');
+INSERT INTO uid_mapping (user_id) VALUES ('user2');
+INSERT INTO uid_mapping (user_id) VALUES ('user3');
+INSERT INTO uid_mapping (user_id) VALUES ('user4');
+INSERT INTO uid_mapping (user_id) VALUES ('user5');
+```
+
+The auto-incremented IDs in the table `uid_mapping` do not increase monotonically, because the two table buckets cache separate ID ranges, [1, 100000] and [100001, 200000], respectively.
+
+```sql
+SELECT * FROM uid_mapping;
++---------+--------+
+| user_id |    uid |
++---------+--------+
+| user1   |      1 |
+| user2   | 100001 |
+| user3   |      2 |
+| user4   |      3 |
+| user5   | 100002 |
++---------+--------+
+```
+
+### Limits
+- Auto-increment columns can only be used in primary key tables.
+- Explicitly specifying values for the auto-increment column is not allowed. The value for an auto-increment column can only be implicitly assigned.
+- A table can have only one auto-increment column.
+- The auto-increment column must be of type `INT` or `BIGINT`.
+- Fluss does not support specifying a starting value or step size for the auto-increment column.
+
+
 ## Data Queries
 
 For primary key tables, Fluss supports various kinds of querying abilities.
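
The cache-based allocation described under "Monotonicity" in the diff above can be modelled in a few lines of Python. This is only an illustrative sketch, not Fluss code: `GlobalIdAllocator`, `Bucket`, and `simulate` are hypothetical names, and the row-to-bucket routing is hard-coded as an assumption so that the result matches the `uid_mapping` example.

```python
class GlobalIdAllocator:
    """Hypothetical stand-in for whatever hands out ID ranges table-wide."""

    def __init__(self, cache_size: int) -> None:
        self.cache_size = cache_size
        self._next_start = 1

    def next_range(self) -> range:
        # Each call returns a fresh, non-overlapping block of IDs.
        start = self._next_start
        self._next_start += self.cache_size
        return range(start, start + self.cache_size)


class Bucket:
    """Hypothetical table bucket that serves IDs from a locally cached range."""

    def __init__(self, allocator: GlobalIdAllocator) -> None:
        self._allocator = allocator
        self._cached = iter(())  # empty until the first insert

    def next_id(self) -> int:
        try:
            return next(self._cached)
        except StopIteration:
            # Cache empty or exhausted: fetch a new range, then serve from it.
            self._cached = iter(self._allocator.next_range())
            return next(self._cached)


def simulate(rows, cache_size=100_000, num_buckets=2):
    """rows is a list of (key, bucket_index); the routing is an assumption."""
    allocator = GlobalIdAllocator(cache_size)
    buckets = [Bucket(allocator) for _ in range(num_buckets)]
    return {key: buckets[b].next_id() for key, b in rows}


if __name__ == "__main__":
    rows = [("user1", 0), ("user2", 1), ("user3", 0), ("user4", 0), ("user5", 1)]
    print(simulate(rows))
    # {'user1': 1, 'user2': 100001, 'user3': 2, 'user4': 3, 'user5': 100002}
```

Because the ranges never overlap, the IDs stay unique across the table, while the interleaving of buckets is what breaks strict global ordering. In this model, a smaller `cache_size` keeps the values closer to insertion order at the cost of more frequent range requests.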
