(paimon) branch master updated: [doc] Add documentation for global index

lzljs3620320 Tue, 24 Mar 2026 04:04:23 -0700

This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git



The following commit(s) were added to refs/heads/master by this push:
     new a219294f7a [doc] Add documentation for global index
a219294f7a is described below

commit a219294f7a0c8d02b1d8035ae729b53235b55b82
Author: JingsongLi <[email protected]>
AuthorDate: Tue Mar 24 19:04:00 2026 +0800

    [doc] Add documentation for global index
---
 docs/content/append-table/global-index.md | 135 ++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)

diff --git a/docs/content/append-table/global-index.md b/docs/content/append-table/global-index.md
new file mode 100644
index 0000000000..dc90a4664f
--- /dev/null
+++ b/docs/content/append-table/global-index.md
@@ -0,0 +1,135 @@
+---
+title: "Global Index"
+weight: 8
+type: docs
+aliases:
+- /append-table/global-index.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Global Index
+
+## Overview
+
+Global Index is an indexing mechanism for Data Evolution (append) tables. It enables efficient row-level lookups and filtering
+without full-table scans. Paimon supports the following global index types:
+
+- **BTree Index**: A B-tree based index for scalar column lookups. Supports equality, IN, and range predicates, and can be combined across multiple columns with AND/OR logic.
+- **Vector Index**: An approximate nearest neighbor (ANN) index powered by DiskANN for vector similarity search.
+
+Global indexes work on top of Data Evolution tables. To use global indexes, your table **must** have:
+
+- `'bucket' = '-1'` (unaware-bucket mode)
+- `'row-tracking.enabled' = 'true'`
+- `'data-evolution.enabled' = 'true'`
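+
+The three required properties can be checked before you build an index. The following is a minimal, hypothetical Java helper, not part of the Paimon API. It reports which required options are missing or set to a different value. It assumes the table options are available as a plain `Map<String, String>` and compares values as exact strings:
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+/** Hypothetical pre-flight check for global index prerequisites (not a Paimon API). */
+public class GlobalIndexPrereqs {
+
+    // Required table options, as listed in this section.
+    private static final Map<String, String> REQUIRED = Map.of(
+            "bucket", "-1",
+            "row-tracking.enabled", "true",
+            "data-evolution.enabled", "true");
+
+    /** Returns the required keys that are absent or set to a different value, sorted by name. */
+    public static List<String> missingOrWrong(Map<String, String> tableOptions) {
+        List<String> problems = new ArrayList<>();
+        for (Map.Entry<String, String> e : REQUIRED.entrySet()) {
+            // Exact string comparison is an assumption of this sketch.
+            if (!e.getValue().equals(tableOptions.get(e.getKey()))) {
+                problems.add(e.getKey());
+            }
+        }
+        problems.sort(null); // Map.of has no defined iteration order; sort for stable output
+        return problems;
+    }
+
+    public static void main(String[] args) {
+        Map<String, String> opts = Map.of("bucket", "-1", "row-tracking.enabled", "true");
+        System.out.println(missingOrWrong(opts)); // prints [data-evolution.enabled]
+    }
+}
+```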
+
+## Prerequisites
+
+Create a table with the required properties:
+
+```sql
+CREATE TABLE my_table (
+    id INT,
+    name STRING,
+    embedding ARRAY<FLOAT>
+) TBLPROPERTIES (
+    'bucket' = '-1',
+    'row-tracking.enabled' = 'true',
+    'data-evolution.enabled' = 'true',
+    'global-index.enabled' = 'true'
+);
+```
+
+## BTree Index
+
+The BTree index builds a logical B-tree structure over SST files, enabling efficient point lookups and range queries on scalar columns.
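+
+Conceptually, a sorted index maps each key to the row IDs that contain it. Equality and IN predicates then become key lookups, and a range predicate becomes a scan over one contiguous slice of keys. The sketch below illustrates the idea with an in-memory `TreeMap`. It is not Paimon's implementation, which stores the index in SST files, and the class and method names are hypothetical:
+
+```java
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.NavigableMap;
+import java.util.TreeMap;
+import java.util.TreeSet;
+
+/** Conceptual sorted (B-tree-like) index: key -> row IDs. Illustrative only, not Paimon code. */
+public class SortedIndexSketch {
+
+    private final TreeMap<String, List<Long>> index = new TreeMap<>();
+
+    /** Records that the row with this row ID holds the given key. */
+    public void add(String key, long rowId) {
+        index.computeIfAbsent(key, k -> new ArrayList<>()).add(rowId);
+    }
+
+    /** Equality predicate: key = value. */
+    public TreeSet<Long> equal(String key) {
+        return new TreeSet<>(index.getOrDefault(key, List.of()));
+    }
+
+    /** IN predicate: union of equality lookups. */
+    public TreeSet<Long> in(Collection<String> keys) {
+        TreeSet<Long> out = new TreeSet<>();
+        for (String k : keys) {
+            out.addAll(equal(k));
+        }
+        return out;
+    }
+
+    /** Range predicate: lower <= key < upper, answered from one contiguous slice of sorted keys. */
+    public TreeSet<Long> range(String lower, String upper) {
+        TreeSet<Long> out = new TreeSet<>();
+        NavigableMap<String, List<Long>> slice = index.subMap(lower, true, upper, false);
+        for (List<Long> ids : slice.values()) {
+            out.addAll(ids);
+        }
+        return out;
+    }
+
+    public static void main(String[] args) {
+        SortedIndexSketch idx = new SortedIndexSketch();
+        String[] names = {"a100", "a200", "a300", "a200", "b100"};
+        for (int rowId = 0; rowId < names.length; rowId++) {
+            idx.add(names[rowId], rowId);
+        }
+        System.out.println(idx.in(List.of("a200", "a300"))); // prints [1, 2, 3]
+        System.out.println(idx.range("a150", "b000"));       // prints [1, 2, 3]
+    }
+}
+```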
+
+**Build BTree Index**
+
+```sql
+-- Create BTree index on 'name' column
+CALL sys.create_global_index(
+    table => 'db.my_table',
+    index_column => 'name',
+    index_type => 'btree'
+);
+```
+
+**Query with BTree Index**
+
+Once a BTree index is built, it is used automatically during scans whenever a filter predicate references the indexed column.
+
+```sql
+SELECT * FROM my_table WHERE name IN ('a200', 'a300');
+```
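+
+When filters on several indexed columns are combined, the per-column results can be merged as sets of row IDs. AND becomes an intersection and OR becomes a union, and only the surviving rows need to be read. The sketch below is illustrative only. The class name and the sample row IDs are hypothetical, and it assumes each column involved has its own BTree index:
+
+```java
+import java.util.Set;
+import java.util.TreeSet;
+
+/** Sketch: merging per-column index hits for AND / OR filters (illustrative, not Paimon code). */
+public class IndexResultCombiner {
+
+    /** AND: a row qualifies only if every column's index matched it. */
+    public static TreeSet<Long> and(Set<Long> left, Set<Long> right) {
+        TreeSet<Long> out = new TreeSet<>(left);
+        out.retainAll(right);
+        return out;
+    }
+
+    /** OR: a row qualifies if either column's index matched it. */
+    public static TreeSet<Long> or(Set<Long> left, Set<Long> right) {
+        TreeSet<Long> out = new TreeSet<>(left);
+        out.addAll(right);
+        return out;
+    }
+
+    public static void main(String[] args) {
+        // Hypothetical hits: rows where name IN ('a200', 'a300'), and rows where id > 10.
+        Set<Long> nameHits = Set.of(1L, 2L, 3L);
+        Set<Long> idHits = Set.of(2L, 3L, 7L);
+        System.out.println(and(nameHits, idHits)); // prints [2, 3]
+        System.out.println(or(nameHits, idHits));  // prints [1, 2, 3, 7]
+    }
+}
+```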
+
+## Vector Index
+
+Vector Index provides approximate nearest neighbor (ANN) search based on the DiskANN algorithm. It is suitable for
+vector similarity search scenarios such as recommendation systems, image retrieval, and RAG (Retrieval-Augmented
+Generation) applications.
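+
+An ANN index approximates exact k-nearest-neighbor search. It accepts a small loss of recall in exchange for doing far less work than comparing the query against every stored vector. The sketch below shows the exact (brute-force) computation that the index approximates. It is not DiskANN, and it assumes squared Euclidean (L2) distance, because this page does not state which metric the index uses:
+
+```java
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import java.util.PriorityQueue;
+
+/** Exact top-k search under squared L2 distance: the baseline an ANN index approximates. */
+public class ExactTopK {
+
+    public static double squaredL2(float[] a, float[] b) {
+        if (a.length != b.length) {
+            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
+        }
+        double sum = 0;
+        for (int i = 0; i < a.length; i++) {
+            double d = a[i] - b[i];
+            sum += d * d;
+        }
+        return sum;
+    }
+
+    /** Returns the positions of the k vectors nearest to the query, nearest first. */
+    public static List<Integer> topK(float[][] vectors, float[] query, int k) {
+        // Max-heap on distance that keeps the best k candidates seen so far.
+        PriorityQueue<Integer> heap = new PriorityQueue<>(
+                Comparator.comparingDouble((Integer i) -> squaredL2(vectors[i], query)).reversed());
+        for (int i = 0; i < vectors.length; i++) {
+            heap.add(i);
+            if (heap.size() > k) {
+                heap.poll(); // drop the current farthest candidate
+            }
+        }
+        List<Integer> result = new ArrayList<>(heap);
+        result.sort(Comparator.comparingDouble(i -> squaredL2(vectors[i], query)));
+        return result;
+    }
+
+    public static void main(String[] args) {
+        float[][] vectors = {{1f, 2f, 3f}, {0f, 0f, 0f}, {1f, 2f, 4f}, {9f, 9f, 9f}};
+        System.out.println(topK(vectors, new float[] {1f, 2f, 3f}, 2)); // prints [0, 2]
+    }
+}
+```
+
+This scan touches every vector, so its cost grows linearly with the table size. That is the cost a vector index is meant to avoid.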
+
+**Build Vector Index**
+
+```sql
+-- Create Lumina vector index on 'embedding' column
+CALL sys.create_global_index(
+    table => 'db.my_table',
+    index_column => 'embedding',
+    index_type => 'lumina-vector-ann',
+    options => 'lumina.index.dimension=128'
+);
+```
+
+**Vector Search**
+
+{{< tabs "vector-search" >}}
+
+{{< tab "Spark SQL" >}}
+```sql
+-- Search for the top-5 nearest neighbors
+-- (query vector shortened for readability; its length should match the index dimension)
+SELECT * FROM vector_search('my_table', 'embedding', array(1.0f, 2.0f, 3.0f), 5);
+```
+{{< /tab >}}
+
+{{< tab "Java API" >}}
+```java
+Table table = catalog.getTable(identifier);
+
+// Step 1: Run the vector search
+// (query vector shortened for readability; its length should match the index dimension)
+float[] queryVector = {1.0f, 2.0f, 3.0f};
+GlobalIndexResult result = table.newVectorSearchBuilder()
+        .withVector(queryVector)
+        .withLimit(5)
+        .withVectorColumn("embedding")
+        .executeLocal();
+
+// Step 2: Read matching rows using the search result
+ReadBuilder readBuilder = table.newReadBuilder();
+TableScan.Plan plan = readBuilder.newScan().withGlobalIndexResult(result).plan();
+try (RecordReader<InternalRow> reader = readBuilder.newRead().createReader(plan)) {
+    reader.forEachRemaining(row -> {
+        System.out.println("id=" + row.getInt(0) + ", name=" + row.getString(1));
+    });
+}
+```
+{{< /tab >}}
+
+{{< /tabs >}}
