(paimon) branch master updated: [doc] Add 'Data Skipping By File Index' for primary key table

lzljs3620320 Thu, 21 Aug 2025 22:27:09 -0700

This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git



The following commit(s) were added to refs/heads/master by this push:
     new b530c83d00 [doc] Add 'Data Skipping By File Index' for primary key table
b530c83d00 is described below

commit b530c83d00e39948c579c8e608e3045aefad072b
Author: JingsongLi <jingsongl...@gmail.com>
AuthorDate: Fri Aug 22 13:26:02 2025 +0800

    [doc] Add 'Data Skipping By File Index' for primary key table
---
 docs/content/append-table/query-performance.md     |  9 +--------
 .../content/primary-key-table/query-performance.md | 23 ++++++++++++++++++++++
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/docs/content/append-table/query-performance.md b/docs/content/append-table/query-performance.md
index 101970e643..dbc80a1d35 100644
--- a/docs/content/append-table/query-performance.md
+++ b/docs/content/append-table/query-performance.md
@@ -60,14 +60,7 @@ You can take a look at [Flink COMPACT Action]({{< ref "maintenance/dedicated-com
 
 You can use file index too, it filters files by indexing on the reading side.
 
-```sql
-CREATE TABLE <PAIMON_TABLE> (<COLUMN> <COLUMN_TYPE> , ...) WITH (
-    'file-index.bloom-filter.columns' = 'c1,c2',
-    'file-index.bloom-filter.c1.items' = '200'
-);
-```
-
-Define `file-index.bloom-filter.columns`, Data file index is an external index file and Paimon will create its
+Define `file-index.bitmap.columns`, Data file index is an external index file and Paimon will create its
 corresponding index file for each file. If the index file is too small, it will be stored directly in the manifest,
 otherwise in the directory of the data file. Each data file corresponds to an index file, which has a separate file
 definition and can contain different types of indexes with multiple columns.
diff --git a/docs/content/primary-key-table/query-performance.md b/docs/content/primary-key-table/query-performance.md
index 2ba19b0d3d..7310103307 100644
--- a/docs/content/primary-key-table/query-performance.md
+++ b/docs/content/primary-key-table/query-performance.md
@@ -59,6 +59,29 @@ Min max query can be also accelerated during compilation and returns very quickl
 For a regular bucketed table (For example, bucket = 5), the filtering conditions of the primary key will greatly
 accelerate queries and reduce the reading of a large number of files.
 
+## Data Skipping By File Index
+
+For full-compacted file, or for primary-key table with `'deletion-vectors.enabled'`, you can use file index, it filters
+files by indexing on the reading side.
+
+Define `file-index.bitmap.columns`, Data file index is an external index file and Paimon will create its
+corresponding index file for each file. If the index file is too small, it will be stored directly in the manifest,
+otherwise in the directory of the data file. Each data file corresponds to an index file, which has a separate file
+definition and can contain different types of indexes with multiple columns.
+
+Different file indexes may be efficient in different scenarios. For example bloom filter may speed up query in point lookup
+scenario. Using a bitmap may consume more space but can result in greater accuracy.
+
+* [BloomFilter]({{< ref "concepts/spec/fileindex#index-bloomfilter" >}}): `file-index.bloom-filter.columns`.
+* [Bitmap]({{< ref "concepts/spec/fileindex#index-bitmap" >}}): `file-index.bitmap.columns`.
+* [Range Bitmap]({{< ref "concepts/spec/fileindex#index-range-bitmap" >}}): `file-index.range-bitmap.columns`.
+
+If you want to add file index to existing table, without any rewrite, you can use `rewrite_file_index` procedure. Before
+we use the procedure, you should config appropriate configurations in target table. You can use ALTER clause to config
+`file-index.<filter-type>.columns` to the table.
+
+How to invoke: see [flink procedures]({{< ref "flink/procedures#procedures" >}})
+
 ## Bucketed Join
 
 Fixed Bucketed table (e.g. bucket = 10) can be used to avoid shuffle if necessary in batch query, for example, you can
