(paimon) branch master updated: [doc] Optimize doc for file index

lzljs3620320 Sun, 17 Aug 2025 22:02:08 -0700

This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git



The following commit(s) were added to refs/heads/master by this push:
     new 53b3c2c62f [doc] Optimize doc for file index
53b3c2c62f is described below

commit 53b3c2c62f18f99b2b06a052ce278f290448d363
Author: JingsongLi <jingsongl...@gmail.com>
AuthorDate: Mon Aug 18 13:01:32 2025 +0800

    [doc] Optimize doc for file index
---
 docs/content/append-table/query-performance.md     | 50 ++--------------------
 docs/content/concepts/spec/fileindex.md            | 39 ++++++++++++++++-
 .../apache/paimon/spark/PaimonScanBuilder.scala    |  4 --
 3 files changed, 41 insertions(+), 52 deletions(-)

diff --git a/docs/content/append-table/query-performance.md b/docs/content/append-table/query-performance.md
index 7c15455b3d..101970e643 100644
--- a/docs/content/append-table/query-performance.md
+++ b/docs/content/append-table/query-performance.md
@@ -75,53 +75,9 @@ definition and can contain different types of indexes with multiple columns.
 Different file indexes may be efficient in different scenarios. For example bloom filter may speed up query in point lookup
 scenario. Using a bitmap may consume more space but can result in greater accuracy.
 
-`Bloom Filter`:
-* `file-index.bloom-filter.columns`: specify the columns that need bloom filter index.
-* `file-index.bloom-filter.<column_name>.fpp` to config false positive probability.
-* `file-index.bloom-filter.<column_name>.items` to config the expected distinct items in one data file.
-
-`Bitmap`:
-* `file-index.bitmap.columns`: specify the columns that need bitmap index. See [Index Bitmap]({{< ref "concepts/spec/fileindex#index-bitmap" >}}).
-
-`Range Bitmap Index Bitmap`
-* `file-index.range-bitmap.columns`: specify the columns that need range-bitmap index. See [Index Range Bitmap]({{< ref "concepts/spec/fileindex#index-range-bitmap" >}}).
-
-
-Append Table supports using range-bitmap file index to optimize the `EQUALS`, `RANGE`, `AND/OR` and `TOPN` predicate. The bitmap and range-bitmap file index result will be merged and pushed down to the DataFile for filtering rowgroups and pages.
-
-In the following query examples, the `class_id` and the `score` has been created with range-bitmap file index. And the partition key `dt` is not necessary.
-
-**Optimize the `EQUALS` predicate:**
-```sql
-SELECT * FROM TABLE WHERE dt = '20250801' AND score = 100;
-
-SELECT * FROM TABLE WHERE dt = '20250801' AND score IN (60, 80);
-```
-
-**Optimize the `RANGE` predicate:**
-```sql
-SELECT * FROM TABLE WHERE dt = '20250801' AND score > 60;
-
-SELECT * FROM TABLE WHERE dt = '20250801' AND score < 60;
-```
-
-**Optimize the `AND/OR` predicate:**
-```sql
-SELECT * FROM TABLE WHERE dt = '20250801' AND class_id = 1 AND score < 60;
-
-SELECT * FROM TABLE WHERE dt = '20250801' AND class_id = 1 AND score < 60 OR score > 80;
-```
-
-**Optimize the `TOPN` predicate:**
-
-For now, the `TOPN` predicate optimization can not using with other predicates, only support in Apache Spark.
-```sql
-SELECT * FROM TABLE WHERE dt = '20250801' ORDER BY score ASC LIMIT 10;
-
-SELECT * FROM TABLE WHERE dt = '20250801' ORDER BY score DESC LIMIT 10;
-```
-
-More filter types will be supported...
+* [BloomFilter]({{< ref "concepts/spec/fileindex#index-bloomfilter" >}}): `file-index.bloom-filter.columns`.
+* [Bitmap]({{< ref "concepts/spec/fileindex#index-bitmap" >}}): `file-index.bitmap.columns`.
+* [Range Bitmap]({{< ref "concepts/spec/fileindex#index-range-bitmap" >}}): `file-index.range-bitmap.columns`.
 
 If you want to add file index to existing table, without any rewrite, you can use `rewrite_file_index` procedure. Before
 we use the procedure, you should config appropriate configurations in target table. You can use ALTER clause to config
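
The `fpp` and `items` options that this commit moves into the BloomFilter section of the file index spec together determine how large each data file's bloom filter becomes. A minimal sketch, assuming the textbook sizing formulas m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions for n expected items and false positive probability p. Whether Paimon's BloomFilter index sizes itself exactly this way is an assumption, and `BloomFilterSizing` and its methods are hypothetical names.

```scala
// Textbook Bloom filter sizing. Assumption: not taken from Paimon's code.
object BloomFilterSizing {

  /** Optimal number of bits: m = ceil(-n * ln(p) / (ln 2)^2). */
  def optimalNumBits(items: Long, fpp: Double): Long = {
    require(items > 0, "items must be positive")
    require(fpp > 0.0 && fpp < 1.0, "fpp must be in (0, 1)")
    math.ceil(-items * math.log(fpp) / (math.log(2) * math.log(2))).toLong
  }

  /** Optimal number of hash functions: k = round((m / n) * ln 2), at least 1. */
  def optimalNumHashFunctions(items: Long, numBits: Long): Int =
    math.max(1, math.round(numBits.toDouble / items * math.log(2)).toInt)

  def main(args: Array[String]): Unit = {
    // Roughly what 'file-index.bloom-filter.<column_name>.items' = '1000000'
    // and 'file-index.bloom-filter.<column_name>.fpp' = '0.01' would ask for.
    val items = 1000000L
    val fpp = 0.01
    val bits = optimalNumBits(items, fpp)
    val k = optimalNumHashFunctions(items, bits)
    println(s"bits=$bits (~${bits / 8 / 1024} KiB), hashFunctions=$k")
  }
}
```

Under these formulas the cost per item is about 1.44 * log2(1/p) bits, so each 10x reduction in `fpp` adds roughly 4.8 bits per expected distinct item.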
diff --git a/docs/content/concepts/spec/fileindex.md b/docs/content/concepts/spec/fileindex.md
index b041139a8e..76ded93945 100644
--- a/docs/content/concepts/spec/fileindex.md
+++ b/docs/content/concepts/spec/fileindex.md
@@ -85,7 +85,10 @@ BODY:                             column index bytes + column index bytes + colu
 
 ## Index: BloomFilter 
 
-Define `'file-index.bloom-filter.columns'`.
+Options are:
+* `file-index.bloom-filter.columns`: specify the columns that need bloom filter index.
+* `file-index.bloom-filter.<column_name>.fpp` to config false positive probability.
+* `file-index.bloom-filter.<column_name>.items` to config the expected distinct items in one data file.
 
 Content of bloom filter index is simple: 
 - numHashFunctions 4 bytes int, BIG_ENDIAN
@@ -232,6 +235,40 @@ Options:
 * `file-index.range-bitmap.columns`: specify the columns that need range-bitmap index.
 * `file-index.range-bitmap.<column_name>.chunk-size`: to config the chunk size, default value is 16kb.
 
+Table supports using range-bitmap file index to optimize the `EQUALS`, `RANGE`, `AND/OR` and `TOPN` predicate. The bitmap and range-bitmap file index result will be merged and pushed down to the DataFile for filtering rowgroups and pages.
+
+In the following query examples, the `class_id` and the `score` has been created with range-bitmap file index. And the partition key `dt` is not necessary.
+
+**Optimize the `EQUALS` predicate:**
+```sql
+SELECT * FROM TABLE WHERE dt = '20250801' AND score = 100;
+
+SELECT * FROM TABLE WHERE dt = '20250801' AND score IN (60, 80);
+```
+
+**Optimize the `RANGE` predicate:**
+```sql
+SELECT * FROM TABLE WHERE dt = '20250801' AND score > 60;
+
+SELECT * FROM TABLE WHERE dt = '20250801' AND score < 60;
+```
+
+**Optimize the `AND/OR` predicate:**
+```sql
+SELECT * FROM TABLE WHERE dt = '20250801' AND class_id = 1 AND score < 60;
+
+SELECT * FROM TABLE WHERE dt = '20250801' AND class_id = 1 AND score < 60 OR score > 80;
+```
+
+**Optimize the `TOPN` predicate:**
+
+For now, the `TOPN` predicate optimization can not using with other predicates, only support in Apache Spark.
+```sql
+SELECT * FROM TABLE WHERE dt = '20250801' ORDER BY score ASC LIMIT 10;
+
+SELECT * FROM TABLE WHERE dt = '20250801' ORDER BY score DESC LIMIT 10;
+```
+
 <pre>
 Range Bitmap file index format (V1)
 +-------------------------------------------------+-----------------
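
The new spec text says the range-bitmap index can answer `EQUALS`, `RANGE`, `AND/OR` and `TOPN` predicates. The sketch below illustrates why with a deliberately simplified model: one `java.util.BitSet` of row positions per distinct value, kept in value order. It is not Paimon's range-bitmap encoding (the V1 format the spec describes), and `SimpleRangeBitmap`, `Bitmaps` and the sample data are hypothetical. Note that in SQL `AND` binds more tightly than `OR`, so the second `AND/OR` example is evaluated as `(dt = '20250801' AND class_id = 1 AND score < 60) OR score > 80`; the sketch spells out its grouping explicitly.

```scala
import java.util.BitSet
import scala.collection.immutable.TreeMap

// Hypothetical, simplified model of a range-bitmap index over one column:
// one BitSet of row positions per distinct value, kept in value order.
// Paimon's real on-disk range-bitmap encoding is different.
final class SimpleRangeBitmap(values: Seq[Int]) {
  private val index: TreeMap[Int, BitSet] =
    values.zipWithIndex.foldLeft(TreeMap.empty[Int, BitSet]) { case (acc, (v, row)) =>
      val bits = acc.getOrElse(v, new BitSet())
      bits.set(row)
      acc.updated(v, bits)
    }

  // OR together the bitmaps of every matching value.
  private def union(sets: Iterable[BitSet]): BitSet = {
    val out = new BitSet()
    sets.foreach(s => out.or(s))
    out
  }

  /** EQUALS / IN: rows whose value is one of `vs`. */
  def in(vs: Int*): BitSet = union(vs.flatMap(v => index.get(v)))

  /** RANGE: rows whose value is strictly greater than `v`. */
  def greaterThan(v: Int): BitSet = union(index.filter { case (k, _) => k > v }.values)

  /** RANGE: rows whose value is strictly less than `v`. */
  def lessThan(v: Int): BitSet = union(index.filter { case (k, _) => k < v }.values)

  /** TOPN (ORDER BY value DESC LIMIT n): walk values from the largest down. */
  def topDesc(n: Int): List[Int] =
    index.keys.toList.reverse.iterator
      .flatMap(k => Bitmaps.rows(index(k)))
      .take(n)
      .toList
}

object Bitmaps {
  def and(a: BitSet, b: BitSet): BitSet = { val r = a.clone().asInstanceOf[BitSet]; r.and(b); r }
  def or(a: BitSet, b: BitSet): BitSet = { val r = a.clone().asInstanceOf[BitSet]; r.or(b); r }

  /** Set row positions in ascending order. */
  def rows(bits: BitSet): List[Int] = bits.stream().toArray().toList
}

object RangeBitmapDemo {
  import Bitmaps._

  def main(args: Array[String]): Unit = {
    // Hypothetical rows of one data file: row i has classId(i) and score(i).
    val classId = new SimpleRangeBitmap(Seq(1, 2, 1, 1, 2, 1))
    val score = new SimpleRangeBitmap(Seq(100, 55, 40, 90, 80, 60))

    println(rows(score.in(100)))         // score = 100
    println(rows(score.in(60, 80)))      // score IN (60, 80)
    println(rows(score.greaterThan(60))) // score > 60
    // (class_id = 1 AND score < 60) OR score > 80
    println(rows(or(and(classId.in(1), score.lessThan(60)), score.greaterThan(80))))
    println(score.topDesc(2))            // ORDER BY score DESC LIMIT 2
  }
}
```

Because the bitmaps are visited in value order, a `TOPN` query can stop as soon as it has collected enough row positions, without reading the column values themselves. The `dt` predicate is left out because, as the spec text notes, it is not necessary for the index.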
diff --git a/paimon-spark/paimon-spark-common/src/main/scala/org/apache/paimon/spark/PaimonScanBuilder.scala b/paimon-spark/paimon-spark-common/src/main/scala/org/apache/paimon/spark/PaimonScanBuilder.scala
index 899c204718..729613f596 100644
--- a/paimon-spark/paimon-spark-common/src/main/scala/org/apache/paimon/spark/PaimonScanBuilder.scala
+++ b/paimon-spark/paimon-spark-common/src/main/scala/org/apache/paimon/spark/PaimonScanBuilder.scala
@@ -116,10 +116,6 @@ class PaimonScanBuilder(table: InnerTable)
     }
 
     val order = orders(0)
-    if (!order.expression().isInstanceOf[NamedReference]) {
-      return false
-    }
-
     val fieldName = orders.head.expression() match {
       case nr: NamedReference => nr.fieldNames.mkString(".")
       case _ => return false
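
The deleted guard in `PaimonScanBuilder` was redundant: the `match` that follows already returns `false` for any expression that is not a `NamedReference`, so removing it should not change behavior. A minimal sketch of that equivalence, using hypothetical stand-in types instead of Spark's connector `Expression` and `NamedReference`, and returning `Option[String]` rather than the builder's early `return false`:

```scala
// Hypothetical stand-ins for Spark's connector expression types; like the
// real NamedReference used in the diff, it exposes fieldNames.
trait Expression
trait NamedReference extends Expression { def fieldNames: Array[String] }
final class FieldRef(names: String*) extends NamedReference {
  def fieldNames: Array[String] = names.toArray
}
final class Literal(val value: Any) extends Expression

object TopNFieldName {
  // Shape of the code before the commit: explicit type guard, then a match.
  def withGuard(expr: Expression): Option[String] = {
    if (!expr.isInstanceOf[NamedReference]) {
      return None
    }
    expr match {
      case nr: NamedReference => Some(nr.fieldNames.mkString("."))
      case _ => None
    }
  }

  // Shape after the commit: the wildcard case alone rejects non-references.
  def withoutGuard(expr: Expression): Option[String] = expr match {
    case nr: NamedReference => Some(nr.fieldNames.mkString("."))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    val exprs = Seq[Expression](new FieldRef("a", "b"), new Literal(1))
    exprs.foreach(e => println(s"${withGuard(e)} ${withoutGuard(e)}"))
  }
}
```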
