This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch site/external_indexes
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git

commit e37ec5c8d95e6c06f142bad7bcd56b47cf8d5840
Author: Andrew Lamb <and...@nerdnetworks.org>
AuthorDate: Fri Aug 8 10:11:41 2025 -0400

    hone
---
 .../blog/2025-08-15-external-parquet-indexes.md    | 37 +++++++++++-----------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/content/blog/2025-08-15-external-parquet-indexes.md b/content/blog/2025-08-15-external-parquet-indexes.md
index 3299a52..9ccb242 100644
--- a/content/blog/2025-08-15-external-parquet-indexes.md
+++ b/content/blog/2025-08-15-external-parquet-indexes.md
@@ -356,19 +356,19 @@ external index to find the files that may contain data that matches the query.
 [supports_filter_pushdown]: https://docs.rs/datafusion/latest/datafusion/datasource/trait.TableProvider.html#method.supports_filters_pushdown
 [scan]: https://docs.rs/datafusion/latest/datafusion/datasource/trait.TableProvider.html#tymethod.scan
 
-The DataFusion repository contains a fully working and well commented
-[parquet_index.rs] example of using an external index to prune files based on a
-query predicate. The example demonstrates query that includes the predicate
-`value = 150`, and how the `IndexTableProvider` uses the index to determine
-that only two files are needed.
-
-[parquet_index.rs]: https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/parquet_index.rs
+The DataFusion repository contains a fully working and well-commented example,
+[parquet_index.rs], of this technique that you can use as a starting point. 
+The example creates a simple index that stores the min/max values for a column
+called `value` along with the file name. Then it runs the following query:
 
 ```sql
 SELECT file_name, value FROM index_table WHERE value = 150
 ```
 
-The code from the example is as follows (slightly simplified for clarity):
+[parquet_index.rs]: https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/parquet_index.rs
+
+The custom `IndexTableProvider`'s `scan` method uses the index to find files
+that may contain data matching the predicate as shown below:
 
 ```rust
 impl TableProvider for IndexTableProvider {
@@ -409,25 +409,24 @@ impl TableProvider for IndexTableProvider {
 }
 ```
 
-While this example uses a standard min/max index, you can implement any indexing
-strategy you need, such as a bloom filters, a full text index, or a more
-complex multi-dimensional index.
-
 DataFusion handles the details of pushing down the filters to the
-`TableProvider` and the mechanics of reading the parquet files, and you focus on
-the system specific details such as building, storing and applying the index.
+`TableProvider` and the mechanics of reading the parquet files, so you can focus
+on the system specific details such as building, storing and applying the index.
+While this example uses a standard min/max index, you can implement any indexing
+strategy you need, such as a bloom filters, a full text index, or a more complex
+multi-dimensional index.
 
-DataFusion also includes several libraries code to help you with common
-filtering tasks, such as:
+DataFusion also includes several libraries to help with common filtering and
+pruning tasks, such as:
 
 * A full and well documented expression representation ([Expr]) and [APIs for
   building, vistiting, and rewriting] query predicates
 
-* Range Based Pruning ([PruningPredicate]) for cases where your index stores min/max values for some/all columns.
+* Range Based Pruning ([PruningPredicate]) for cases where your index stores min/max values.
 
-* Expression simplification ([ExprSimplifier] for simplifying predicates before applying them to the index.
+* Expression simplification ([ExprSimplifier]) for simplifying predicates before applying them to the index.
 
-* Range analysis for predicates [cp_solver] for interval based range analysis (e.g. `col > 5 AND col < 10`)
+* Range analysis for predicates ([cp_solver]) for interval based range analysis (e.g. `col > 5 AND col < 10`)
 
 [Expr]: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Expr.html
 [APIs for building, vistiting, and rewriting]: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Expr.html#visiting-and-rewriting-exprs


