This is an automated email from the ASF dual-hosted git repository.
jiayu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/sedona.git
The following commit(s) were added to refs/heads/master by this push:
new 85e6107d2 [DOC] Update docs to explain the case of filtering after KNN (#1575)
85e6107d2 is described below
commit 85e6107d20bd39920767bba4893f5a3c04b578c1
Author: Feng Zhang <[email protected]>
AuthorDate: Tue Sep 3 16:23:53 2024 -0700
[DOC] Update docs to explain the case of filtering after KNN (#1575)
---
docs/api/sql/NearestNeighbourSearching.md | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/docs/api/sql/NearestNeighbourSearching.md b/docs/api/sql/NearestNeighbourSearching.md
index 224c63e44..bc65777cb 100644
--- a/docs/api/sql/NearestNeighbourSearching.md
+++ b/docs/api/sql/NearestNeighbourSearching.md
@@ -19,6 +19,30 @@ In case there are ties in the distance, the result will include all the tied geo
spark.sedona.join.knn.includeTieBreakers=true
```
+Filter Pushdown Considerations:
+
+When using ST_KNN with filters applied to the resulting DataFrame, some of these filters may be pushed down to the object side of the kNN join. This means the filters will be applied to the object side reader before the kNN join is executed. If you want the filters to be applied after the kNN join, ensure that you first materialize the kNN join results and then apply the filters.
+
+For example, you can use the following approach:
+
+Scala Example:
+
+```
+val knnResult = knnJoinDF.cache()
+val filteredResult = knnResult.filter(condition)
+```
+
+SQL Example:
+
+```
+CREATE OR REPLACE TEMP VIEW knnResult AS
+SELECT * FROM (
+ -- Your KNN join SQL here
+) AS knnView;
+CACHE TABLE knnResult;
+SELECT * FROM knnResult WHERE condition;
+```
+
SQL Example
Suppose we have two tables `QUERIES` and `OBJECTS` with the following data: