(sedona-db) branch main updated: docs: replace barrier() with KNN join behavior documentation (#635)

paleolimbot Thu, 19 Feb 2026 06:55:12 -0800

This is an automated email from the ASF dual-hosted git repository.

paleolimbot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/sedona-db.git



The following commit(s) were added to refs/heads/main by this push:
     new 6b08cf25 docs: replace barrier() with KNN join behavior documentation (#635)
6b08cf25 is described below

commit 6b08cf25762f7e68dee151994b9faab86f8a3471
Author: Kristin Cowalcijk <[email protected]>
AuthorDate: Thu Feb 19 22:54:55 2026 +0800

    docs: replace barrier() with KNN join behavior documentation (#635)
---
 docs/reference/sql-joins.md           | 107 +++++-
 docs/reference/sql/barrier.qmd        |  89 -----
 rust/sedona-functions/src/barrier.rs  | 648 ----------------------------------
 rust/sedona-functions/src/lib.rs      |   1 -
 rust/sedona-functions/src/register.rs |   1 -
 5 files changed, 89 insertions(+), 757 deletions(-)

diff --git a/docs/reference/sql-joins.md b/docs/reference/sql-joins.md
index b5dd5c33..ab06cdb0 100644
--- a/docs/reference/sql-joins.md
+++ b/docs/reference/sql-joins.md
@@ -19,7 +19,7 @@
 
 # Spatial Joins
 
-You can perform spatial joins using standard SQL `INNER JOIN` syntax. The join condition is defined in the `ON` clause using a spatial function that specifies the relationship between the geometries of the two tables.
+You can perform spatial joins using standard SQL `JOIN` syntax. The join condition is defined in the `ON` clause using a spatial function that specifies the relationship between the geometries of the two tables.
 
 ## General Spatial Join
 
@@ -59,39 +59,110 @@ INNER JOIN
     ON ST_KNN(cities_l.geometry, cities_r.geometry, 5, false)
 ```
 
-## Optimization Barrier
+### KNN Join Caveats
 
-Use the `barrier` function to prevent filter pushdown and control predicate evaluation order in complex spatial joins. This function creates an optimization barrier by evaluating boolean expressions at runtime.
+#### No Filter Pushdown
 
-The `barrier` function takes a boolean expression as a string, followed by pairs of variable names and their values that will be substituted into the expression:
+KNN joins currently do not perform filter pushdown optimizations. All `WHERE` clause predicates are evaluated after the K nearest neighbor candidates have been selected and are never pushed down into the input tables. This ensures that the K nearest neighbors are always determined from the full, unfiltered dataset.
+
+For example, in the following query, `r.rating > 4.0` is applied *after* finding the 3 nearest restaurants for each hotel; it does not reduce the set of candidate restaurants before the KNN search:
 
 ```sql
-barrier(expression, var_name1, var_value1, var_name2, var_value2, ...)
+SELECT
+    h.name AS hotel,
+    r.name AS restaurant,
+    r.rating
+FROM
+    hotels AS h
+INNER JOIN
+    restaurants AS r
+    ON ST_KNN(h.geometry, r.geometry, 3, false)
+WHERE
+    r.rating > 4.0
 ```
 
-The placement of filters relative to KNN joins changes the semantic meaning of the query:
+This means the result may contain fewer than 3 restaurants per hotel if some of the nearest neighbors do not pass the filter.
 
-- **Filter before KNN**: First filters the data, then finds K nearest neighbors from the filtered subset. This answers "What are the K nearest high-rated restaurants?"
-- **Filter after KNN**: First finds K nearest neighbors from all data, then filters those results. This answers "Of the K nearest restaurants, which ones are high-rated?"
+#### Manually Pushing Down Query-Side Filters
 
-### Example
+Pushing filters on the **query side** (the first argument of `ST_KNN`) down to the input table is a valid optimization: it reduces the number of probe rows without affecting which objects are considered as KNN candidates. This optimization is not yet performed automatically, but you can achieve the same effect manually using a subquery or a CTE.
 
-Find the 3 nearest restaurants for each luxury hotel, and then filter the results to only show pairs where the restaurant is also high-rated.
+For instance, suppose you only want the 3 nearest restaurants for luxury hotels (`stars >= 4`). Writing the filter in the `WHERE` clause does **not** push it down; it becomes a post-filter applied after the KNN join over *all* hotels:
 
 ```sql
-SELECT
-    h.name AS hotel,
-    r.name AS restaurant,
-    r.rating
+SELECT h.name AS hotel, r.name AS restaurant, r.rating
 FROM
     hotels AS h
 INNER JOIN
     restaurants AS r
     ON ST_KNN(h.geometry, r.geometry, 3, false)
 WHERE
-    barrier('rating > 4.0 AND stars >= 4',
-            'rating', r.rating,
-            'stars', h.stars)
+    h.stars >= 4
+```
+
+The physical plan confirms the filter sits *above* the join:
+
+```
+FilterExec: stars >= 4
+  SpatialJoinExec: join_type=Inner, on=ST_KNN(geometry, geometry, 3, false)
+    ...hotels...        ← all hotels are scanned
+    ...restaurants...
+```
+
+To push the filter below the join, wrap the query-side table in a subquery:
+
+```sql
+SELECT h.name AS hotel, r.name AS restaurant, r.rating
+FROM
+    (SELECT * FROM hotels WHERE stars >= 4) AS h
+INNER JOIN
+    restaurants AS r
+    ON ST_KNN(h.geometry, r.geometry, 3, false)
+```
+
+Or equivalently, using a CTE:
+
+```sql
+WITH luxury_hotels AS (
+    SELECT * FROM hotels WHERE stars >= 4
+)
+SELECT h.name AS hotel, r.name AS restaurant, r.rating
+FROM
+    luxury_hotels AS h
+INNER JOIN
+    restaurants AS r
+    ON ST_KNN(h.geometry, r.geometry, 3, false)
+```
+
+Now the physical plan shows the filter *below* the join, inside the left (query-side) input:
+
+```
+SpatialJoinExec: join_type=Inner, on=ST_KNN(geometry, geometry, 3, false)
+  FilterExec: stars >= 4
+    ...hotels...        ← only luxury hotels are scanned
+  ...restaurants...
+```
+
+With this approach, only hotels with `stars >= 4` are used as query points, and the 3 nearest restaurants are found for each of those luxury hotels.
+
+#### ST_KNN Predicate Precedence
+
+When `ST_KNN` is combined with other predicates via `AND`, `ST_KNN` always takes precedence. It is extracted first to determine the KNN candidates, and the remaining predicates are applied as post-filters on the join output.
+
+For example, the following two queries produce the same results:
+
+```sql
+-- ST_KNN in ON clause combined with another predicate via AND
+SELECT h.name AS hotel, r.name AS restaurant
+FROM hotels AS h
+JOIN restaurants AS r
+    ON ST_KNN(h.geometry, r.geometry, 3, false) AND r.rating > 4.0;
+
+-- Equivalent: the other predicate listed before ST_KNN in the ON clause
+SELECT h.name AS hotel, r.name AS restaurant
+FROM hotels AS h
+JOIN restaurants AS r
+    ON r.rating > 4.0 AND ST_KNN(h.geometry, r.geometry, 3, false);
 ```
 
-With the barrier function, this query first finds the 3 nearest restaurants to each hotel (regardless of rating), then filters to keep only those pairs where the restaurant has rating > 4.0 and the hotel has stars >= 4. Without the barrier, an optimizer might push the filters down, changing the query to first filter for high-rated restaurants and luxury hotels, then find the 3 nearest among those filtered sets.
+In both cases, `ST_KNN` determines the 3 nearest restaurants first, then `r.rating > 4.0` filters the results.
diff --git a/docs/reference/sql/barrier.qmd b/docs/reference/sql/barrier.qmd
deleted file mode 100644
index 862a7dd5..00000000
--- a/docs/reference/sql/barrier.qmd
+++ /dev/null
@@ -1,89 +0,0 @@
----
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-title: barrier
-description: >
-    Creates an optimization barrier that prevents filter pushdown by evaluating
-    boolean expressions at runtime.
-kernels:
-  - returns: boolean
-    args:
-    - name: expression
-      type: string
-    - name: col_name
-      type: string
-    - name: col_value
-      type: any
----
-
-## Description
-
-`barrier()` evaluates a boolean expression string at runtime using bound column
-values, and is marked as volatile to prevent the query optimizer from reordering
-or pushing filters past it.
-
-The first argument is a boolean expression string. After that, arguments come in
-pairs of `(column_name, column_value)` that bind variables used in the
-expression.
-
-### Supported expression syntax
-
-- **Comparison operators**: `=`, `==`, `!=`, `<>`, `>`, `>=`, `<`, `<=`
-- **Logical operators**: `AND` / `and`, `OR` / `or`
-- **Literal types**: integers, floats, single- or double-quoted strings, `true`, `false`, `null`
-- Unrecognized expressions evaluate to `false`
-- `NULL` comparisons evaluate to `false`
-
-### When to use
-
-Use `barrier()` when you need a predicate to be evaluated **after** a join or
-scan rather than being pushed down by the optimizer. This is useful when the
-predicate depends on values that are only available at a specific stage of
-query execution.
-
-## Examples
-
-Filter rows using bound column values:
-
-```sql
-SELECT *
-FROM (VALUES (150, 'active'), (50, 'active'), (200, 'closed')) AS orders(amount, status)
-WHERE barrier('amount > 100 AND status = "active"',
-              'amount', amount,
-              'status', status);
-```
-
-Combine multiple conditions with different data types:
-
-```sql
-SELECT *
-FROM (VALUES (29.99, 'electronics'), (9.99, 'books'), (49.99, 'electronics')) AS products(price, category)
-WHERE barrier('price > 19.99 AND category = "electronics"',
-              'price', price,
-              'category', category);
-```
-
-Use `OR` logic:
-
-```sql
-SELECT *
-FROM (VALUES (3, 'info'), (7, 'critical'), (10, 'warning')) AS events(priority, type)
-WHERE barrier('priority < 5 OR type = "critical"',
-              'priority', priority,
-              'type', type);
-```
diff --git a/rust/sedona-functions/src/barrier.rs b/rust/sedona-functions/src/barrier.rs
deleted file mode 100644
index 69db4005..00000000
--- a/rust/sedona-functions/src/barrier.rs
+++ /dev/null
@@ -1,648 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-use std::{collections::HashMap, sync::Arc};
-
-use arrow_array::builder::BooleanBuilder;
-use arrow_schema::DataType;
-use datafusion_common::{exec_err, Result, ScalarValue};
-use datafusion_expr::{ColumnarValue, Volatility};
-use sedona_expr::scalar_udf::{SedonaScalarKernel, SedonaScalarUDF};
-use sedona_schema::datatypes::SedonaType;
-
-/// barrier() scalar UDF implementation
-///
-/// Creates an optimization barrier that prevents filter pushdown by evaluating boolean expressions at runtime
-pub fn barrier_udf() -> SedonaScalarUDF {
-    SedonaScalarUDF::new(
-        "barrier",
-        vec![Arc::new(Barrier)],
-        Volatility::Volatile, // Mark as volatile to prevent optimization
-    )
-}
-
-#[derive(Debug)]
-struct Barrier;
-
-impl SedonaScalarKernel for Barrier {
-    fn return_type(&self, args: &[SedonaType]) -> Result<Option<SedonaType>> {
-        if args.is_empty() {
-            return exec_err!("barrier requires at least one argument");
-        }
-
-        // First argument must be string
-        match &args[0] {
-            SedonaType::Arrow(DataType::Utf8) => {}
-            _ => return exec_err!("First argument must be a string expression"),
-        }
-
-        // Remaining arguments should be pairs of (string, any_type)
-        let remaining_args = &args[1..];
-        if !remaining_args.len().is_multiple_of(2) {
-            return exec_err!(
-                "Arguments after expression must be pairs of (column_name, column_value)"
-            );
-        }
-
-        // Validate that odd positions are strings (column names)
-        for i in (0..remaining_args.len()).step_by(2) {
-            if !matches!(remaining_args[i], SedonaType::Arrow(DataType::Utf8)) {
-                return exec_err!("Column names must be strings");
-            }
-        }
-
-        Ok(Some(SedonaType::Arrow(DataType::Boolean)))
-    }
-
-    fn invoke_batch(
-        &self,
-        _arg_types: &[SedonaType],
-        args: &[ColumnarValue],
-    ) -> Result<ColumnarValue> {
-        if args.is_empty() {
-            return exec_err!("barrier requires at least one argument");
-        }
-
-        // Get the expression string
-        let expr_str = match &args[0] {
-            ColumnarValue::Scalar(ScalarValue::Utf8(Some(s))) => s.clone(),
-            ColumnarValue::Scalar(ScalarValue::Utf8(None)) => {
-                return Ok(ColumnarValue::Scalar(ScalarValue::Boolean(None)));
-            }
-            _ => return exec_err!("First argument must be a string expression"),
-        };
-
-        // Determine if we have arrays (need to handle row-by-row)
-        let has_arrays = args[1..]
-            .iter()
-            .any(|arg| matches!(arg, ColumnarValue::Array(_)));
-
-        if has_arrays {
-            // Handle array case - evaluate for each row
-            self.invoke_array(&expr_str, &args[1..])
-        } else {
-            // Handle scalar case
-            let mut context = HashMap::new();
-            let mut i = 1;
-
-            // Extract column name/value pairs
-            while i + 1 < args.len() {
-                if let (
-                    ColumnarValue::Scalar(ScalarValue::Utf8(Some(col_name))),
-                    ColumnarValue::Scalar(col_value),
-                ) = (&args[i], &args[i + 1])
-                {
-                    context.insert(col_name.clone(), col_value.clone());
-                    i += 2;
-                } else {
-                    break;
-                }
-            }
-
-            let result = Self::evaluate_expression(&expr_str, &context)?;
-            Ok(ColumnarValue::Scalar(ScalarValue::Boolean(Some(result))))
-        }
-    }
-}
-
-impl Barrier {
-    /// Handle array inputs - evaluate expression for each row
-    fn invoke_array(&self, expr_str: &str, args: &[ColumnarValue]) -> Result<ColumnarValue> {
-        // Find the array length from the first array argument
-        let array_len = args
-            .iter()
-            .find_map(|arg| {
-                if let ColumnarValue::Array(arr) = arg {
-                    Some(arr.len())
-                } else {
-                    None
-                }
-            })
-            .unwrap_or(0);
-
-        let mut builder = BooleanBuilder::with_capacity(array_len);
-
-        for row_idx in 0..array_len {
-            // Build context for this row
-            let mut context = HashMap::new();
-            let mut i = 0;
-
-            while i + 1 < args.len() {
-                let col_name = match &args[i] {
-                    ColumnarValue::Scalar(ScalarValue::Utf8(Some(name))) => name.clone(),
-                    ColumnarValue::Array(arr) => {
-                        if let Ok(ScalarValue::Utf8(Some(name))) =
-                            ScalarValue::try_from_array(arr, row_idx)
-                        {
-                            name
-                        } else {
-                            i += 2;
-                            continue;
-                        }
-                    }
-                    _ => {
-                        i += 2;
-                        continue;
-                    }
-                };
-
-                let col_value = match &args[i + 1] {
-                    ColumnarValue::Scalar(val) => val.clone(),
-                    ColumnarValue::Array(arr) => {
-                        ScalarValue::try_from_array(arr, row_idx).unwrap_or(ScalarValue::Null)
-                    }
-                };
-
-                context.insert(col_name, col_value);
-                i += 2;
-            }
-
-            // Evaluate expression for this row
-            let result = Self::evaluate_expression(expr_str, &context).unwrap_or(false);
-            builder.append_value(result);
-        }
-
-        Ok(ColumnarValue::Array(Arc::new(builder.finish())))
-    }
-
-    /// Evaluate a simple boolean expression against a context
-    fn evaluate_expression(expr: &str, context: &HashMap<String, ScalarValue>) -> Result<bool> {
-        let expr = expr.trim();
-
-        // Add a very small random element to make this truly volatile
-        // This should prevent any optimizer from treating this as deterministic
-        let random_factor = (std::ptr::addr_of!(context) as usize % 1000) as f64 / 1000000.0;
-        let _ = random_factor; // Use it minimally to avoid affecting logic
-
-        // Handle boolean literals
-        match expr {
-            "true" => return Ok(true),
-            "false" => return Ok(false),
-            _ => {}
-        }
-
-        // Handle AND operations FIRST (before individual comparisons)
-        if let Some(pos) = expr.find(" AND ") {
-            let left = expr[..pos].trim();
-            let right = expr[pos + 5..].trim();
-            let left_result = Self::evaluate_expression(left, context)?;
-            let right_result = Self::evaluate_expression(right, context)?;
-            return Ok(left_result && right_result);
-        }
-
-        if let Some(pos) = expr.find(" and ") {
-            let left = expr[..pos].trim();
-            let right = expr[pos + 5..].trim();
-            let left_result = Self::evaluate_expression(left, context)?;
-            let right_result = Self::evaluate_expression(right, context)?;
-            return Ok(left_result && right_result);
-        }
-
-        // Handle OR operations
-        if let Some(pos) = expr.find(" OR ") {
-            let left = expr[..pos].trim();
-            let right = expr[pos + 4..].trim();
-            let left_result = Self::evaluate_expression(left, context)?;
-            let right_result = Self::evaluate_expression(right, context)?;
-            return Ok(left_result || right_result);
-        }
-
-        if let Some(pos) = expr.find(" or ") {
-            let left = expr[..pos].trim();
-            let right = expr[pos + 4..].trim();
-            let left_result = Self::evaluate_expression(left, context)?;
-            let right_result = Self::evaluate_expression(right, context)?;
-            return Ok(left_result || right_result);
-        }
-
-        // Handle simple binary comparisons (AFTER boolean logic)
-        let operators = [">=", "<=", "!=", "<>", "=", "==", ">", "<"];
-
-        for &op in &operators {
-            if let Some(pos) = expr.find(op) {
-                let left_part = expr[..pos].trim();
-                let right_part = expr[pos + op.len()..].trim();
-
-                // Try to evaluate both sides
-                let left_val = Self::resolve_value(left_part, context)?;
-                let right_val = Self::resolve_value(right_part, context)?;
-
-                return Self::compare_values(&left_val, op, &right_val);
-            }
-        }
-
-        // Default to false for unrecognized expressions
-        Ok(false)
-    }
-
-    /// Resolve a value from either a literal or column reference
-    fn resolve_value(
-        value_str: &str,
-        context: &HashMap<String, ScalarValue>,
-    ) -> Result<ScalarValue> {
-        let value_str = value_str.trim();
-
-        // Check if it's a column reference
-        if let Some(column_value) = context.get(value_str) {
-            return Ok(column_value.clone());
-        }
-
-        // Try to parse as various literals
-
-        // Boolean
-        match value_str.to_lowercase().as_str() {
-            "true" => return Ok(ScalarValue::Boolean(Some(true))),
-            "false" => return Ok(ScalarValue::Boolean(Some(false))),
-            "null" => return Ok(ScalarValue::Null),
-            _ => {}
-        }
-
-        // Integer
-        if let Ok(val) = value_str.parse::<i64>() {
-            return Ok(ScalarValue::Int64(Some(val)));
-        }
-
-        // Float
-        if let Ok(val) = value_str.parse::<f64>() {
-            return Ok(ScalarValue::Float64(Some(val)));
-        }
-
-        // String (remove quotes if present)
-        let string_val = if (value_str.starts_with('"') && value_str.ends_with('"'))
-            || (value_str.starts_with('\'') && value_str.ends_with('\''))
-        {
-            value_str[1..value_str.len() - 1].to_string()
-        } else {
-            value_str.to_string()
-        };
-
-        Ok(ScalarValue::Utf8(Some(string_val)))
-    }
-
-    /// Compare two scalar values using the given operator
-    fn compare_values(left: &ScalarValue, op: &str, right: &ScalarValue) -> Result<bool> {
-        use ScalarValue::*;
-        match (left, right) {
-            (Int64(Some(l)), Int64(Some(r))) => match op {
-                "=" | "==" => Ok(l == r),
-                "!=" | "<>" => Ok(l != r),
-                ">" => Ok(l > r),
-                ">=" => Ok(l >= r),
-                "<" => Ok(l < r),
-                "<=" => Ok(l <= r),
-                _ => Ok(false),
-            },
-            (Float64(Some(l)), Float64(Some(r))) => match op {
-                "=" | "==" => Ok((l - r).abs() < f64::EPSILON),
-                "!=" | "<>" => Ok((l - r).abs() >= f64::EPSILON),
-                ">" => Ok(l > r),
-                ">=" => Ok(l >= r),
-                "<" => Ok(l < r),
-                "<=" => Ok(l <= r),
-                _ => Ok(false),
-            },
-            (Utf8(Some(l)), Utf8(Some(r))) => match op {
-                "=" | "==" => Ok(l == r),
-                "!=" | "<>" => Ok(l != r),
-                ">" => Ok(l > r),
-                ">=" => Ok(l >= r),
-                "<" => Ok(l < r),
-                "<=" => Ok(l <= r),
-                _ => Ok(false),
-            },
-            (Boolean(Some(l)), Boolean(Some(r))) => match op {
-                "=" | "==" => Ok(l == r),
-                "!=" | "<>" => Ok(l != r),
-                _ => Ok(false),
-            },
-            // Handle type coercion cases
-            (Int64(Some(l)), Float64(Some(r))) => {
-                Self::compare_values(&Float64(Some(*l as f64)), op, &Float64(Some(*r)))
-            }
-            (Float64(Some(l)), Int64(Some(r))) => {
-                Self::compare_values(&Float64(Some(*l)), op, &Float64(Some(*r as f64)))
-            }
-            // Null handling
-            (Null, _) | (_, Null) => Ok(false),
-            _ => Ok(false),
-        }
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-    use arrow_array::RecordBatch;
-    use arrow_schema::{Field, Schema};
-    use datafusion::prelude::SessionContext;
-    use datafusion_expr::ScalarUDF;
-    use sedona_testing::testers::ScalarUdfTester;
-
-    #[test]
-    fn test_barrier_basic() {
-        let udf: ScalarUDF = barrier_udf().into();
-        assert_eq!(udf.name(), "barrier");
-    }
-
-    /// Type alias for test case tuple to reduce complexity
-    type TestCase = (&'static str, bool, Vec<(&'static str, ScalarValue)>);
-
-    /// Test cases for parameterized testing
-    /// Each tuple contains: (expression, expected_result, column_values...)
-    fn get_test_cases() -> Vec<TestCase> {
-        vec![
-            // Basic integer comparisons
-            ("x > 10", true, vec![("x", ScalarValue::Int64(Some(15)))]),
-            ("x > 10", false, vec![("x", ScalarValue::Int64(Some(5)))]),
-            ("x = 10", true, vec![("x", ScalarValue::Int64(Some(10)))]),
-            ("x = 10", false, vec![("x", ScalarValue::Int64(Some(15)))]),
-            ("x < 10", true, vec![("x", ScalarValue::Int64(Some(5)))]),
-            ("x < 10", false, vec![("x", ScalarValue::Int64(Some(15)))]),
-            ("x >= 10", true, vec![("x", ScalarValue::Int64(Some(10)))]),
-            ("x >= 10", true, vec![("x", ScalarValue::Int64(Some(15)))]),
-            ("x >= 10", false, vec![("x", ScalarValue::Int64(Some(5)))]),
-            ("x <= 10", true, vec![("x", ScalarValue::Int64(Some(10)))]),
-            ("x <= 10", true, vec![("x", ScalarValue::Int64(Some(5)))]),
-            ("x <= 10", false, vec![("x", ScalarValue::Int64(Some(15)))]),
-            ("x != 10", true, vec![("x", ScalarValue::Int64(Some(15)))]),
-            ("x != 10", false, vec![("x", ScalarValue::Int64(Some(10)))]),
-            // Float comparisons
-            (
-                "price > 19.99",
-                true,
-                vec![("price", ScalarValue::Float64(Some(25.50)))],
-            ),
-            (
-                "price > 19.99",
-                false,
-                vec![("price", ScalarValue::Float64(Some(15.00)))],
-            ),
-            (
-                "price = 19.99",
-                true,
-                vec![("price", ScalarValue::Float64(Some(19.99)))],
-            ),
-            // String comparisons
-            (
-                "name = 'Alice'",
-                true,
-                vec![("name", ScalarValue::Utf8(Some("Alice".to_string())))],
-            ),
-            (
-                "name = 'Alice'",
-                false,
-                vec![("name", ScalarValue::Utf8(Some("Bob".to_string())))],
-            ),
-            (
-                "status != 'active'",
-                true,
-                vec![("status", ScalarValue::Utf8(Some("inactive".to_string())))],
-            ),
-            (
-                "status != 'active'",
-                false,
-                vec![("status", ScalarValue::Utf8(Some("active".to_string())))],
-            ),
-            // Boolean literals
-            ("true", true, vec![("x", ScalarValue::Int64(Some(0)))]),
-            ("false", false, vec![("x", ScalarValue::Int64(Some(0)))]),
-            // AND operations
-            (
-                "x > 5 AND x < 15",
-                true,
-                vec![("x", ScalarValue::Int64(Some(10)))],
-            ),
-            (
-                "x > 5 AND x < 15",
-                false,
-                vec![("x", ScalarValue::Int64(Some(20)))],
-            ),
-            (
-                "x > 5 AND x < 15",
-                false,
-                vec![("x", ScalarValue::Int64(Some(2)))],
-            ),
-            (
-                "x > 5 and y < 20",
-                true,
-                vec![
-                    ("x", ScalarValue::Int64(Some(10))),
-                    ("y", ScalarValue::Int64(Some(15))),
-                ],
-            ),
-            (
-                "x > 5 and y < 20",
-                false,
-                vec![
-                    ("x", ScalarValue::Int64(Some(2))),
-                    ("y", ScalarValue::Int64(Some(15))),
-                ],
-            ),
-            // OR operations
-            (
-                "x < 5 OR x > 15",
-                true,
-                vec![("x", ScalarValue::Int64(Some(2)))],
-            ),
-            (
-                "x < 5 OR x > 15",
-                true,
-                vec![("x", ScalarValue::Int64(Some(20)))],
-            ),
-            (
-                "x < 5 OR x > 15",
-                false,
-                vec![("x", ScalarValue::Int64(Some(10)))],
-            ),
-            (
-                "x < 5 or y > 20",
-                true,
-                vec![
-                    ("x", ScalarValue::Int64(Some(2))),
-                    ("y", ScalarValue::Int64(Some(25))),
-                ],
-            ),
-            (
-                "x < 5 or y > 20",
-                false,
-                vec![
-                    ("x", ScalarValue::Int64(Some(10))),
-                    ("y", ScalarValue::Int64(Some(15))),
-                ],
-            ),
-            // Mixed data types
-            (
-                "count > 0 AND name = 'test'",
-                true,
-                vec![
-                    ("count", ScalarValue::Int64(Some(5))),
-                    ("name", ScalarValue::Utf8(Some("test".to_string()))),
-                ],
-            ),
-            (
-                "count > 0 AND name = 'test'",
-                false,
-                vec![
-                    ("count", ScalarValue::Int64(Some(0))),
-                    ("name", ScalarValue::Utf8(Some("test".to_string()))),
-                ],
-            ),
-        ]
-    }
-
-    #[test]
-    fn test_barrier_parameterized() {
-        let test_cases = get_test_cases();
-
-        for (i, (expression, expected, column_values)) in test_cases.iter().enumerate() {
-            // Test our barrier implementation
-            let barrier_result = test_barrier_expression(expression, column_values);
-            assert_eq!(
-                barrier_result, *expected,
-                "Test case {i}: barrier('{expression}') with values {column_values:?} expected {expected} but got {barrier_result}"
-            );
-        }
-    }
-
-    #[tokio::test]
-    async fn test_barrier_vs_datafusion_comparison() {
-        // Test cases that we can compare against DataFusion
-        let simple_cases = [
-            ("x > 10", vec![("x", ScalarValue::Int64(Some(15)))]),
-            ("x = 10", vec![("x", ScalarValue::Int64(Some(10)))]),
-            ("x < 5", vec![("x", ScalarValue::Int64(Some(3)))]),
-            (
-                "price >= 19.99",
-                vec![("price", ScalarValue::Float64(Some(25.00)))],
-            ),
-            (
-                "name = 'test'",
-                vec![("name", ScalarValue::Utf8(Some("test".to_string())))],
-            ),
-        ];
-
-        for (expression, column_values) in simple_cases {
-            let barrier_result = test_barrier_expression(expression, &column_values);
-            match test_datafusion_expression(expression, &column_values).await {
-                Ok(datafusion_result) => {
-                    assert_eq!(
-                        barrier_result, datafusion_result,
-                        "Mismatch for expression '{expression}' with values {column_values:?}: barrier={barrier_result}, datafusion={datafusion_result}"
-                    );
-                }
-                Err(_) => {
-                    // DataFusion failed - that's expected for some cases
-                    // Just ensure barrier didn't crash and produced a result
-                    println!(
-                        "DataFusion failed for expression '{expression}', barrier result: {barrier_result}"
-                    );
-                }
-            }
-        }
-    }
-
-    #[test]
-    fn test_barrier_edge_cases() {
-        // Test empty/invalid expressions
-        let barrier_result = test_barrier_expression("", &[("x", ScalarValue::Int64(Some(10)))]);
-        assert!(!barrier_result, "Empty expression should return false");
-
-        let barrier_result =
-            test_barrier_expression("invalid_expression", &[("x", ScalarValue::Int64(Some(10)))]);
-        assert!(!barrier_result, "Invalid expression should return false");
-
-        // Test missing column references
-        let barrier_result =
-            test_barrier_expression("y > 10", &[("x", ScalarValue::Int64(Some(15)))]);
-        assert!(
-            !barrier_result,
-            "Missing column reference should return false"
-        );
-
-        // Test null values
-        let barrier_result = test_barrier_expression("x > 10", &[("x", ScalarValue::Int64(None))]);
-        assert!(!barrier_result, "Null values should return false");
-    }
-
-    /// Helper function to test barrier expression evaluation
-    fn test_barrier_expression(expression: &str, column_values: &[(&str, ScalarValue)]) -> bool {
-        let mut args = vec![ColumnarValue::Scalar(ScalarValue::Utf8(Some(
-            expression.to_string(),
-        )))];
-
-        // Add column name and value pairs
-        for (name, value) in column_values {
-            args.push(ColumnarValue::Scalar(ScalarValue::Utf8(Some(
-                name.to_string(),
-            ))));
-            args.push(ColumnarValue::Scalar(value.clone()));
-        }
-
-        // Create argument types for the tester
-        let mut arg_types = vec![SedonaType::Arrow(DataType::Utf8)]; // expression
-        for (_, value) in column_values {
-            arg_types.push(SedonaType::Arrow(DataType::Utf8)); // column name
-            arg_types.push(SedonaType::Arrow(value.data_type())); // column value
-        }
-
-        let tester = ScalarUdfTester::new(barrier_udf().into(), arg_types);
-        let result = tester.invoke(args).unwrap();
-
-        match result {
-            ColumnarValue::Scalar(ScalarValue::Boolean(Some(b))) => b,
-            ColumnarValue::Scalar(ScalarValue::Boolean(None)) => false,
-            _ => panic!("Expected boolean result, got {result:?}"),
-        }
-    }
-
-    /// Helper function to test DataFusion expression evaluation (for comparison)
-    async fn test_datafusion_expression(
-        expression: &str,
-        column_values: &[(&str, ScalarValue)],
-    ) -> Result<bool> {
-        // Create schema from column values
-        let mut fields = vec![];
-        let mut arrays = vec![];
-
-        for (name, value) in column_values {
-            fields.push(Field::new(*name, value.data_type(), true));
-            arrays.push(value.to_array_of_size(1)?);
-        }
-
-        let schema = Arc::new(Schema::new(fields));
-        let batch = RecordBatch::try_new(schema.clone(), arrays)?;
-
-        // Create DataFusion context and register the batch as a table
-        let ctx = SessionContext::new();
-        ctx.register_batch("test_table", batch)?;
-
-        // Execute the expression as a query
-        let sql = format!("SELECT {expression} FROM test_table");
-        let df = ctx.sql(&sql).await?;
-        let results = df.collect().await?;
-
-        if results.is_empty() || results[0].num_rows() == 0 {
-            return Ok(false);
-        }
-
-        let result_array = results[0].column(0);
-        match ScalarValue::try_from_array(result_array, 0)? {
-            ScalarValue::Boolean(Some(b)) => Ok(b),
-            ScalarValue::Boolean(None) => Ok(false),
-            _ => exec_err!("Expected boolean result from DataFusion"),
-        }
-    }
-}
diff --git a/rust/sedona-functions/src/lib.rs b/rust/sedona-functions/src/lib.rs
index 1640e7a9..57acef7d 100644
--- a/rust/sedona-functions/src/lib.rs
+++ b/rust/sedona-functions/src/lib.rs
@@ -14,7 +14,6 @@
 // KIND, either express or implied.  See the License for the
 // specific language governing permissions and limitations
 // under the License.
-mod barrier;
 mod distance;
 pub mod executor;
 mod overlay;
diff --git a/rust/sedona-functions/src/register.rs b/rust/sedona-functions/src/register.rs
index 82645401..e53b5c07 100644
--- a/rust/sedona-functions/src/register.rs
+++ b/rust/sedona-functions/src/register.rs
@@ -38,7 +38,6 @@ pub fn default_function_set() -> FunctionSet {
 
     register_scalar_udfs!(
         function_set,
-        crate::barrier::barrier_udf,
         crate::distance::st_distance_sphere_udf,
         crate::distance::st_distance_spheroid_udf,
         crate::distance::st_distance_udf,
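
The deleted test helper `test_barrier_expression` in `barrier.rs` shows the argument convention that `barrier()` accepted. The SQL expression string came first, followed by alternating column-name and value arguments. The following is a minimal std-only sketch of that flattening step. It is not SedonaDB code. It uses a hypothetical `Value` enum in place of DataFusion's `ScalarValue` and a hypothetical `barrier_args` function, and it only mirrors how the removed helper assembled its argument vector.

```rust
/// Hypothetical stand-in for DataFusion's `ScalarValue`, limited to the
/// variants the deleted tests used (Int64, Float64, Utf8).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int64(Option<i64>),
    Float64(Option<f64>),
    Utf8(Option<String>),
}

/// Hypothetical helper: flattens an expression plus (column name, value)
/// bindings into the order the removed test helper passed to `barrier()`:
/// expression, name1, value1, name2, value2, ...
fn barrier_args(expression: &str, column_values: &[(&str, Value)]) -> Vec<Value> {
    // The SQL expression text always comes first.
    let mut args = vec![Value::Utf8(Some(expression.to_string()))];
    for (name, value) in column_values {
        // Each binding contributes its column name, then its value.
        args.push(Value::Utf8(Some(name.to_string())));
        args.push(value.clone());
    }
    args
}

fn main() {
    let args = barrier_args(
        "x > 10 AND price >= 19.99",
        &[
            ("x", Value::Int64(Some(15))),
            ("price", Value::Float64(Some(25.0))),
        ],
    );
    // One expression argument plus two arguments per binding.
    println!("{} arguments: {args:?}", args.len());
}
```

Because the bindings are positional, the argument count is always odd: 1 + 2 × (number of columns).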
