crm26 opened a new pull request, #21371:
URL: https://github.com/apache/datafusion/pull/21371

   ## Summary
   
   Adds vector distance and array math functions to 
`datafusion-functions-nested`, enabling vector search and array algebra in 
standard SQL.
   
   ```sql
   -- Vector search: find nearest neighbors by cosine distance
   SELECT id, cosine_distance(embedding, ARRAY[0.1, 0.2, ...]) AS dist
   FROM documents ORDER BY dist LIMIT 10;
   
   -- Array math
   SELECT array_normalize(embedding) FROM documents;
   SELECT array_add(vec_a, vec_b) FROM t;
   SELECT array_scale(embedding, 2.0) FROM documents;
   ```
   
   ## Functions
   
   | Function | Returns | Description |
   |----------|---------|-------------|
   | `cosine_distance(a, b)` | float64 | 1 - cosine similarity |
   | `inner_product(a, b)` | float64 | Dot product |
   | `array_normalize(a)` | list(float64) | Unit vector |
   | `array_add(a, b)` | list(float64) | Element-wise addition |
   | `array_subtract(a, b)` | list(float64) | Element-wise subtraction |
   | `array_scale(a, f)` | list(float64) | Scalar multiplication |
   
   All six functions have `list_*` aliases; `inner_product` is also aliased as `dot_product`.
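
To make the semantics in the table concrete, here is a minimal sketch of the element-wise functions over plain `f64` slices. It is not the code in this PR: the `*_sketch` names are hypothetical, and returning `None` for mismatched lengths or a zero vector is an assumption made for illustration (the PR handles Arrow list arrays and may behave differently in those cases).

```rust
/// Element-wise addition. Returns None when lengths differ
/// (assumed behavior for this sketch only).
fn add_sketch(a: &[f64], b: &[f64]) -> Option<Vec<f64>> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| x + y).collect())
}

/// Element-wise subtraction, with the same length rule as add_sketch.
fn subtract_sketch(a: &[f64], b: &[f64]) -> Option<Vec<f64>> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| x - y).collect())
}

/// Scalar multiplication: every element is multiplied by `f`.
fn scale_sketch(a: &[f64], f: f64) -> Vec<f64> {
    a.iter().map(|x| x * f).collect()
}

/// Unit vector: each element divided by the Euclidean norm.
/// Returns None for a zero-magnitude input (assumption) to avoid
/// dividing by zero.
fn normalize_sketch(a: &[f64]) -> Option<Vec<f64>> {
    let norm = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm == 0.0 {
        return None;
    }
    Some(a.iter().map(|x| x / norm).collect())
}
```

For example, `normalize_sketch(&[3.0, 4.0])` yields a vector close to `[0.6, 0.8]`, since the norm of `[3, 4]` is 5.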
   
   ## Design
   
   Shared primitives in `vector_math.rs`:
   - `dot_product_f64(a, b)` — used by `inner_product` and `cosine_distance`
   - `magnitude_f64(a)` — used by `cosine_distance` and `array_normalize`
   - `sum_of_squares_f64(a)` — used by `magnitude_f64`
   - `convert_to_f64_array(a)` — shared with existing `array_distance`
   
   The duplicate `convert_to_f64_array` in the existing `distance.rs` is 
consolidated into the shared module.
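
The way the shared primitives compose can be sketched as follows. The primitive names match the list above, but the slice-based signatures are an assumption (the real helpers in `vector_math.rs` presumably operate on Arrow arrays), and `cosine_distance_sketch` is a hypothetical name. Returning `None` for mismatched lengths or zero-magnitude inputs is likewise an illustrative choice, not confirmed behavior of the PR.

```rust
/// Sum of pairwise products. Assumes equal lengths; zip stops at the
/// shorter slice, so callers should check lengths first.
fn dot_product_f64(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Sum of squared elements.
fn sum_of_squares_f64(a: &[f64]) -> f64 {
    a.iter().map(|x| x * x).sum()
}

/// Euclidean norm, built on sum_of_squares_f64.
fn magnitude_f64(a: &[f64]) -> f64 {
    sum_of_squares_f64(a).sqrt()
}

/// 1 - cosine similarity, built from the primitives above.
/// None for mismatched lengths or a zero vector (assumption).
fn cosine_distance_sketch(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    let denom = magnitude_f64(a) * magnitude_f64(b);
    if denom == 0.0 {
        return None;
    }
    Some(1.0 - dot_product_f64(a, b) / denom)
}
```

Under these assumptions, orthogonal vectors such as `[1, 0]` and `[0, 1]` have distance 1, and a vector compared with itself has a distance of approximately 0.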
   
   Follows the pattern of the existing `array_distance` function: same 
signature style, `coerce_types`, null handling, and type support (element 
types Float32, Float64, Int32, Int64; list types List, LargeList, 
FixedSizeList).
   
   ## Tests
   
   79 tests including: normal inputs, null handling, zero vectors, orthogonal 
vectors, empty arrays, Float32/Float64, mismatched lengths, vector search 
ranking pattern. Sqllogictest coverage in `vector_functions.slt`. Clippy clean.

