austin362667 opened a new pull request, #12476:
URL: https://github.com/apache/datafusion/pull/12476
## Which issue does this PR close?
Closes #12475.
## Rationale for this change
Add **dot product** functionality to DataFusion. A scalar UDF
`array_dot_product` / `list_dot_product` that computes the inner product of
two arrays would be valuable, and well-known databases such as DuckDB already
support one.
## What changes are included in this PR?
* Move `convert_to_f64_array` to `functions-nested/utils.rs`
## Are these changes tested?
Yes, added array-specific SQL logic tests covering `List`, `LargeList`, and
`FixedSizeList`.
## Are there any user-facing changes?
Yes, a new function `array_dot_product(arr1, arr2)` is added.
For example:
```
> CREATE TABLE word_embedding (
emb_a DOUBLE[],
emb_b DOUBLE[]
);
0 row(s) fetched.
Elapsed 0.008 seconds.
> INSERT INTO word_embedding VALUES
([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),
([2.0, 4.0, 6.0], [2.0, 4.0, 6.0]),
([1.5, 2.5, 3.5], [4.5, 6.5, 8.5]);
+-------+
| count |
+-------+
| 3 |
+-------+
1 row(s) fetched.
Elapsed 0.009 seconds.
> SELECT
emb_a,
emb_b,
list_dot_product(emb_a, emb_b) AS inner_product
FROM
word_embedding;
+-----------------+-----------------+---------------+
| emb_a | emb_b | inner_product |
+-----------------+-----------------+---------------+
| [1.0, 2.0, 3.0] | [1.0, 2.0, 5.0] | 20.0 |
| [2.0, 4.0, 6.0] | [2.0, 4.0, 6.0] | 56.0 |
| [1.5, 2.5, 3.5] | [4.5, 6.5, 8.5] | 52.75 |
+-----------------+-----------------+---------------+
3 row(s) fetched.
Elapsed 0.008 seconds.
```
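As a cross-check on the `inner_product` column above, here is a minimal reference sketch in plain Python. It is not the PR's Rust implementation; the helper name `dot_product` is hypothetical, and raising an error on mismatched lengths is an assumption, since the description does not say how the UDF handles that case.

```python
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the inner product sum(a[i] * b[i]) of two equal-length lists."""
    # Assumption: mismatched lengths are treated as an error here.
    # The PR description does not state what the UDF does in this case.
    if len(a) != len(b):
        raise ValueError("both lists must have the same length")
    return sum(x * y for x, y in zip(a, b))


# The three rows inserted into `word_embedding` in the example above.
rows = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),
    ([2.0, 4.0, 6.0], [2.0, 4.0, 6.0]),
    ([1.5, 2.5, 3.5], [4.5, 6.5, 8.5]),
]

for emb_a, emb_b in rows:
    print(emb_a, emb_b, dot_product(emb_a, emb_b))
```

For the first row this computes 1.0*1.0 + 2.0*2.0 + 3.0*5.0 = 20.0, which matches the query output.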