crm26 commented on PR #21542:
URL: https://github.com/apache/datafusion/pull/21542#issuecomment-4274095092
Thanks for the detailed review, @Jefffrey. Rework pushed in fc3ee90a8.
Walking through each comment:
**1. Iterate via offsets/values, not per-row ArrayRef downcast**
(`cosine_distance.rs:157`)
Rewrote `general_cosine_distance` to downcast `list_array.values()` once to
`&Float64Array`, then slice by `value_offsets()` per row. The inner `for i in
0..len` loop reads directly from the contiguous `ScalarBuffer<f64>` — no
per-row downcast, no `Option<f64>` unwrapping.
**2. Nested-list unwrap loop** (`cosine_distance.rs:178`)
Removed entirely. The function is not intended to support nested lists, and
`coerce_types` rejects anything other than `List`/`LargeList`/`FixedSizeList`
of a numeric inner type.
**3. Redundant null/Float64 check** (`cosine_distance.rs:217`)
Agreed — `coerce_types` calls `coerced_type_with_base_type_only` with
`Float64` as the base type, which guarantees the inner type is `Float64` by the
time we hit `invoke_with_args`. Dropped `convert_to_f64_array` entirely and
replaced with a direct `as_float64_array` downcast.
**4. Ambiguous length-mismatch error wording** (`cosine_distance.rs:220`)
Updated to `"cosine_distance requires both list inputs to have the same
length, got {len1} and {len2}"`. Now explicit that the lengths are the list
elements' lengths, not the outer array, and includes the observed values.
**5. Duplicate Rust unit tests** (`cosine_distance.rs:235`)
Removed the `mod tests` block. SLT coverage includes orthogonal, identical,
opposite, 45-degree, zero-magnitude, mismatched lengths, LargeList, integer
coercion, multi-row, alias, empty arrays, no-args, and return-type checks.
Added one new SLT case for NULL-element-in-list to preserve that particular
behavior the Rust tests were covering.
**6. `pub mod vector_math`** (`lib.rs:72`)
Moot after #7 — the whole file is deleted, so the declaration is gone.
**7. Inline the math instead of a separate module** (`vector_math.rs:67`)
Agreed — with only one caller, the indirection wasn't paying for itself.
Deleted `vector_math.rs` and inlined dot/magnitude into the tight per-row loop
in `general_cosine_distance`.
Full validation matrix (`fmt --all`, workspace `clippy -D warnings`, full +
sqlite-extended SLT, CLI, doctests, feature-flag spot-checks, `extended_tests`
workspace build, rustdoc, license, typos, machete, generated-doc regen) passed
locally before push. Let me know if anything else needs tightening.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]