crm26 commented on PR #21542:
URL: https://github.com/apache/datafusion/pull/21542#issuecomment-4274095092

   Thanks for the detailed review, @Jefffrey. Rework pushed in fc3ee90a8. 
Walking through each comment:
   
   **1. Iterate via offsets/values, not per-row ArrayRef downcast** 
(`cosine_distance.rs:157`)
   Rewrote `general_cosine_distance` to downcast `list_array.values()` once to 
`&Float64Array`, then slice by `value_offsets()` per row. The inner `for i in 
0..len` loop reads directly from the contiguous `ScalarBuffer<f64>` — no 
per-row downcast, no `Option<f64>` unwrapping.
   
   **2. Nested-list unwrap loop** (`cosine_distance.rs:178`)
   Removed entirely. The function is not intended to support nested lists, and 
`coerce_types` rejects anything other than `List`/`LargeList`/`FixedSizeList` 
of a numeric inner type.
   
   **3. Redundant null/Float64 check** (`cosine_distance.rs:217`)
   Agreed — `coerce_types` calls `coerced_type_with_base_type_only` with 
`Float64` as the base type, which guarantees the inner type is `Float64` by the 
time we hit `invoke_with_args`. Dropped `convert_to_f64_array` entirely and 
replaced with a direct `as_float64_array` downcast.
   
   **4. Ambiguous length-mismatch error wording** (`cosine_distance.rs:220`)
   Updated to `"cosine_distance requires both list inputs to have the same 
length, got {len1} and {len2}"`. Now explicit that the lengths are the list 
elements' lengths, not the outer array, and includes the observed values.
   
   **5. Duplicate Rust unit tests** (`cosine_distance.rs:235`)
   Removed the `mod tests` block. SLT coverage includes orthogonal, identical, 
opposite, 45-degree, zero-magnitude, mismatched lengths, LargeList, integer 
coercion, multi-row, alias, empty arrays, no-args, and return-type checks. 
Added one new SLT case for NULL-element-in-list to preserve that particular 
behavior the Rust tests were covering.
   
   **6. `pub mod vector_math`** (`lib.rs:72`)
   Moot after #7 — the whole file is deleted, so the declaration is gone.
   
   **7. Inline the math instead of a separate module** (`vector_math.rs:67`)
   Agreed — with only one caller, the indirection wasn't paying for itself. 
Deleted `vector_math.rs` and inlined dot/magnitude into the tight per-row loop 
in `general_cosine_distance`.
   
   Full validation matrix (`fmt --all`, workspace `clippy -D warnings`, full + 
sqlite-extended SLT, CLI, doctests, feature-flag spot-checks, `extended_tests` 
workspace build, rustdoc, license, typos, machete, generated-doc regen) passed 
locally before push. Let me know if anything else needs tightening.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to