0lai0 opened a new pull request, #1393:
URL: https://github.com/apache/mahout/pull/1393

   ### Related Issues
   
   Closes #1340 
   Part of #1338
   
   ### Changes
   
   - [ ] Bug fix
   - [x] New feature
   - [ ] Refactoring
   - [ ] Documentation
   - [ ] Test
   - [ ] CI/CD pipeline
   - [ ] Other
   
   ### Why
   
   `ParquetReader::new` previously rejected any non-`Float64` column during schema validation, which made f32 Parquet files unusable. This blocked f32 file-source pipelines that need to keep data as `Vec<f32>` end-to-end.
   
   Note: higher-level helpers such as `read_parquet_batch()` still return 
`Vec<f64>` today. Wiring f32 through the full pipeline is tracked separately 
under #1338.
   
   ### How
   
   `src/reader.rs`
   - Add `Default` supertrait bound to `FloatElem` (`f32` / `f64` already 
satisfy it)
   - Add `handle_float32_nulls`: mirrors `handle_float64_nulls`, with a 
zero-copy fast path when no nulls are present
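
A minimal, self-contained sketch of the two `src/reader.rs` changes follows. It is illustrative only. `fill_nulls` is a hypothetical stand-in for `handle_float32_nulls` / `handle_float64_nulls`, whose real signatures are not shown in this PR. A plain `Option<&[bool]>` replaces Arrow's null buffer. Filling null slots with `T::default()` is an assumption about why the new `Default` bound is useful; the PR does not say so.

```rust
/// Hypothetical stand-in for the crate's `FloatElem`; the real trait likely has
/// more bounds. This sketch only shows the new `Default` supertrait.
pub trait FloatElem: Copy + Default {}
impl FloatElem for f32 {}
impl FloatElem for f64 {}

/// Hypothetical null handler (NOT the crate's `handle_float32_nulls` signature):
/// appends `values` to `out`, writing `T::default()` (0.0) for null slots.
/// `validity == None` means the column has no nulls, like Arrow's optional
/// null buffer.
pub fn fill_nulls<T: FloatElem>(values: &[T], validity: Option<&[bool]>, out: &mut Vec<T>) {
    match validity {
        // Fast path: no null mask, so append the whole slice in one bulk copy.
        None => out.extend_from_slice(values),
        // Slow path: walk values and mask together, substituting the default.
        Some(mask) => {
            assert_eq!(mask.len(), values.len(), "mask length must match values");
            out.extend(
                values
                    .iter()
                    .zip(mask)
                    .map(|(&v, &valid)| if valid { v } else { T::default() }),
            );
        }
    }
}
```

In this sketch one generic body covers both `f32` and `f64`; whether the crate shares code between the two handlers this way is not stated in the PR.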
   
   `src/readers/parquet.rs`
   - Introduce an internal `pub(crate)` trait `ArrowPrimitive` that provides `extend_from_arrow_array` / `collect_from_arrow_array` for each concrete type:
     - **same dtype** → `extend_from_slice` directly from the Arrow buffer (no cast and no intermediate allocation)
     - **cross dtype** → `arrow::compute::cast`, then `extend_from_slice`
       - `f32 → f64`: exact, NaN preserved
       - `f64 → f32`: values round to the nearest `f32`, overflow becomes `±Inf`, NaN preserved
   - Make both readers generic over the element type: `ParquetReader<T: FloatElem = f64>` and `ParquetStreamingReader<T: FloatElem = f64>`
   - Schema validation now accepts:
     - `List<Float32>`
     - `List<Float64>`
     - `FixedSizeList<Float32>`
     - `FixedSizeList<Float64>`
   - The default type parameter `T = f64` keeps most existing call sites unchanged: where the result flows into a `Vec<f64>`, the compiler still infers `T = f64`
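
The per-type dispatch and the widened schema check can be sketched without the `arrow` crate. Everything here is a hypothetical stand-in: `Column` plays the role of a `Float32Array` / `Float64Array`, `FieldType` / `ItemType` play the role of Arrow's `DataType`, and `AppendColumn` only mirrors the shape of `ArrowPrimitive`. The cross-dtype branches use Rust's `as` / `f64::from` instead of `arrow::compute::cast`; for floats these give the overflow and NaN behaviour listed above (widening is exact; narrowing also rounds to the nearest `f32`).

```rust
/// Hypothetical stand-in for a decoded Arrow float column.
pub enum Column<'a> {
    F32(&'a [f32]),
    F64(&'a [f64]),
}

/// Hypothetical stand-ins for the list item type and the field type.
pub enum ItemType {
    Float32,
    Float64,
    Int32,
}

pub enum FieldType {
    List(ItemType),
    FixedSizeList(ItemType, i32),
}

/// Sketch of the widened schema check: both list kinds, both float widths.
pub fn is_supported(field: &FieldType) -> bool {
    matches!(
        field,
        FieldType::List(ItemType::Float32 | ItemType::Float64)
            | FieldType::FixedSizeList(ItemType::Float32 | ItemType::Float64, _)
    )
}

/// Sketch of an `ArrowPrimitive`-style trait: each element type knows how to
/// append a column of either width to its own buffer.
pub trait AppendColumn: Sized + Copy {
    fn extend_from_column(col: &Column<'_>, out: &mut Vec<Self>);

    fn collect_from_column(col: &Column<'_>) -> Vec<Self> {
        let mut out = Vec::new();
        Self::extend_from_column(col, &mut out);
        out
    }
}

impl AppendColumn for f32 {
    fn extend_from_column(col: &Column<'_>, out: &mut Vec<f32>) {
        match col {
            // Same dtype: one bulk copy, no conversion.
            Column::F32(v) => out.extend_from_slice(v),
            // Cross dtype: lossy narrowing; rounds to nearest, overflow -> ±Inf, NaN stays NaN.
            Column::F64(v) => out.extend(v.iter().map(|&x| x as f32)),
        }
    }
}

impl AppendColumn for f64 {
    fn extend_from_column(col: &Column<'_>, out: &mut Vec<f64>) {
        match col {
            // Cross dtype: widening is exact for every f32 value, NaN included.
            Column::F32(v) => out.extend(v.iter().map(|&x| f64::from(x))),
            // Same dtype: one bulk copy, no conversion.
            Column::F64(v) => out.extend_from_slice(v),
        }
    }
}
```

Keeping the conversion on the element type (rather than on the reader) means a reader generic over `T` only needs one `T::extend_from_column` call, whichever width the file stores.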
   
   `src/lib.rs`
   - Re-export `handle_float32_nulls` for symmetry with `handle_float64_nulls`
   
   `src/remote.rs`
   - Add an explicit `ParquetReader::<f64>::new(...)` in a test helper: after genericisation, the compiler could not infer `T` from `data.len()` alone
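
The inference rule behind both the `T = f64` default and the `src/remote.rs` annotation can be reproduced with a toy type. `Reader` and `Elem` are hypothetical stand-ins for `ParquetReader` and `FloatElem`. The underlying Rust rule: a default type parameter is applied when the type is written in type position (`let r: Reader = ...`), but not when `T` has to be inferred from an expression such as `Reader::new(n)`. In that case some other use, like returning a `Vec<f64>`, must pin `T`; if only `rows()` (analogous to `data.len()`) is called, the compiler stops with a "type annotations needed" error.

```rust
use std::marker::PhantomData;

/// Hypothetical element trait (stand-in for the crate's `FloatElem`).
pub trait Elem: Copy + Default {}
impl Elem for f32 {}
impl Elem for f64 {}

/// Hypothetical reader with a defaulted element type, mirroring
/// `ParquetReader<T: FloatElem = f64>`.
pub struct Reader<T: Elem = f64> {
    rows: usize,
    _elem: PhantomData<T>,
}

impl<T: Elem> Reader<T> {
    pub fn new(rows: usize) -> Self {
        Reader { rows, _elem: PhantomData }
    }

    /// Returning `Vec<T>` lets the caller's expected type drive inference.
    pub fn read_all(&self) -> Vec<T> {
        vec![T::default(); self.rows]
    }

    /// Returns `usize`, so calling only this never constrains `T`.
    pub fn rows(&self) -> usize {
        self.rows
    }
}

/// OK: `T` is inferred from the `Vec<f64>` the result flows into.
pub fn infer_from_return() -> Vec<f64> {
    Reader::new(2).read_all()
}

/// OK: explicit turbofish, as in the `src/remote.rs` test helper.
/// Without `::<f64>` nothing constrains `T`, and compilation fails.
pub fn explicit_turbofish() -> usize {
    Reader::<f64>::new(3).rows()
}

/// OK: naming the type in type position applies the `= f64` default.
pub fn default_in_type_position() -> usize {
    let r: Reader = Reader::new(4);
    r.rows()
}
```

An f32 caller opts in the same way, for example `Reader::<f32>::new(n)` or an annotated `Vec<f32>` result.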
   
   ### Checklist
   
   - [x] Added or updated unit tests for all changes
   - [x] Added or updated documentation for all changes
   

