rich7420 commented on code in PR #704:
URL: https://github.com/apache/mahout/pull/704#discussion_r2605556252


##########
qdp/qdp-core/src/io.rs:
##########
@@ -365,3 +308,123 @@ pub fn read_parquet_batch<P: AsRef<Path>>(path: P) -> Result<(Vec<f64>, usize, u
 
     Ok((all_data, num_samples, sample_size))
 }
+
+/// Reads batch data from an Arrow IPC file.
+///
+/// Supports `FixedSizeList<Float64>` and `List<Float64>` column formats.
+/// Returns flattened data suitable for batch encoding.
+///
+/// # Returns
+/// Tuple of `(flattened_data, num_samples, sample_size)`
+pub fn read_arrow_ipc_batch<P: AsRef<Path>>(path: P) -> Result<(Vec<f64>, usize, usize)> {
+    let file = File::open(path.as_ref()).map_err(|e| {
+        MahoutError::Io(format!("Failed to open Arrow IPC file: {}", e))
+    })?;
+
+    let reader = ArrowFileReader::try_new(file, None).map_err(|e| {
+        MahoutError::Io(format!("Failed to create Arrow IPC reader: {}", e))
+    })?;
+
+    let mut all_data = Vec::new();

Review Comment:
   We should keep `all_data` from causing an OOM on large files, since it buffers the whole file in memory. This could be addressed in a follow-up PR.
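A minimal sketch of one possible guard, assuming a caller-supplied cap on buffered values. `flatten_with_limit`, `ReadError`, and the `limit` parameter are hypothetical names, not part of qdp-core. Rows are modeled as plain `&[f64]` slices so the sketch builds with `std` alone. A real fix would iterate Arrow record batches and map errors onto `MahoutError`. It checks the running total before growing the buffer and uses `Vec::try_reserve`, so an oversized input yields an error instead of an allocation abort.

```rust
use std::collections::TryReserveError;

/// Hypothetical error type for this sketch; real code would map these cases
/// onto `MahoutError` instead.
#[derive(Debug, PartialEq)]
enum ReadError {
    /// Buffering the input would exceed the caller-supplied value limit.
    TooLarge { requested: usize, limit: usize },
    /// The allocator could not satisfy the reservation.
    Alloc,
    /// A row's length differs from the first row's length.
    RaggedRow { index: usize, expected: usize, found: usize },
}

impl From<TryReserveError> for ReadError {
    fn from(_: TryReserveError) -> Self {
        ReadError::Alloc
    }
}

/// Hypothetical helper: flattens rows into one buffer like `all_data`, but
/// rejects inputs larger than `limit` values and reserves memory fallibly.
fn flatten_with_limit<'a, I>(rows: I, limit: usize) -> Result<(Vec<f64>, usize, usize), ReadError>
where
    I: IntoIterator<Item = &'a [f64]>,
{
    let mut data: Vec<f64> = Vec::new();
    let mut num_samples = 0;
    let mut sample_size = 0;

    for (index, row) in rows.into_iter().enumerate() {
        // All samples must share the length of the first one.
        if index == 0 {
            sample_size = row.len();
        } else if row.len() != sample_size {
            return Err(ReadError::RaggedRow { index, expected: sample_size, found: row.len() });
        }

        // Check the running total against the cap before growing the buffer.
        let requested = data.len().saturating_add(row.len());
        if requested > limit {
            return Err(ReadError::TooLarge { requested, limit });
        }

        // `try_reserve` reports allocation failure as an error instead of aborting.
        data.try_reserve(row.len())?;
        data.extend_from_slice(row);
        num_samples += 1;
    }

    Ok((data, num_samples, sample_size))
}
```

A cap only turns a crash into a clean error. Bounding peak memory by the batch size rather than the file size would instead require handing each record batch to the encoder as it is read, rather than collecting everything first.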



-- 
This is an automated message from the Apache Git Service.