ryankert01 commented on issue #752:
URL: https://github.com/apache/mahout/issues/752#issuecomment-3702502564

   # Adding New Input Format Support
   
   This document explains how to add support for new input formats to the QDP 
library using the refactored reader architecture.
   
   ## Overview
   
   The QDP library uses a trait-based architecture for reading quantum data 
from various sources. This makes it easy to add new input formats without 
modifying the core library code.
   
   ## Architecture
   
   The reader system is based on two main traits:
   
   - **`DataReader`**: Basic interface for batch reading (read all data at once)
   - **`StreamingDataReader`**: Extended interface for chunk-by-chunk streaming 
(for large files)
   
   ## Adding a New Format
   
   ### Step 1: Implement the `DataReader` Trait
   
   Create a new file in `qdp-core/src/readers/` for your format. For example, 
to add NumPy support:
   
   ```rust
   // qdp-core/src/readers/numpy.rs
   
   use std::path::Path;
   use crate::error::{MahoutError, Result};
   use crate::reader::DataReader;
   
   pub struct NumpyReader {
       path: std::path::PathBuf,
       read: bool,
   }
   
   impl NumpyReader {
       pub fn new<P: AsRef<Path>>(path: P) -> Result<Self> {
           Ok(Self {
               path: path.as_ref().to_path_buf(),
               read: false,
           })
       }
   }
   
   impl DataReader for NumpyReader {
       fn read_batch(&mut self) -> Result<(Vec<f64>, usize, usize)> {
           if self.read {
               return Err(MahoutError::InvalidInput("Reader already 
consumed".to_string()));
           }
           self.read = true;
   
           // TODO: Implement NumPy file reading logic
           // 1. Open and parse .npy file
           // 2. Extract shape information (num_samples, sample_size)
           // 3. Read data as Vec<f64>
           // 4. Return (flattened_data, num_samples, sample_size)
           
           unimplemented!("NumPy reading not yet implemented")
       }
   
       fn get_sample_size(&self) -> Option<usize> {
           // Return sample size if known before reading
           None
       }
   
       fn get_num_samples(&self) -> Option<usize> {
           // Return number of samples if known before reading
           None
       }
   }
   ```
   
   ### Step 2: (Optional) Implement `StreamingDataReader` for Large Files
   
   If your format needs to support streaming for large files:
   
   ```rust
   use crate::reader::StreamingDataReader;
   
   impl StreamingDataReader for NumpyReader {
       fn read_chunk(&mut self, buffer: &mut [f64]) -> Result<usize> {
           // Implement chunk-by-chunk reading
           // Return number of elements written to buffer
           // Return 0 when no more data
           unimplemented!("Streaming not yet implemented")
       }
   
       fn total_rows(&self) -> usize {
           // Return total number of samples
           0
       }
   }
   ```
   
   ### Step 3: Register Your Reader
   
   Add your reader to `qdp-core/src/readers/mod.rs`:
   
   ```rust
   pub mod parquet;
   pub mod arrow_ipc;
   pub mod numpy;  // Add this line
   
   pub use parquet::{ParquetReader, ParquetStreamingReader};
   pub use arrow_ipc::ArrowIPCReader;
   pub use numpy::NumpyReader;  // Add this line
   ```
   
   ### Step 4: Add Dependencies (if needed)
   
   If your format requires external crates, add them to `qdp-core/Cargo.toml`:
   
   ```toml
   [dependencies]
   # ... existing dependencies ...
   ndarray-npy = "0.8"  # Example for NumPy support
   ```
   
   ### Step 5: Add Tests
   
   Create tests for your new reader in `qdp-core/tests/numpy_io.rs`:
   
   ```rust
   use qdp_core::reader::DataReader;
   use qdp_core::readers::NumpyReader;
   
   #[test]
   fn test_read_numpy_batch() {
       // Create test .npy file
       // ...
       
       let mut reader = NumpyReader::new("test.npy").unwrap();
       let (data, num_samples, sample_size) = reader.read_batch().unwrap();
       
       assert_eq!(num_samples, 10);
       assert_eq!(sample_size, 16);
       assert_eq!(data.len(), num_samples * sample_size);
   }
   ```
   
   ### Step 6: Add Convenience Functions (Optional)
   
   You can add convenience functions to `qdp-core/src/io.rs` for backward 
compatibility or ease of use:
   
   ```rust
   pub fn read_numpy_batch<P: AsRef<Path>>(path: P) -> Result<(Vec<f64>, usize, 
usize)> {
       use crate::reader::DataReader;
       let mut reader = crate::readers::NumpyReader::new(path)?;
       reader.read_batch()
   }
   ```
   
   ### Step 7: Add Integration with QdpEngine (Optional)
   
   Add a high-level API method to `QdpEngine` in `qdp-core/src/lib.rs`:
   
   ```rust
   impl QdpEngine {
       // ... existing methods ...
       
       pub fn encode_from_numpy(
           &self,
           path: &str,
           num_qubits: usize,
           encoding_method: &str,
       ) -> Result<*mut DLManagedTensor> {
           use crate::reader::DataReader;
           let mut reader = crate::readers::NumpyReader::new(path)?;
           let (batch_data, num_samples, sample_size) = reader.read_batch()?;
           self.encode_batch(&batch_data, num_samples, sample_size, num_qubits, 
encoding_method)
       }
   }
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to