jorgecarleitao opened a new pull request #8225:
URL: https://github.com/apache/arrow/pull/8225


   This is a proposal to change how we programmatically iterate over record 
batches in arrow and datafusion.
   
   Instead of 
   
   ```
   pub fn collect(
       it: Arc<Mutex<dyn RecordBatchReader + Send + Sync>>,
   ) -> Result<Vec<RecordBatch>> {
       let mut reader = it.lock().unwrap();
       let mut results: Vec<RecordBatch> = vec![];
       loop {
           match reader.next_batch() {
               Ok(Some(batch)) => {
                   results.push(batch);
               }
               Ok(None) => {
                   // end of result set
                   return Ok(results);
               }
               Err(e) => return Err(ExecutionError::from(e)),
           }
       }
   }
   ```
   
   use 
   
   ```
   /// Create a vector of record batches from an iterator
   pub fn collect(
       it: Arc<Mutex<dyn RecordBatchReader + Send + Sync>>,
   ) -> Result<Vec<RecordBatch>> {
       it.lock()
           .unwrap()
           .into_iter()
           .collect::<ArrowResult<Vec<_>>>()
           .map_err(ExecutionError::from)
   }
   ```
   
   I.e. via the iterator API.
   
   This allows us to write more expressive code and to offer our users a 
well-documented and popular API (`Iterator`).
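
To make the pattern concrete, here is a minimal, self-contained sketch of collecting from an iterator-based reader. Every type in it (`RecordBatch`, `ArrowError`, `ExecutionError`, `VecReader`) and the helper `reader_from` are hypothetical stand-ins so the snippet compiles on its own; they are not the real arrow or datafusion definitions, and the trait shape is only an assumption about what this proposal implies.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the arrow/datafusion types (not the real ones).
#[derive(Debug, Clone, PartialEq)]
struct RecordBatch(Vec<i32>);

#[derive(Debug, PartialEq)]
struct ArrowError(String);

#[derive(Debug, PartialEq)]
struct ExecutionError(String);

impl From<ArrowError> for ExecutionError {
    fn from(e: ArrowError) -> Self {
        ExecutionError(e.0)
    }
}

type ArrowResult<T> = std::result::Result<T, ArrowError>;
type Result<T> = std::result::Result<T, ExecutionError>;

// Assumed shape: a reader is simply an iterator over fallible batches.
trait RecordBatchReader: Iterator<Item = ArrowResult<RecordBatch>> {}

// Toy reader that yields pre-built results from a vector.
struct VecReader {
    batches: std::vec::IntoIter<ArrowResult<RecordBatch>>,
}

impl Iterator for VecReader {
    type Item = ArrowResult<RecordBatch>;

    fn next(&mut self) -> Option<Self::Item> {
        self.batches.next()
    }
}

impl RecordBatchReader for VecReader {}

// Hypothetical helper that wraps a vector of results in a shared reader.
fn reader_from(
    items: Vec<ArrowResult<RecordBatch>>,
) -> Arc<Mutex<dyn RecordBatchReader + Send + Sync>> {
    Arc::new(Mutex::new(VecReader {
        batches: items.into_iter(),
    }))
}

/// Create a vector of record batches from an iterator.
fn collect(
    it: Arc<Mutex<dyn RecordBatchReader + Send + Sync>>,
) -> Result<Vec<RecordBatch>> {
    let mut guard = it.lock().unwrap();
    // `&mut I` implements `Iterator` whenever `I` does, even for trait objects.
    let reader: &mut (dyn RecordBatchReader + Send + Sync) = &mut *guard;
    // Collecting into `Result<Vec<_>, _>` stops at the first error.
    let collected = reader.collect::<ArrowResult<Vec<_>>>();
    collected.map_err(ExecutionError::from)
}
```

Collecting into `ArrowResult<Vec<_>>` reproduces the behavior of the explicit loop: batches accumulate until the iterator is exhausted, and the first error ends the iteration and is returned.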
   
   Finally, this change also opens up the possibility of implementing 
`futures::Stream`, the async counterpart of `Iterator`.



