westonpace opened a new issue, #7228:
URL: https://github.com/apache/arrow-rs/issues/7228

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   Currently we use Arrow as part of our public API in Lance.  
`RecordBatchReader` is extremely useful, but there are times we would like 
an asynchronous version.  There is datafusion's `RecordBatchStream`, and we 
have our own equivalent in lancedb (also called `RecordBatchStream`, for 
better or worse).  We have our own because we don't want to make datafusion 
part of the public API, which keeps the API simpler.  As a result, moving 
batches between these APIs involves a lot of error conversion: from arrow's 
error to datafusion's error to lancedb's error.
   
   I'm mainly opening this issue in the interest of discussion, to see if this 
is something we'd be willing to add.  If so, I can put together a proposal PR.
   
   **Describe the solution you'd like**
   
   ```
   // Pretty much identical to datafusion's `RecordBatchStream` except using arrow's `Result`
   pub trait RecordBatchStream: Stream<Item = Result<RecordBatch>> {
       fn schema(&self) -> Arc<Schema>;
   }
   ```
   
   **Describe alternatives you've considered**
   
   As far as I can tell the biggest drawback would be the introduction of 
`futures` as a dependency.  This could be feature-gated.
   
   Alternatively, we could vendor the `Stream` trait:
   
   ```
   // No `Stream` supertrait here: `poll_next` is declared directly on the trait
   pub trait RecordBatchStream {
       fn schema(&self) -> Arc<Schema>;
       fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Result<RecordBatch>>>;
   }
   ```
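
To make the vendored option concrete, here is a minimal sketch that compiles against the standard library alone. `Schema`, `RecordBatch`, and `Result` are placeholder stand-ins for the arrow types, and `MemoryStream`, `NoopWake`, and `drain` are hypothetical names used only for illustration; none of this is existing arrow-rs API.

```rust
use std::collections::VecDeque;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Placeholder stand-ins for arrow's `Schema`, `RecordBatch`, and `Result`
// (assumptions, so the sketch needs no external crates).
#[derive(Debug, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub struct RecordBatch {
    pub num_rows: usize,
}

pub type Result<T> = std::result::Result<T, String>;

// Vendored variant: `poll_next` lives on the trait, with no `futures` dependency.
pub trait RecordBatchStream {
    fn schema(&self) -> Arc<Schema>;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Result<RecordBatch>>>;
}

// Hypothetical in-memory stream whose batches are always ready.
pub struct MemoryStream {
    schema: Arc<Schema>,
    batches: VecDeque<RecordBatch>,
}

impl MemoryStream {
    pub fn new(schema: Arc<Schema>, batches: Vec<RecordBatch>) -> Self {
        Self { schema, batches: batches.into() }
    }
}

impl RecordBatchStream for MemoryStream {
    fn schema(&self) -> Arc<Schema> {
        Arc::clone(&self.schema)
    }

    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<Result<RecordBatch>>> {
        // `MemoryStream` is `Unpin`, so the pin allows plain mutable access.
        Poll::Ready(self.batches.pop_front().map(Ok))
    }
}

// Waker that does nothing; enough for a stream that never returns `Pending`.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Drains a stream by polling in a loop. It busy-polls on `Pending`, so it is
// suitable for demonstration only, not as a real executor.
pub fn drain<S: RecordBatchStream + Unpin>(mut stream: S) -> Result<Vec<RecordBatch>> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut out = Vec::new();
    loop {
        match Pin::new(&mut stream).poll_next(&mut cx) {
            Poll::Ready(Some(batch)) => out.push(batch?),
            Poll::Ready(None) => return Ok(out),
            Poll::Pending => std::thread::yield_now(),
        }
    }
}
```

One thing the sketch makes visible: without the `Stream` supertrait, callers lose the combinators from `futures::StreamExt`, so every consumer either polls by hand as `drain` does or goes through an adapter.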
   
   I'm not sure how I feel about that, but I don't think `futures` is going to 
be absorbed into `std` anytime soon.  We could even still have a `futures` 
feature that provides an `impl futures::Stream for RecordBatchStream`.
   
   **Additional context**
   
   

