tustvold commented on code in PR #6907:
URL: https://github.com/apache/arrow-rs/pull/6907#discussion_r1894344815
##########
parquet/src/arrow/async_reader/mod.rs:
##########
@@ -654,6 +657,66 @@ impl<T> ParquetRecordBatchStream<T> {
}
}
+impl<T> ParquetRecordBatchStream<T>
+where
+ T: AsyncFileReader + Unpin + Send + 'static,
+{
+ /// Fetches the next row group from the stream.
+ ///
+ /// Users can continue to call this function to get row groups and decode them concurrently.
+ ///
+ /// ## Notes
+ ///
+ /// ParquetRecordBatchStream should be used either as a `Stream` or with `next_row_group`; they should not be used simultaneously.
+ ///
+ /// ## Returns
+ ///
+ /// - `Ok(None)` if the stream has ended.
+ /// - `Err(error)` if the stream has errored. All subsequent calls will return `Ok(None)`.
+ /// - `Ok(Some(reader))` which holds all the data for the row group.
+ pub async fn next_row_group(&mut self) -> Result<Option<ParquetRecordBatchReader>> {
+ loop {
+ match &mut self.state {
+ StreamState::Decoding(_) | StreamState::Reading(_) => unreachable!(),
Review Comment:
I think this should probably return an error, rather than hit `unreachable!()`, telling the caller not to mix polling the stream with using this API.
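
A minimal sketch of what returning an error in that arm could look like. To stay self-contained it uses simplified, synchronous stand-ins: `SketchStream`, `SketchError`, the payload-free `StreamState` variants, the `u32` placeholder for the reader, and the error message are all hypothetical and not the crate's definitions. The real method is `async` and returns `Result<Option<ParquetRecordBatchReader>>`.

```rust
// Hypothetical stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
enum SketchError {
    General(String),
}

// Hypothetical stand-in for the stream's internal state; the real
// variants carry payloads that are omitted here.
#[allow(dead_code)]
enum StreamState {
    Init,
    Decoding,
    Reading,
    Error,
}

struct SketchStream {
    state: StreamState,
}

impl SketchStream {
    // Synchronous placeholder for `next_row_group`; `u32` stands in for
    // the row group reader.
    fn next_row_group(&mut self) -> Result<Option<u32>, SketchError> {
        match self.state {
            // These states are only reached by polling the stream, so
            // report the misuse to the caller instead of panicking.
            StreamState::Decoding | StreamState::Reading => Err(SketchError::General(
                "cannot mix polling the stream with next_row_group".to_string(),
            )),
            // Placeholder for the real fetch logic, which is out of
            // scope for this sketch.
            StreamState::Init | StreamState::Error => Ok(None),
        }
    }
}
```

With this shape, a caller that has already polled the stream gets a recoverable `Err` rather than a panic. The sketch leaves the state untouched after reporting the error; whether the stream should then move to its error state is a separate decision.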