alamb commented on a change in pull request #9926:
URL: https://github.com/apache/arrow/pull/9926#discussion_r608675726
##########
File path: rust/datafusion/src/physical_plan/limit.rs
##########
@@ -200,30 +200,39 @@ pub fn truncate_batch(batch: &RecordBatch, n: usize) ->
RecordBatch {
/// A Limit stream limits the stream to up to `limit` rows.
struct LimitStream {
+ /// The maximum number of rows to produce
limit: usize,
- input: SendableRecordBatchStream,
- // the current count
+ /// The input to read from. This is set to None once the limit is
+ /// reached to enable early termination
+ input: Option<SendableRecordBatchStream>,
+ /// Copy of the input schema
+ schema: SchemaRef,
+ // the current number of rows which have been produced
current_len: usize,
}
impl LimitStream {
fn new(input: SendableRecordBatchStream, limit: usize) -> Self {
+ let schema = input.schema();
Review comment:
This is a good point.
One benefit of the this PR over `fuse()` is that this PR will actually drop
the input stream (freeing resources) in addition to not calling the input
stream again:
https://docs.rs/futures-util/0.3.13/src/futures_util/stream/stream/fuse.rs.html#10
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]