baiguoname commented on issue #18381: URL: https://github.com/apache/datafusion/issues/18381#issuecomment-3507291146
> [@baiguoname](https://github.com/baiguoname) do let us know if the above suggestions work so that we can bring the issue to meaningful closure

I think the previous suggestions would not work. Here is a more detailed description of the scenario.

I have two `RecordBatch`es.

`RecordBatch1`:
```
code  value
"A"   0.1
"A"   0.2
"C"   0.3
"B"   0.4
```

`RecordBatch2`:
```
code  value
"A"   0.5
"D"   0.6
```

And I have a `DataFrame` named `df` that receives the following stream of `RecordBatch`es:

`Poll::Ready(Some(RecordBatch1))` -> `Poll::Ready(None)` -> `Poll::Ready(Some(RecordBatch2))` -> `Poll::Ready(None)`

Suppose there is a method called `collect_but_not_consume` on `df` that collects the `df` without consuming it. When I call this method:

```rust
let df = df
    .aggregate(
        vec![col("code")],
        vec![min(col("value")).alias("value")],
    )?
    .sort(vec![col("code").sort(true, true)])?;
// `collect_but_not_consume` is the hypothetical method described above.
let stream = df.collect_but_not_consume();
```

The output from the stream would be:

1. For `Poll::Ready(Some(RecordBatch1))`: since the method behaves like `collect`, there is no output.
2. For the first `Poll::Ready(None)`: the stream collects as normal but is not consumed, so it continues to receive `RecordBatch`es from its children. For the `aggregate` operator, the `min` accumulator maintains its state for future reuse. The `sort` operator, being a pipeline breaker, does not retain historical data and only sorts the result for `RecordBatch1`. The output:
   ```
   code  value
   "A"   0.1
   "B"   0.4
   "C"   0.3
   ```
3. For `Poll::Ready(Some(RecordBatch2))`: no output.
4. For the second `Poll::Ready(None)`, the output:
   ```
   code  value
   "A"   0.1
   "D"   0.6
   ```
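To make the requested semantics concrete, here is a minimal standard-library sketch of the behavior described above. It does not use DataFusion. `EpochMinAggregator`, `push_batch`, and `end_epoch` are hypothetical names invented for illustration. The rule that only the codes seen since the previous `Poll::Ready(None)` are emitted is an assumption inferred from the expected outputs in the scenario.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for the `aggregate` + `sort` pipeline in the scenario.
/// `min_by_code` survives across epochs (the retained accumulator state);
/// `touched` is cleared at every epoch boundary (the sort keeps no history).
#[derive(Default)]
struct EpochMinAggregator {
    min_by_code: HashMap<String, f64>,
    touched: BTreeSet<String>,
}

impl EpochMinAggregator {
    /// Models `Poll::Ready(Some(batch))`: update state, emit nothing.
    fn push_batch(&mut self, batch: &[(&str, f64)]) {
        for &(code, value) in batch {
            self.min_by_code
                .entry(code.to_string())
                .and_modify(|m| *m = (*m).min(value))
                .or_insert(value);
            self.touched.insert(code.to_string());
        }
    }

    /// Models `Poll::Ready(None)`: emit the codes seen in this epoch,
    /// sorted by code, each with its running minimum over all epochs.
    fn end_epoch(&mut self) -> Vec<(String, f64)> {
        // Taking the set resets it for the next epoch; BTreeSet iterates in sorted order.
        let touched = std::mem::take(&mut self.touched);
        touched
            .into_iter()
            .map(|code| {
                let min = self.min_by_code[&code];
                (code, min)
            })
            .collect()
    }
}
```

Feeding `RecordBatch1` and then calling `end_epoch` yields the rows for `"A"`, `"B"`, `"C"`. Feeding `RecordBatch2` and calling `end_epoch` again yields `"A"` with the minimum retained from the first epoch, plus `"D"`.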
