Blajda opened a new issue, #7876:
URL: https://github.com/apache/arrow-datafusion/issues/7876
### Describe the bug
I have a table `t1` with a column called `file_path`.
I want to obtain the list of distinct `file_path` values and then
take a random subset of those values.
I thought this could be achieved with the following code:
```rust
let files = ctx
    .sql("select file_path from t1 group by file_path")
    .await
    .unwrap()
    .with_column("r", random())
    .unwrap()
    .filter(col("r").lt_eq(lit(0.2)))
    .unwrap();
files.show().await.unwrap();
```
However, the output of my query contains a record (`A`, with `r` of about 0.80)
that should have been filtered out:
```
| A | 0.8023022275259943 |
| B | 0.05829777789599211 |
| C | 0.14330028518553894 |
```
This is the calculated logical plan:
```
Projection: t1.file_path, random() AS r
  Aggregate: groupBy=[[t1.file_path]], aggr=[[]]
    Filter: random() <= Float64(0.2)
      TableScan: t1 projection=[file_path]
```
In this case, I would expect the filter to be applied after the aggregate
operation, not before it.
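
To illustrate why the placement of the filter matters, here is a minimal standalone sketch. It does not use DataFusion at all: `Lcg`, `make_rows`, `filter_then_group`, and `group_then_filter` are hypothetical names invented for this example, and the small generator only stands in for `random()`. The sketch assumes each file path appears on several rows of `t1`. When the filter runs before the aggregate, one value is drawn per scanned row, so a path survives if any of its rows passes. When the filter runs after the aggregate, one value is drawn per distinct path.

```rust
use std::collections::BTreeSet;

/// Tiny deterministic generator standing in for `random()`.
/// Hypothetical helper for this sketch, not DataFusion's implementation.
struct Lcg(u64);

impl Lcg {
    /// Returns the next pseudo-random value in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Builds `groups` distinct paths, each repeated on `copies` rows.
fn make_rows(groups: usize, copies: usize) -> Vec<String> {
    (0..groups)
        .flat_map(|g| std::iter::repeat(format!("file_{}", g)).take(copies))
        .collect()
}

/// Shape of the reported plan: filter every scanned row, then group.
fn filter_then_group(rows: &[String], p: f64, rng: &mut Lcg) -> BTreeSet<String> {
    rows.iter()
        .filter(|_| rng.next_f64() <= p) // one draw per input row
        .cloned()
        .collect() // collecting into a set plays the role of GROUP BY
}

/// Shape of the expected plan: group first, then filter each group once.
fn group_then_filter(rows: &[String], p: f64, rng: &mut Lcg) -> BTreeSet<String> {
    let groups: BTreeSet<&String> = rows.iter().collect();
    groups
        .into_iter()
        .filter(|_| rng.next_f64() <= p) // one draw per distinct path
        .cloned()
        .collect()
}
```

With 10 rows per path and a threshold of 0.2, the first function should keep a path with probability of roughly 1 - 0.8^10 (about 0.89), while the second keeps it with probability 0.2. For example, `filter_then_group(&make_rows(1000, 10), 0.2, &mut Lcg(42))` should return far more paths than `group_then_filter` on the same rows. Separately, the plan shows `random() AS r` in the projection as its own expression, so the displayed `r` appears to be a fresh draw unrelated to the value the filter tested, which would explain a visible value above 0.2.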
### To Reproduce
_No response_
### Expected behavior
_No response_
### Additional context
_No response_