devinjdangelo opened a new issue, #8683:
URL: https://github.com/apache/arrow-datafusion/issues/8683
### Describe the bug
The `limit` parameter passed to `TableProvider::scan` is always `None` when any
filters are present, even if the query has a `LIMIT` clause.
### To Reproduce
I added the following print statements to `ListingTable`'s implementation of
`TableProvider::scan`:
```rust
println!("ListingTable::Scan got {projection:?} as projection parameter");
println!("ListingTable::Scan got {filters:?} as filter parameter");
println!("ListingTable::Scan got {limit:?} as limit parameter");
```
Then, I ran the following example:
```rust
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.register_parquet(
        "test",
        "benchmarks/data/tpch_sf1/lineitem/part-0.parquet",
        ParquetReadOptions::default(),
    )
    .await?;

    // Query 1: limit only, no filter.
    let df = ctx
        .sql(
            "SELECT l_linenumber, l_tax
            FROM test \
            limit 2",
        )
        .await?;
    println!("Query 1 count: {}", df.count().await?);

    // Query 2: numeric filter plus limit.
    let df = ctx
        .sql(
            "SELECT l_linenumber, l_tax
            FROM test \
            where l_tax>0.03
            limit 2",
        )
        .await?;
    println!("Query 2 count: {}", df.count().await?);

    // Query 3: string filter plus limit.
    let df = ctx
        .sql(
            "SELECT l_linenumber, l_tax
            FROM test \
            where l_returnflag='N'
            limit 2",
        )
        .await?;
    println!("Query 3 count: {}", df.count().await?);

    // Query 4: same as query 1.
    let df = ctx
        .sql(
            "SELECT l_linenumber, l_tax
            FROM test \
            limit 2",
        )
        .await?;
    println!("Query 4 count: {}", df.count().await?);

    Ok(())
}
```
which output:
```text
ListingTable::Scan got Some([3, 7]) as projection parameter
ListingTable::Scan got [] as filter parameter
ListingTable::Scan got Some(2) as limit parameter
+--------------+-------+
| l_linenumber | l_tax |
+--------------+-------+
| 1            | 0.02  |
| 2            | 0.06  |
+--------------+-------+
ListingTable::Scan got Some([3, 7]) as projection parameter
ListingTable::Scan got [BinaryExpr(BinaryExpr { left: Column(Column { relation: None, name: "l_tax" }), op: Gt, right: Literal(Decimal128(Some(3),15,2)) })] as filter parameter
ListingTable::Scan got None as limit parameter
+--------------+-------+
| l_linenumber | l_tax |
+--------------+-------+
| 4            | 0.04  |
| 1            | 0.06  |
+--------------+-------+
ListingTable::Scan got Some([3, 7, 8]) as projection parameter
ListingTable::Scan got [BinaryExpr(BinaryExpr { left: Column(Column { relation: None, name: "l_returnflag" }), op: Eq, right: Literal(Utf8("N")) })] as filter parameter
ListingTable::Scan got None as limit parameter
+--------------+-------+
| l_linenumber | l_tax |
+--------------+-------+
| 2            | 0.06  |
| 3            | 0.03  |
+--------------+-------+
ListingTable::Scan got Some([3, 7]) as projection parameter
ListingTable::Scan got [] as filter parameter
ListingTable::Scan got Some(2) as limit parameter
+--------------+-------+
| l_linenumber | l_tax |
+--------------+-------+
| 1            | 0.02  |
| 2            | 0.06  |
+--------------+-------+
```
### Expected behavior
The limit of 2 should be pushed down to the scan in all four queries, but it
only reaches the scan in the two queries that have no filter.
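To make the expectation concrete, here is a minimal, self-contained sketch of a scan that honors both a filter and a limit. It is a toy model, not DataFusion code: `Row`, `toy_scan`, `sample_data`, and the sample values are all hypothetical. The only idea taken from this report is that a scan receiving `Some(2)` can stop reading once two matching rows are found.

```rust
/// Hypothetical row type standing in for two lineitem columns.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    l_linenumber: i32,
    l_tax: f64,
}

/// Toy scan (an assumption for illustration, not the DataFusion API):
/// applies `filter` to each row and stops as soon as `limit` matching rows
/// have been collected. Returns the matching rows and the number of input
/// rows that had to be read.
fn toy_scan(
    data: &[Row],
    filter: impl Fn(&Row) -> bool,
    limit: Option<usize>,
) -> (Vec<Row>, usize) {
    let mut out = Vec::new();
    let mut rows_read = 0;
    for row in data {
        // Stop early once the limit is satisfied.
        if limit.map_or(false, |n| out.len() >= n) {
            break;
        }
        rows_read += 1;
        if filter(row) {
            out.push(row.clone());
        }
    }
    (out, rows_read)
}

/// Made-up sample rows; not taken from the TPC-H file.
fn sample_data() -> Vec<Row> {
    [(1, 0.02), (2, 0.06), (3, 0.03), (4, 0.04), (1, 0.06), (2, 0.05)]
        .iter()
        .map(|&(l_linenumber, l_tax)| Row { l_linenumber, l_tax })
        .collect()
}

fn main() {
    let data = sample_data();

    // Limit reaches the scan: reading stops after the second match.
    let (rows, read) = toy_scan(&data, |r| r.l_tax > 0.03, Some(2));
    println!("limit Some(2): {} rows returned, {} rows read", rows.len(), read);

    // Limit is None: every row is read, and a later step must truncate.
    let (rows, read) = toy_scan(&data, |r| r.l_tax > 0.03, None);
    println!("limit None:    {} rows returned, {} rows read", rows.len(), read);
}
```

In this toy data the limited scan reads 4 of 6 rows, while the unlimited scan reads all 6 and returns 4 matches.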
### Additional context
I looked at the optimizer code responsible for pushing down the limit, but
did not notice any obvious cases that would explain this.
https://github.com/apache/arrow-datafusion/blob/main/datafusion/optimizer/src/push_down_limit.rs#L115
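One possible explanation, offered here as an assumption and not confirmed anywhere in this report: a limit can only move below a filter safely if the scan applies that filter exactly. If the scan treats the filter as a hint and a separate filter step above it re-checks every row, truncating the scan to 2 rows first could return too few rows. The sketch below is a toy model with hypothetical names (`scan`, `filter_then_limit`, `limit_then_filter`) and made-up values; it is not the optimizer's code.

```rust
/// Toy scan that does NOT apply the filter itself; it only honors an
/// optional row limit. (Hypothetical helper, not a DataFusion function.)
fn scan(data: &[f64], limit: Option<usize>) -> Vec<f64> {
    match limit {
        Some(n) => data.iter().copied().take(n).collect(),
        None => data.to_vec(),
    }
}

/// Plan shape "Limit -> Filter -> Scan(limit = None)": always correct.
fn filter_then_limit(data: &[f64], threshold: f64, n: usize) -> Vec<f64> {
    scan(data, None)
        .into_iter()
        .filter(|tax| *tax > threshold)
        .take(n)
        .collect()
}

/// Plan shape "Filter -> Scan(limit = n)": the limit was pushed below a
/// filter that the scan does not evaluate, so rows can go missing.
fn limit_then_filter(data: &[f64], threshold: f64, n: usize) -> Vec<f64> {
    scan(data, Some(n))
        .into_iter()
        .filter(|tax| *tax > threshold)
        .collect()
}

fn main() {
    // Made-up l_tax values for illustration.
    let l_tax = [0.02, 0.06, 0.03, 0.04, 0.06];

    println!("{:?}", filter_then_limit(&l_tax, 0.03, 2)); // prints [0.06, 0.04]
    println!("{:?}", limit_then_filter(&l_tax, 0.03, 2)); // prints [0.06]
}
```

Under that assumption, the `None` seen in queries 2 and 3 would be a correctness safeguard, and pushing the limit down would only be safe when the scan guarantees it evaluates the filter in full.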