devinjdangelo opened a new issue, #8683:
URL: https://github.com/apache/arrow-datafusion/issues/8683

   ### Describe the bug
   
   The `limit` parameter passed to `TableProvider::scan` is always `None` when any filters are present.
   
   ### To Reproduce
   
   I added the following print statements to `ListingTable`'s `TableProvider::scan` method:
   
   ```rust
   println!("ListingTable::Scan got {projection:?} as projection parameter");
   println!("ListingTable::Scan got {filters:?} as filter parameter");
   println!("ListingTable::Scan got {limit:?} as limit parameter");
   ```
   
   Then, I ran the following example:
   
   ```rust
   use datafusion::error::Result;
   use datafusion::prelude::*;
   
   #[tokio::main]
   async fn main() -> Result<()> {
       let ctx = SessionContext::new();
   
       ctx.register_parquet(
           "test",
           "benchmarks/data/tpch_sf1/lineitem/part-0.parquet",
           ParquetReadOptions::default(),
       )
       .await?;
   
       let df = ctx
           .sql(
               "SELECT l_linenumber, l_tax
           FROM test \
           limit 2",
           )
           .await?;
       println!("Query 1 count: {}",df.count().await?);
   
       let df = ctx
           .sql(
               "SELECT l_linenumber, l_tax
           FROM test \
           where l_tax>0.03
           limit 2",
           )
           .await?;
       println!("Query 2 count: {}",df.count().await?);
   
       let df = ctx
           .sql(
               "SELECT l_linenumber, l_tax
           FROM test \
           where l_returnflag='N'
           limit 2",
           )
           .await?;
       println!("Query 3 count: {}",df.count().await?);
   
       let df = ctx
           .sql(
               "SELECT l_linenumber, l_tax
           FROM test \
           limit 2",
           )
           .await?;
       println!("Query 4 count: {}",df.count().await?);
   
       Ok(())
   }
   ```
   
   which output:
   
   ```text
   ListingTable::Scan got Some([3, 7]) as projection parameter
   ListingTable::Scan got [] as filter parameter
   ListingTable::Scan got Some(2) as limit parameter
   +--------------+-------+
   | l_linenumber | l_tax |
   +--------------+-------+
   | 1            | 0.02  |
   | 2            | 0.06  |
   +--------------+-------+
   ListingTable::Scan got Some([3, 7]) as projection parameter
   ListingTable::Scan got [BinaryExpr(BinaryExpr { left: Column(Column { relation: None, name: "l_tax" }), op: Gt, right: Literal(Decimal128(Some(3),15,2)) })] as filter parameter
   ListingTable::Scan got None as limit parameter
   +--------------+-------+
   | l_linenumber | l_tax |
   +--------------+-------+
   | 4            | 0.04  |
   | 1            | 0.06  |
   +--------------+-------+
   ListingTable::Scan got Some([3, 7, 8]) as projection parameter
   ListingTable::Scan got [BinaryExpr(BinaryExpr { left: Column(Column { relation: None, name: "l_returnflag" }), op: Eq, right: Literal(Utf8("N")) })] as filter parameter
   ListingTable::Scan got None as limit parameter
   +--------------+-------+
   | l_linenumber | l_tax |
   +--------------+-------+
   | 2            | 0.06  |
   | 3            | 0.03  |
   +--------------+-------+
   ListingTable::Scan got Some([3, 7]) as projection parameter
   ListingTable::Scan got [] as filter parameter
   ListingTable::Scan got Some(2) as limit parameter
   +--------------+-------+
   | l_linenumber | l_tax |
   +--------------+-------+
   | 1            | 0.02  |
   | 2            | 0.06  |
   +--------------+-------+
   ```
   
   ### Expected behavior
   
   The limit of 2 should be pushed down in all four queries, but it only reaches the scan in two of them (queries 1 and 4, which have no filter).
   
   ### Additional context
   
   I looked at the optimizer code responsible for pushing down the limit, but did not notice anything obvious that would explain this:
   https://github.com/apache/arrow-datafusion/blob/main/datafusion/optimizer/src/push_down_limit.rs#L115
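   
   For discussion, here is a toy model of one way a limit pushdown rule could produce the pattern seen in the output above (`Some(2)` with no filter, `None` with a filter). The `Plan` enum and both functions are hypothetical and are not DataFusion's real types or its actual rule. The sketch assumes that a limit is handed to the scan only when it sits directly above it, because a filter in between may drop rows and the scan would then have to read more than `fetch` rows. Whether this is what `push_down_limit.rs` does, or why, is an assumption and not confirmed here.
   
   ```rust
   // Toy logical plan. These are NOT DataFusion's real types, only an illustration.
   #[derive(Debug, Clone, PartialEq)]
   enum Plan {
       Scan { fetch: Option<usize> },
       Filter { input: Box<Plan> },
       Limit { fetch: usize, input: Box<Plan> },
   }
   
   /// Hypothetical rule: hand the limit to the scan only when the limit sits
   /// directly above it. Any other child (here, a Filter) is left untouched.
   fn push_down_limit(plan: Plan) -> Plan {
       match plan {
           Plan::Limit { fetch, input } => match *input {
               // Limit directly over the scan: the scan may stop after `fetch` rows.
               Plan::Scan { fetch: existing } => {
                   // Keep the smaller of the new limit and any limit already present.
                   let pushed = existing.map_or(fetch, |e| e.min(fetch));
                   Plan::Limit {
                       fetch,
                       input: Box::new(Plan::Scan { fetch: Some(pushed) }),
                   }
               }
               // A filter may discard rows, so reading only `fetch` rows from the
               // scan could yield fewer than `fetch` rows after filtering.
               other => Plan::Limit { fetch, input: Box::new(other) },
           },
           other => other,
       }
   }
   
   /// Returns the `fetch` value the scan at the bottom of the plan would receive.
   fn scan_fetch(plan: &Plan) -> Option<usize> {
       match plan {
           Plan::Scan { fetch } => *fetch,
           Plan::Filter { input } | Plan::Limit { input, .. } => scan_fetch(input),
       }
   }
   ```
   
   In this model, `Limit(2) -> Scan` ends with the scan receiving `Some(2)`, while `Limit(2) -> Filter -> Scan` leaves the scan with `None`, matching the printed `limit` values for the queries with and without a `where` clause.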


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
