jwimberl opened a new issue, #1551:
URL: https://github.com/apache/datafusion-python/issues/1551

   **Describe the bug**
   
   A custom table provider for a ParquetSource with a trivial catalog and an 
Int64 column panics when a SQL query has a filter with a literal limit on that 
column. The error is of the form
   
   ```
   assertion `left == right` failed: Simplified expression should have the same 
data type as the original
     left: Null
    right: Int64
   ```
   
   The error does not occur when using datafusion-python 52, and it does not 
occur when running the query purely in a Rust SessionContext. The backtrace for 
the above error also shows it coming from datafusion-ffi code.
   
   This may of course not be a bug, but rather some bad practice that the 
version 52 set of crates tolerated and that is now invalid. An MRE of this 
custom table provider can be found in the public repo 
https://github.com/jwimberl/datafusion_python_53_int64filter_repro, which 
contains
   
   - a non-working datafusion53 version (in branch 
[main](https://github.com/jwimberl/datafusion_python_53_int64filter_repro)) 
   - a baseline working version (in branch 
[datafusion52](https://github.com/jwimberl/datafusion_python_53_int64filter_repro/tree/datafusion52))
   
   and a canned dummy dataset. The README.md of this repo has more details.
   
   **To Reproduce**
   
   In the `main` branch, build the `py_repro_provider` crate with `maturin 
develop` and run `python repro.py`. This loads the dummy dataset as a table 
`dummy_table` and runs two queries
   - `SELECT * FROM dummy_table LIMIT 1`, which is successful
   - `SELECT * FROM dummy_table LIMIT 5`, which panics
   
   Its output should be something like
   
   ```
   Successful query:
      a   b
   0  0  42
   Unsuccesful query:
   
   thread '<unnamed>' panicked at 
/home/jwimberley/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-53.1.0/src/simplifier/mod.rs:76:17:
   assertion `left == right` failed: Simplified expression should have the same 
data type as the original
     left: Null
    right: Int64
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   ```
   
   followed by backtrace information.
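
To capture that backtrace for attaching to the issue, the run can be wrapped in a small helper. This is a minimal sketch and not part of the linked repo: the helper name `run_with_backtrace`, the `PANIC_MARKER` constant, and the assumption that it is invoked from the directory containing `repro.py` are all illustrative.

```python
import os
import subprocess
import sys

# Prefix of the assertion message quoted in this report.
PANIC_MARKER = "Simplified expression should have the same"

# Assumption: invoked from the directory that contains repro.py.
DEFAULT_CMD = [sys.executable, "repro.py"]


def run_with_backtrace(cmd=None):
    """Run cmd with RUST_BACKTRACE=1; return (panic_seen, stderr_text)."""
    env = dict(os.environ, RUST_BACKTRACE="1")
    proc = subprocess.run(cmd or DEFAULT_CMD, env=env,
                          capture_output=True, text=True)
    # Rust's default panic hook writes the message and backtrace to stderr.
    return PANIC_MARKER in proc.stderr, proc.stderr
```

Calling `run_with_backtrace()` on the `main` branch would be expected to report the panic and return the full stderr text, while on the `datafusion52` branch the first element of the result should be `False`.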
   
   **Expected behavior**
   
   In the `datafusion52` branch, build the `py_repro_provider` crate with `maturin 
develop` and run `python repro.py`. It runs the same two queries, and its 
output should be
   
   ```
   Successful query:
      a   b
   0  0  42
   Also successful query:
      a   b
   0  0  42
   1  1  42
   2  2  42
   3  3  42
   4  4  42
   ```
   
   **Additional context**
   
   In either the `main` branch or `datafusion52` branch, the Rust code for the 
table provider is in the directory `repro_provider`, and there is a 
corresponding cargo test that runs `SELECT * FROM dummy_table WHERE a < 5`. 
Without the FFI layer, this is successful with both datafusion 52 and 53.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

