jwimberl opened a new issue, #1551:
URL: https://github.com/apache/datafusion-python/issues/1551
**Describe the bug**
A custom table provider for a ParquetSource with a trivial catalog and an
Int64 column fails when a SQL query has a filter with a literal limit on that
column. The error has the form
```
assertion `left == right` failed: Simplified expression should have the same
data type as the original
left: Null
right: Int64
```
The error does not occur with datafusion-python 52, nor when running the
query purely in a Rust SessionContext. The backtrace for the above error also
shows it coming from datafusion-ffi code.
This may of course not be a bug, but instead some bad practice that the
version 52 set of crates tolerates but which is now invalid. An MRE of this
custom table provider can be found in the public repo
https://github.com/jwimberl/datafusion_python_53_int64filter_repro, which
contains
- a non-working datafusion53 version (in branch
[main](https://github.com/jwimberl/datafusion_python_53_int64filter_repro))
- a baseline working version (in branch
[datafusion52](https://github.com/jwimberl/datafusion_python_53_int64filter_repro/tree/datafusion52))
and a canned dummy dataset. The README.md of this repo has more details.
**To Reproduce**
In the `main` branch, build the `py_repro_provider` crate with `maturin
develop` and run `python repro.py`. This loads the dummy dataset as a table
`dummy_table` and runs two queries
- `SELECT * FROM dummy_table LIMIT 1`, which is successful
- `SELECT * FROM dummy_table LIMIT 5`, which panics
Its output should be something like
```
Successful query:
a b
0 0 42
Unsuccesful query:
thread '<unnamed>' panicked at
/home/jwimberley/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-53.1.0/src/simplifier/mod.rs:76:17:
assertion `left == right` failed: Simplified expression should have the same
data type as the original
left: Null
right: Int64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
followed by backtrace information.
**Expected behavior**
In the `datafusion52` branch, build the `py_repro_provider` crate with `maturin
develop` and run `python repro.py`. It runs the same two queries, and its
output should be
```
Successful query:
a b
0 0 42
Also successful query:
a b
0 0 42
1 1 42
2 2 42
3 3 42
4 4 42
```
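For anyone comparing the two branches repeatedly, the steps above could be scripted roughly as follows. This is only a sketch: the helper names `classify_output` and `run_repro` are hypothetical, and it assumes the crate lives in a `py_repro_provider/` directory with `repro.py` at the repository root, which may not match the actual layout of the repo.

```python
import subprocess
import sys
from pathlib import Path

# Fragment of the panic message quoted in this issue.
PANIC_MARKER = "Simplified expression should have the same"


def classify_output(text: str) -> str:
    """Return "panic" if the simplifier assertion appears in the output, else "ok"."""
    return "panic" if PANIC_MARKER in text else "ok"


def run_repro(repo: Path, branch: str) -> str:
    """Check out a branch, rebuild the extension, run repro.py, classify the result."""
    # Assumption: py_repro_provider/ holds the maturin project and repro.py
    # sits at the repository root; adjust both paths to the real layout.
    subprocess.run(["git", "checkout", branch], cwd=repo, check=True)
    subprocess.run(["maturin", "develop"], cwd=repo / "py_repro_provider", check=True)
    result = subprocess.run(
        [sys.executable, "repro.py"], cwd=repo, capture_output=True, text=True
    )
    # The panic is reported on stderr, so inspect both streams.
    return classify_output(result.stdout + result.stderr)
```

With a local checkout, `run_repro(Path("."), "main")` would be expected to return `"panic"` and `run_repro(Path("."), "datafusion52")` to return `"ok"`, if the behavior matches what is described here.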
**Additional context**
In either the `main` branch or `datafusion52` branch, the Rust code for the
table provider is in the directory `repro_provider`, and there is a
corresponding cargo test that runs `SELECT * FROM dummy_table WHERE a < 5`.
Without the FFI layer, this is successful with both datafusion 52 and 53.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]