alamb opened a new issue, #15035: URL: https://github.com/apache/datafusion/issues/15035
### Is your feature request related to a problem or challenge? - related to https://github.com/apache/datafusion/issues/14944 The usecase is a predicate like this, where month_id is an integer: ```sql month_id = '202502' ``` In DataFusion this results in casting the `month_id` column to a `Utf8` and then is compared to the string `'202502'` This is non ideal for at least 3 reasons: 1. Converting many rows to strings is expensive 2. Comparing strings is much slower than comparing integers 3. many data sources (like parquet) only handle predicates in the form of `<col> <op> <const>` and not `cast(<col>) <op> <const>` so these predicates can be pushed down Here is an example (note the predicate in the plan below is `CAST(foo.month_id AS Utf8) = Utf8("2024") ` rather than `foo.month_id = Int32("2024")`): ```sql DataFusion CLI v46.0.0 > create table foo(month_id int) as values (1), (2), (3); 0 row(s) fetched. Elapsed 0.003 seconds. > explain select * from foo where month_id = '2024'; +---------------+-------------------------------------------------------+ | plan_type | plan | +---------------+-------------------------------------------------------+ | logical_plan | Filter: CAST(foo.month_id AS Utf8) = Utf8("2024") | | | TableScan: foo projection=[month_id] | | physical_plan | CoalesceBatchesExec: target_batch_size=8192 | | | FilterExec: CAST(month_id@0 AS Utf8) = 2024 | | | DataSourceExec: partitions=1, partition_sizes=[1] | | | | +---------------+-------------------------------------------------------+ 2 row(s) fetched. Elapsed 0.008 seconds. ``` ### Describe the solution you'd like I would like the filter in the plan above to be `FilterExec: month_id = 2024` (no cast on month_id) Other notes: 1. I think this applies to `=` and `!=` 2. I don't think it can be done for `<`, `<=`, `>=` and `>` as the semantics for comparing ints and strings is different than equality. ### Describe alternatives you've considered We already have something called `unwrap_in_cast` that does this transformation @jayzhan211 has moved this code in this PR - https://github.com/apache/datafusion/pull/15012 I think once that PR merged, that would be the natural place to add this optimization ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org