suremarc opened a new issue, #9612: URL: https://github.com/apache/arrow-datafusion/issues/9612
### Is your feature request related to a problem or challenge? I have a table sorted by `(date, ticker, time)` and would like to query data ordered by `(date, time)` assuming `ticker` is known to be constant. It seems that if `FilterExec` would emit singleton statistics if not for the following check failing: https://github.com/apache/arrow-datafusion/blob/3c26e597aeacde0a5e6a51f30394d3d31c6acd96/datafusion/physical-plan/src/filter.rs#L127-L135 which ultimately boils down to this piece of code only allowing numeric types: https://github.com/apache/arrow-datafusion/blob/3c26e597aeacde0a5e6a51f30394d3d31c6acd96/datafusion/physical-expr/src/intervals/utils.rs#L96-L111 I noticed that if we just add `DataType::Utf8` to the list, it seems to work for the example below (in the next section), so my uneducated mind wonders if we can just add more datatypes to it? 🤔 ### Describe the solution you'd like Given the following table: ```sql CREATE EXTERNAL TABLE data ( "date" VARCHAR, "ticker" VARCHAR, "time" VARCHAR, ) STORED AS PARQUET WITH ORDER ("date", "ticker", "time") LOCATION 'example/'; INSERT INTO data VALUES ('2023', 'A', '0400'), ('2023', 'A', '0401'), ('2023', 'B', '0400'); ``` I would like for the following query to be executed without a sort: ```sql SELECT * FROM data WHERE ticker = 'A' ORDER BY "date", "time"; -- +---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -- | plan_type | plan -- | -- +---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -- | logical_plan | Sort: data.date ASC NULLS LAST, data.time ASC NULLS LAST -- | -- | | Filter: data.ticker = Utf8("A") -- | -- | | TableScan: data projection=[date, ticker, time], partial_filters=[data.ticker = Utf8("A")] -- | -- | physical_plan | SortPreservingMergeExec: [date@0 ASC NULLS LAST,time@2 ASC NULLS LAST] -- | -- | | SortExec: expr=[date@0 ASC NULLS LAST,time@2 ASC NULLS LAST] -- | -- | | CoalesceBatchesExec: target_batch_size=8192 -- | -- | | FilterExec: ticker@1 = A -- | -- | | RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1 -- | -- | | ParquetExec: file_groups={1 group: [[Users/matthewcramerus/arrow-datafusion/datafusion-cli/example/bGDu3tzuQaHhgpGZ_0.parquet]]}, projection=[date, ticker, time], output_ordering=[date@0 ASC NULLS LAST, ticker@1 ASC NULLS LAST, time@2 ASC NULLS LAST], predicate=ticker@1 = A, pruning_predicate=ticker_min@0 <= A AND A <= ticker_max@1, required_guarantees=[ticker in (A)] | -- | | -- | -- +---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` ### Describe alternatives you've considered _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
