suremarc opened a new issue, #9612:
URL: https://github.com/apache/arrow-datafusion/issues/9612

   ### Is your feature request related to a problem or challenge?
   
   I have a table sorted by `(date, ticker, time)` and would like to query data 
ordered by `(date, time)` assuming `ticker` is known to be constant. 
   
   It seems that if `FilterExec` would emit singleton statistics if not for the 
following check failing:
   
   
https://github.com/apache/arrow-datafusion/blob/3c26e597aeacde0a5e6a51f30394d3d31c6acd96/datafusion/physical-plan/src/filter.rs#L127-L135
   
   which ultimately boils down to this piece of code only allowing numeric 
types:
   
   
https://github.com/apache/arrow-datafusion/blob/3c26e597aeacde0a5e6a51f30394d3d31c6acd96/datafusion/physical-expr/src/intervals/utils.rs#L96-L111
   
   I noticed that if we just add `DataType::Utf8` to the list, it seems to work 
for the example below (in the next section), so my uneducated mind wonders if 
we can just add more datatypes to it? 🤔
   
   ### Describe the solution you'd like
   
   Given the following table:
   
   ```sql
   CREATE EXTERNAL TABLE data (
       "date"   VARCHAR, 
       "ticker" VARCHAR, 
       "time"   VARCHAR,
   ) STORED AS PARQUET
   WITH ORDER ("date", "ticker", "time")
   LOCATION 'example/';
   
   INSERT INTO data VALUES 
   ('2023', 'A', '0400'), 
   ('2023', 'A', '0401'), 
   ('2023', 'B', '0400');
   ```
   
   I would like for the following query to be executed without a sort:
   
   ```sql
   SELECT * FROM data 
   WHERE ticker = 'A' 
   ORDER BY "date", "time";
   -- 
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   -- | plan_type     | plan                                                    
                                                                                
                                                                                
                                                                                
                                                                                
     -- |
   -- 
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   -- | logical_plan  | Sort: data.date ASC NULLS LAST, data.time ASC NULLS 
LAST                                                                            
                                                                                
                                                                                
                                                                                
         -- |
   -- |               |   Filter: data.ticker = Utf8("A")                       
                                                                                
                                                                                
                                                                                
                                                                                
     -- |
   -- |               |     TableScan: data projection=[date, ticker, time], 
partial_filters=[data.ticker = Utf8("A")]                                       
                                                                                
                                                                                
                                                                                
        -- |
   -- | physical_plan | SortPreservingMergeExec: [date@0 ASC NULLS LAST,time@2 
ASC NULLS LAST]                                                                 
                                                                                
                                                                                
                                                                                
      -- |
   -- |               |   SortExec: expr=[date@0 ASC NULLS LAST,time@2 ASC 
NULLS LAST]                                                                     
                                                                                
                                                                                
                                                                                
          -- |
   -- |               |     CoalesceBatchesExec: target_batch_size=8192         
                                                                                
                                                                                
                                                                                
                                                                                
     -- |
   -- |               |       FilterExec: ticker@1 = A                          
                                                                                
                                                                                
                                                                                
                                                                                
     -- |
   -- |               |         RepartitionExec: 
partitioning=RoundRobinBatch(16), input_partitions=1                            
                                                                                
                                                                                
                                                                                
                                    -- |
   -- |               |           ParquetExec: file_groups={1 group: 
[[Users/matthewcramerus/arrow-datafusion/datafusion-cli/example/bGDu3tzuQaHhgpGZ_0.parquet]]},
 projection=[date, ticker, time], output_ordering=[date@0 ASC NULLS LAST, 
ticker@1 ASC NULLS LAST, time@2 ASC NULLS LAST], predicate=ticker@1 = A, 
pruning_predicate=ticker_min@0 <= A AND A <= ticker_max@1, 
required_guarantees=[ticker in (A)] |
   -- |               |                                                         
                                                                                
                                                                                
                                                                                
                                                                                
     -- |
   -- 
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   ```
   
   
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to