alamb commented on issue #8860:
URL: 
https://github.com/apache/arrow-datafusion/issues/8860#issuecomment-1891045590

   > ah, okay. I was hoping some queries were similar enough to be accelerated 
by that code, given a little effort, but it does not appear that way.
   
   Indeed -- from my perspective the ClickBench queries can not be changed (as 
we don't define them, they come from the clickbench benchmark itself). That is 
why I propose adding an extended version here. 
   
   What I would like to do is to come up with some queries that do have the 
pattern that is accelerated that use the same data.  What about something like:
   
   "Find the top 10 most recently active users"
   
   ```sql
   WITH hits 
   AS® (SELECT "UserID", "EventDate"::INT::DATE::TIMESTAMP + 
arrow_cast("EventTime", 'Interval(DayTime)') as time from 'hits.parquet') 
   SELECT min(time), "UserID" from hits GROUP BY "UserID" ORDER BY min(time)  
limit 10;
   ```
   
   Though this query appears not to use the optimization 🤔 
   
   ```sql
   ❯ explain WITH hits AS (SELECT "UserID", "EventDate"::INT::DATE::TIMESTAMP + 
arrow_cast("EventTime", 'Interval(DayTime)') as time from 'hits.parquet') 
SELECT min(time), "UserID" from hits GROUP BY "UserID" ORDER BY min(time)  
limit 10;
   
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                                                                                
                                                                                
                                                                                
                                                                                
                                             |
   
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Limit: skip=0, fetch=10                                    
                                                                                
                                                                                
                                                                                
                                                                                
                                             |
   |               |   Sort: MIN(hits.time) ASC NULLS LAST, fetch=10            
                                                                                
                                                                                
                                                                                
                                                                                
                                             |
   |               |     Projection: MIN(hits.time), hits.UserID                
                                                                                
                                                                                
                                                                                
                                                                                
                                             |
   |               |       Aggregate: groupBy=[[hits.UserID]], 
aggr=[[MIN(hits.time)]]                                                         
                                                                                
                                                                                
                                                                                
                                                              |
   |               |         SubqueryAlias: hits                                
                                                                                
                                                                                
                                                                                
                                                                                
                                             |
   |               |           Projection: hits.parquet.UserID, 
CAST(CAST(CAST(hits.parquet.EventDate AS Int32) AS Date32) AS 
Timestamp(Nanosecond, None)) + CAST(hits.parquet.EventTime AS 
Interval(DayTime)) AS time                                                      
                                                                                
                                                                                
                 |
   |               |             TableScan: hits.parquet projection=[EventTime, 
EventDate, UserID]                                                              
                                                                                
                                                                                
                                                                                
                                             |
   | physical_plan | GlobalLimitExec: skip=0, fetch=10                          
                                                                                
                                                                                
                                                                                
                                                                                
                                             |
   |               |   SortPreservingMergeExec: [MIN(hits.time)@0 ASC NULLS 
LAST], fetch=10                                                                 
                                                                                
                                                                                
                                                                                
                                                 |
   |               |     SortExec: TopK(fetch=10), expr=[MIN(hits.time)@0 ASC 
NULLS LAST]                                                                     
                                                                                
                                                                                
                                                                                
                                               |
   |               |       ProjectionExec: expr=[MIN(hits.time)@1 as 
MIN(hits.time), UserID@0 as UserID]                                             
                                                                                
                                                                                
                                                                                
                                                        |
   |               |         AggregateExec: mode=FinalPartitioned, 
gby=[UserID@0 as UserID], aggr=[MIN(hits.time)]                                 
                                                                                
                                                                                
                                                                                
                                                          |
   |               |           CoalesceBatchesExec: target_batch_size=8192      
                                                                                
                                                                                
                                                                                
                                                                                
                                             |
   |               |             RepartitionExec: partitioning=Hash([UserID@0], 
16), input_partitions=16                                                        
                                                                                
                                                                                
                                                                                
                                             |
   |               |               AggregateExec: mode=Partial, gby=[UserID@0 
as UserID], aggr=[MIN(hits.time)]                                               
                                                                                
                                                                                
                                                                                
                                               |
   |               |                 ProjectionExec: expr=[UserID@2 as UserID, 
CAST(CAST(CAST(EventDate@1 AS Int32) AS Date32) AS Timestamp(Nanosecond, None)) 
+ CAST(EventTime@0 AS Interval(DayTime)) as time]                               
                                                                                
                                                                                
                                              |
   |               |                   ParquetExec: file_groups={16 groups: 
[[Users/andrewlamb/Downloads/hits.parquet:0..923748528], 
[Users/andrewlamb/Downloads/hits.parquet:923748528..1847497056], 
[Users/andrewlamb/Downloads/hits.parquet:1847497056..2771245584], 
[Users/andrewlamb/Downloads/hits.parquet:2771245584..3694994112], 
[Users/andrewlamb/Downloads/hits.parquet:3694994112..4618742640], ...]}, 
projection=[EventTime, EventDate, UserID] |
   |               |                                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                             |
   
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.048 seconds.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to