jychen7 commented on issue #5969:
URL: 
https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1506065309

   I thought `sort` knows the machine's memory limit and spills to disk when
memory is insufficient. I'm not sure yet why it hits OOM.
   
https://github.com/apache/arrow-datafusion/blob/4c7833ebfdb2d022830bb97862e0ce36b0b3d6b1/datafusion/core/src/physical_plan/sorts/sort.rs#L63-L72
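
To make the expectation above concrete, here is a minimal, self-contained sketch of what "spill when memory is insufficient" means for a sort. It is a toy model, not DataFusion's implementation: `SpillingSorter`, `budget_bytes`, and the other names are hypothetical, and spilled runs are kept in a `Vec` where a real operator would write them to temporary files.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::mem::size_of;

/// Toy external sorter (hypothetical, for illustration only).
/// It buffers values in memory and spills a sorted run whenever the
/// next batch would push the buffer past `budget_bytes`.
pub struct SpillingSorter {
    budget_bytes: usize,
    buffer: Vec<u64>,
    // Stand-in for spill files: each entry is one sorted run.
    spilled_runs: Vec<Vec<u64>>,
}

impl SpillingSorter {
    pub fn new(budget_bytes: usize) -> Self {
        Self { budget_bytes, buffer: Vec::new(), spilled_runs: Vec::new() }
    }

    fn buffered_bytes(&self) -> usize {
        self.buffer.len() * size_of::<u64>()
    }

    /// Accept one batch; spill the current buffer first if the batch
    /// would not fit. A single batch larger than the budget is still
    /// buffered, which a real operator would have to handle.
    pub fn insert_batch(&mut self, batch: &[u64]) {
        let incoming = batch.len() * size_of::<u64>();
        if !self.buffer.is_empty() && self.buffered_bytes() + incoming > self.budget_bytes {
            self.spill();
        }
        self.buffer.extend_from_slice(batch);
    }

    /// Sort the in-memory buffer and move it to the spilled runs.
    fn spill(&mut self) {
        let mut run = std::mem::take(&mut self.buffer);
        run.sort_unstable();
        self.spilled_runs.push(run);
    }

    pub fn spill_count(&self) -> usize {
        self.spilled_runs.len()
    }

    /// Flush the remaining buffer as a final run, then k-way merge all
    /// sorted runs with a min-heap keyed on (value, run index, position).
    pub fn finish(mut self) -> Vec<u64> {
        if !self.buffer.is_empty() {
            self.spill();
        }
        let mut heap = BinaryHeap::new();
        for (run_idx, run) in self.spilled_runs.iter().enumerate() {
            if let Some(&first) = run.first() {
                heap.push(Reverse((first, run_idx, 0usize)));
            }
        }
        let mut out = Vec::new();
        while let Some(Reverse((value, run_idx, pos))) = heap.pop() {
            out.push(value);
            if let Some(&next) = self.spilled_runs[run_idx].get(pos + 1) {
                heap.push(Reverse((next, run_idx, pos + 1)));
            }
        }
        out
    }
}
```

Under this model only the sorter's own buffer is bounded by the budget; memory held by other operators in a plan is not accounted for.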
   
   
   For reference, the plan:
   ```
   explain SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth")
   FROM hits
   GROUP BY "WatchID", "ClientIP"
   ORDER BY c DESC
   LIMIT 10;

   +---------------+------------------------------------------------------------+
   | plan_type     | plan                                                       |
   +---------------+------------------------------------------------------------+
   | logical_plan  | Limit: skip=0, fetch=10 |
   |               |   Sort: c DESC NULLS FIRST, fetch=10 |
   |               |     Projection: hits.WatchID, hits.ClientIP, COUNT(UInt8(1)) AS c, SUM(hits.IsRefresh), AVG(hits.ResolutionWidth) |
   |               |       Aggregate: groupBy=[[hits.WatchID, hits.ClientIP]], aggr=[[COUNT(UInt8(1)), SUM(hits.IsRefresh), AVG(hits.ResolutionWidth)]] |
   |               |         TableScan: hits projection=[WatchID, ClientIP, IsRefresh, ResolutionWidth] |
   | physical_plan | GlobalLimitExec: skip=0, fetch=10 |
   |               |   SortPreservingMergeExec: [c@2 DESC] |
   |               |     SortExec: fetch=10, expr=[c@2 DESC] |
   |               |       ProjectionExec: expr=[WatchID@0 as WatchID, ClientIP@1 as ClientIP, COUNT(UInt8(1))@2 as c, SUM(hits.IsRefresh)@3 as SUM(hits.IsRefresh), AVG(hits.ResolutionWidth)@4 as AVG(hits.ResolutionWidth)] |
   |               |         AggregateExec: mode=FinalPartitioned, gby=[WatchID@0 as WatchID, ClientIP@1 as ClientIP], aggr=[COUNT(UInt8(1)), SUM(hits.IsRefresh), AVG(hits.ResolutionWidth)] |
   |               |           CoalesceBatchesExec: target_batch_size=8192 |
   |               |             RepartitionExec: partitioning=Hash([Column { name: "WatchID", index: 0 }, Column { name: "ClientIP", index: 1 }], 12), input_partitions=12 |
   |               |               AggregateExec: mode=Partial, gby=[WatchID@0 as WatchID, ClientIP@1 as ClientIP], aggr=[COUNT(UInt8(1)), SUM(hits.IsRefresh), AVG(hits.ResolutionWidth)] |
   |               |                 ParquetExec: limit=None, partitions={12 groups: [[hits.parquet:0..1231664704], [hits.parquet:1231664704..2463329408], [hits.parquet:2463329408..3694994112], [hits.parquet:3694994112..4926658816], [hits.parquet:4926658816..6158323520], [hits.parquet:6158323520..7389988224], [hits.parquet:7389988224..8621652928], [hits.parquet:8621652928..9853317632], [hits.parquet:9853317632..11084982336], [hits.parquet:11084982336..12316647040], [hits.parquet:12316647040..13548311744], [hits.parquet:13548311744..14779976446]]}, projection=[WatchID, ClientIP, IsRefresh, ResolutionWidth] |
   |               | |
   +---------------+------------------------------------------------------------+
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
