[I] [CH] Move `project` after `sort` [incubator-gluten]

via GitHub Tue, 11 Jun 2024 01:27:42 -0700


lgbo-ustc opened a new issue, #6045:
URL: https://github.com/apache/incubator-gluten/issues/6045


   ### Description
   
   Found an interesting case:
   ```sql
   explain select n_regionkey, n_nationkey, n_nationkey * 2 as x from tpch_pq.nation order by n_nationkey;
   ```
   The physical plan produced by `gluten` is:
   ```
   CHNativeColumnarToRow
   +- ^(16) SortExecTransformer [n_nationkey#0L ASC NULLS FIRST], true, 0
      +- ^(16) InputIteratorTransformer[n_regionkey#2L, n_nationkey#0L, x#64L]
         +- ColumnarExchange rangepartitioning(n_nationkey#0L ASC NULLS FIRST, 5), ENSURE_REQUIREMENTS, [plan_id=529], [id=#529], [OUTPUT] List(n_regionkey:LongType, n_nationkey:LongType, x:LongType)
            +- ^(15) ProjectExecTransformer [n_regionkey#2L, n_nationkey#0L, (n_nationkey#0L * cast(2 as bigint)) AS x#64L]
               +- ^(15) NativeFileScan parquet tpch_pq.nation[n_nationkey#0L,n_regionkey#2L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/liangjiabiao/workspace/docker/local_gluten/tpch_pq_data/nat..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<n_nationkey:bigint,n_regionkey:bigint>
   ```
   The projection that computes `n_nationkey * 2 as x` runs before the `sort`, so the sort has to carry the extra column `x`.
   
   Compare a similar case in `CH` (ClickHouse), where the column `a + 1` is generated lazily, after the sort:
   ```
   f2386dc7dd0d :) explain pipeline header=1 select key, a, a + 1 from tt1 order by key
   
   EXPLAIN PIPELINE header = 1
   SELECT
       key,
       a,
       a + 1
   FROM tt1
   ORDER BY key ASC
   
   Query id: a47b482c-3b15-4718-a7ce-1b63403ee192
   
       
       ┌─explain────────────────────────────────────────────────────────────────┐
    1. │ (Expression)                                                           │
    2. │ ExpressionTransform                                                    │
    3. │ Header: key UInt32: key UInt32 UInt32(size = 0)                        │
    4. │         a UInt32: a UInt32 UInt32(size = 0)                            │
    5. │         plus(a, 1) UInt64: plus(a, 1) UInt64 UInt64(size = 0)          │
    6. │   (Sorting)                                                            │
    7. │     (Expression)                                                       │
    8. │     ExpressionTransform                                                │
    9. │     Header: __table1.key UInt32: __table1.key UInt32 UInt32(size = 0)  │
   10. │             a UInt32: a UInt32 UInt32(size = 0)                        │
   11. │       (ReadFromMergeTree)                                              │
   12. │       MergeTreeSelect(pool: ReadPoolInOrder, algorithm: InOrder) 0 → 1 │
   13. │       Header: key UInt32: key UInt32 UInt32(size = 0)                  │
   14. │               a UInt32: a UInt32 UInt32(size = 0)                      │
       └────────────────────────────────────────────────────────────────────────┘
   
   14 rows in set. Elapsed: 0.001 sec.
   ```
   
   In column-based sorting, the more columns that have to be sorted, the worse the performance. If the new columns are not used as sort keys, generating them after `sort` should be a good idea.
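
A minimal sketch of the idea above, in plain Python rather than Gluten or Spark code: it checks on toy data that sorting the two scanned columns and projecting afterwards gives the same rows as the current project-then-sort order. The sample rows and the helper names `project` and `can_defer` are assumptions made for illustration; they do not exist in Gluten.

```python
# Toy rows standing in for tpch_pq.nation: (n_nationkey, n_regionkey).
# The values are made up for illustration.
rows = [(3, 1), (0, 0), (2, 1), (1, 1)]


def project(batch):
    # Emit (n_regionkey, n_nationkey, n_nationkey * 2 AS x) for each row.
    return [(regionkey, nationkey, nationkey * 2) for nationkey, regionkey in batch]


def can_defer(computed_columns, sort_keys):
    # A computed column may move above the sort only if no sort key uses it.
    return not set(computed_columns) & set(sort_keys)


# Current plan shape: project first, then sort three columns on n_nationkey.
project_then_sort = sorted(project(rows), key=lambda r: r[1])

# Proposed plan shape: sort the two scanned columns, then project.
sorted_input = sorted(rows, key=lambda r: r[0])
sort_then_project = project(sorted_input)

assert can_defer({"x"}, {"n_nationkey"})
assert project_then_sort == sort_then_project
```

In this toy case the sort input shrinks from three columns to two. A real rule would also have to keep every column that the sort keys and the final output need below the sort.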
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

