jiaqizho commented on issue #659:
URL: https://github.com/apache/cloudberry/issues/659#issuecomment-2482268565

   my test case is 
   
   ```
   create table t1(v1 int, v2 int, v3 int);
   insert into t1 values(generate_series(0, 100000), generate_series(100000, 
200000), generate_series(200000, 300000));
   insert into t1 select * from t1; -- do n times
   ```
   
   and the ORCA memo is:
   
   ```
   postgres=# explain analyze WITH UniqueIDs AS (
       SELECT DISTINCT v2
       FROM t1
   )
   SELECT COUNT(*)
   FROM UniqueIDs;
   LOG:  statement: explain analyze WITH UniqueIDs AS (
       SELECT DISTINCT v2
       FROM t1
   )
   SELECT COUNT(*)
   FROM UniqueIDs;
   LOG:  2024-11-18 14:15:17:640135 CST,THD000,TRACE,"MEMO after optimization 
(stage 0):
   ",
   2024-11-18 14:15:17:642637 CST,THD000,TRACE,"
   Group 19 ():
     0: CScalarConst (1) [ ]
   
   Group 18 ():
     0: CScalarProjectList [ 17 ]
   
   Group 17 ():
     0: CScalarProjectElement "count" (22) [ 16 ]
   
   Group 16 ():
     0: CScalarAggFunc (count , Distinct: false , Aggregate Stage: Global) [ 15 
1 1 1 ]
   
   Group 15 ():
     0: CScalarValuesList [ 14 ]
   
   Group 14 ():
     0: CScalarIdent "ColRef_0023" (23) [ ]
   
   Group 13 (#GExprs: 3):
     0: CLogicalGbAgg( Local ) Grp Cols: [][Local], Minimal Grp Cols: [], 
Generates Duplicates :[ 1 ]  [ 0 12 ]
     1: CPhysicalScalarAgg( Local, multi-stage ) Grp Cols: [], Minimal Grp 
Cols:[], Generates Duplicates :[ 1 ]  [ 0 12 ]
       Cost Ctxts:
         main ctxt (stage 0)1.0, child ctxts:[1], rows:1.000000 (group), cost: 
1093.902454
         main ctxt (stage 0)1.1, child ctxts:[2], rows:1.000000 (group), cost: 
1094.007472
     2: CPhysicalMotionGather(master) [ 13 ]
       Cost Ctxts:
         main ctxt (stage 0)0.0, child ctxts:[1], rows:1.000000 (group), cost: 
1093.902484
     Grp OptCtxts:
       0 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy 
], req dist: [SINGLETON (master) match: satisfy], req rewind: [], req rewind: 
[NONE NO-MOTION match: satisfy], req partition propagation: [<empty> match: 
satisfy ]) => Best Expr:2
       1 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy 
], req dist: [ANY  EOperatorId: 129  match: satisfy], req rewind: [], req 
rewind: [NONE NO-MOTION match: satisfy], req partition propagation: [<empty> 
match: satisfy ]) => Best Expr:1
   
   Group 12 ():
     0: CScalarProjectList [ 11 ]
   
   Group 11 ():
     0: CScalarProjectElement "ColRef_0023" (23) [ 10 ]
   
   Group 10 ():
     0: CScalarAggFunc (count , Distinct: false , Aggregate Stage: Local) [ 1 1 
1 1 ]
   
   Group 9 (#GExprs: 5):
     0: CLogicalGbAgg( Local ) Grp Cols: ["v2" (11)][Local], Minimal Grp Cols: 
["v2" (11)], Generates Duplicates :[ 1 ]  [ 7 8 ]
     1: CPhysicalStreamAgg( Local, multi-stage ) Grp Cols: ["v2" (11)], Minimal 
Grp Cols:["v2" (11)], Generates Duplicates :[ 1 ]  [ 7 8 ]
       Cost Ctxts:
         main ctxt (stage 0)1.0, child ctxts:[4], rows:100656.000000 (group), 
cost: 2719.140939
         main ctxt (stage 0)1.1, child ctxts:[3], rows:100656.000000 (group), 
cost: 2772.560140
     2: CPhysicalHashAgg( Local, multi-stage ) Grp Cols: ["v2" (11)], Minimal 
Grp Cols:["v2" (11)], Generates Duplicates :[ 1 ]  (High) [ 7 8 ]
       Cost Ctxts:
         main ctxt (stage 0)1.0, child ctxts:[1], rows:100656.000000 (group), 
cost: 1089.433563
         main ctxt (stage 0)1.1, child ctxts:[2], rows:100656.000000 (group), 
cost: 1142.852763
     3: CPhysicalMotionHashDistribute HASHED: [ CScalarIdent "v2" (11), nulls 
colocated ], opfamilies: (1977,1.0), [ 9 ]
       Cost Ctxts:
         main ctxt (stage 0)0.0, child ctxts:[1], rows:100656.000000 (group), 
cost: 1089.853634
     4: CPhysicalSort  ( (97,1.0), "v2" (11), NULLsLast )  [ 9 ]
       Cost Ctxts:
         main ctxt (stage 0)2.0, child ctxts:[0], rows:100656.000000 (group), 
cost: 1101.293981
     Grp OptCtxts:
       0 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy 
], req dist: [HASHED: [ CScalarIdent "v2" (11), nulls colocated ], opfamilies: 
(1977,1.0), match: satisfy], req rewind: [], req rewind: [NONE NO-MOTION match: 
satisfy], req partition propagation: [<empty> match: satisfy ]) => Best Expr:3
       1 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy 
], req dist: [ANY  EOperatorId: 131  match: satisfy], req rewind: [], req 
rewind: [NONE NO-MOTION match: satisfy], req partition propagation: [<empty> 
match: satisfy ]) => Best Expr:2
       2 (stage 0): (req CTEs: [0:p(opt) ], req order: [( (97,1.0), "v2" (11), 
NULLsLast )  match: satisfy ], req dist: [HASHED: [ CScalarIdent "v2" (11), 
nulls colocated ], opfamilies: (1977,1.0), match: satisfy], req rewind: [], req 
rewind: [NONE NO-MOTION match: satisfy], req partition propagation: [<empty> 
match: satisfy ]) => Best Expr:4
   
   Group 8 ():
     0: CScalarProjectList [ ]
   
   Group 7 (#GExprs: 5):
     0: CLogicalGet "t1" ("t1"), Columns: ["v1" (12), "v2" (11), "v3" (13), 
"ctid" (14), "xmin" (15), "cmin" (16), "xmax" (17), "cmax" (18), "tableoid" 
(19), "gp_segment_id" (20), "gp_foreign_server" (21)] Key sets: {[3,9]} [ ]
     1: CPhysicalTableScan "t1" ("t1") [ ]
       Cost Ctxts:
         main ctxt (stage 0)1.0, child ctxts:[], rows:12800128.000000 (group), 
cost: 538.947746
     2: CPhysicalMotionHashDistribute HASHED: [ CScalarIdent "v2" (11), nulls 
colocated ], opfamilies: (1977,1.0), [ 7 ]
       Cost Ctxts:
         main ctxt (stage 0)0.0, child ctxts:[1], rows:12800128.000000 (group), 
cost: 624.111264
     3: CPhysicalMotionRandom [ 7 ]
       Cost Ctxts:
         main ctxt (stage 0)2.0, child ctxts:[1], rows:12800128.000000 (group), 
cost: 624.111264
     4: CPhysicalSort  ( (97,1.0), "v2" (11), NULLsLast )  [ 7 ]
       Cost Ctxts:
         main ctxt (stage 0)4.0, child ctxts:[1], rows:12800128.000000 (group), 
cost: 2701.998811
         main ctxt (stage 0)5.0, child ctxts:[0], rows:12800128.000000 (group), 
cost: 2755.418012
         main ctxt (stage 0)3.0, child ctxts:[2], rows:12800128.000000 (group), 
cost: 2755.418012
     Grp OptCtxts:
       0 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy 
], req dist: [HASHED: [ CScalarIdent "v2" (11), nulls colocated ], opfamilies: 
(1977,1.0), match: satisfy], req rewind: [], req rewind: [NONE NO-MOTION match: 
satisfy], req partition propagation: [<empty> match: satisfy ]) => Best Expr:2
       1 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy 
], req dist: [ANY  EOperatorId: 131  match: satisfy], req rewind: [], req 
rewind: [NONE NO-MOTION match: satisfy], req partition propagation: [<empty> 
match: satisfy ]) => Best Expr:1
       2 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy 
], req dist: [RANDOM match: satisfy], req rewind: [], req rewind: [NONE 
NO-MOTION match: satisfy], req partition propagation: [<empty> match: satisfy 
]) => Best Expr:3
       3 (stage 0): (req CTEs: [0:p(opt) ], req order: [( (97,1.0), "v2" (11), 
NULLsLast )  match: satisfy ], req dist: [RANDOM match: satisfy], req rewind: 
[], req rewind: [NONE NO-MOTION match: satisfy], req partition propagation: 
[<empty> match: satisfy ]) => Best Expr:4
       4 (stage 0): (req CTEs: [0:p(opt) ], req order: [( (97,1.0), "v2" (11), 
NULLsLast )  match: satisfy ], req dist: [ANY  EOperatorId: 136  match: 
satisfy], req rewind: [], req rewind: [NONE NO-MOTION match: satisfy], req 
partition propagation: [<empty> match: satisfy ]) => Best Expr:4
       5 (stage 0): (req CTEs: [0:p(opt) ], req order: [( (97,1.0), "v2" (11), 
NULLsLast )  match: satisfy ], req dist: [HASHED: [ CScalarIdent "v2" (11), 
nulls colocated ], opfamilies: (1977,1.0), match: satisfy], req rewind: [], req 
rewind: [NONE NO-MOTION match: satisfy], req partition propagation: [<empty> 
match: satisfy ]) => Best Expr:4
   
   ROOT
   Group 6 (#GExprs: 3):
     0: CLogicalCTEAnchor (0) [ 5 ]
     1: CLogicalSelect [ 5 19 ]
     2: CPhysicalFilter [ 5 19 ]
       Cost Ctxts:
         main ctxt (stage 0)0.1, child ctxts:[0], rows:1.000000 (group), cost: 
1093.902485 -- group 5 outer 
     Grp OptCtxts:
       0 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy 
], req dist: [SINGLETON (master) match: satisfy], req rewind: [], req rewind: 
[NONE NO-MOTION match: satisfy], req partition propagation: [<empty> match: 
satisfy ]) => Best Expr:2
   
   Group 5 (#GExprs: 4):
     0: CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], 
Generates Duplicates :[ 0 ]  [ 0 4 ]
     1: CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], 
Generates Duplicates :[ 1 ]  [ 13 18 ]
     2: CPhysicalScalarAgg( Global, multi-stage ) Grp Cols: [], Minimal Grp 
Cols:[], Generates Duplicates :[ 1 ]  [ 13 18 ]
       Cost Ctxts:
         main ctxt (stage 0)0.1, child ctxts:[0], rows:1.000000 (group), cost: 
1093.902485
     3: CPhysicalScalarAgg( Global ) Grp Cols: [], Minimal Grp Cols:[], 
Generates Duplicates :[ 0 ]  [ 0 4 ]
       Cost Ctxts:
         main ctxt (stage 0)0.1, child ctxts:[0], rows:1.000000 (group), cost: 
1094.277565
     Grp OptCtxts:
       0 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy 
], req dist: [ANY  EOperatorId: 101  match: satisfy], req rewind: [], req 
rewind: [NONE NO-MOTION match: satisfy], req partition propagation: [<empty> 
match: satisfy ]) => Best Expr:2
   
   Group 4 ():
     0: CScalarProjectList [ 3 ]
   
   Group 3 ():
     0: CScalarProjectElement "count" (22) [ 2 ]
   
   Group 2 ():
     0: CScalarAggFunc (count , Distinct: false , Aggregate Stage: Global) [ 1 
1 1 1 ]
   
   Group 1 ():
     0: CScalarValuesList [ ]
   
   Group 0 (#GExprs: 10):
     0: CLogicalCTEConsumer (0), Columns: ["v2" (11)] [ ]
     1: CLogicalGbAgg( Global ) Grp Cols: ["v2" (11)][Global], Minimal Grp 
Cols: ["v2" (11)], Generates Duplicates :[ 0 ]  [ 7 8 ]
     2: CLogicalGbAgg( Global ) Grp Cols: ["v2" (11)][Global], Minimal Grp 
Cols: ["v2" (11)], Generates Duplicates :[ 1 ]  [ 9 8 ]
     3: CPhysicalStreamAgg( Global, multi-stage ) Grp Cols: ["v2" (11)], 
Minimal Grp Cols:["v2" (11)], Generates Duplicates :[ 1 ]  [ 9 8 ]
       Cost Ctxts:
         main ctxt (stage 0)1.1, child ctxts:[2], rows:100656.000000 (group), 
cost: 1101.447012
     4: CPhysicalHashAgg( Global, multi-stage ) Grp Cols: ["v2" (11)], Minimal 
Grp Cols:["v2" (11)], Generates Duplicates :[ 1 ]  (High) [ 9 8 ]
       Cost Ctxts:
         main ctxt (stage 0)1.1, child ctxts:[0], rows:100656.000000 (group), 
cost: 1093.902454
     5: CPhysicalStreamAgg( Global ) Grp Cols: ["v2" (11)], Minimal Grp 
Cols:["v2" (11)], Generates Duplicates :[ 0 ]  [ 7 8 ]
       Cost Ctxts:
         main ctxt (stage 0)1.1, child ctxts:[5], rows:100656.000000 (group), 
cost: 2772.503672
     6: CPhysicalHashAgg( Global ) Grp Cols: ["v2" (11)], Minimal Grp 
Cols:["v2" (11)], Generates Duplicates :[ 0 ]  (High) [ 7 8 ]
       Cost Ctxts:
         main ctxt (stage 0)1.1, child ctxts:[0], rows:100656.000000 (group), 
cost: 1136.613079
     7: CPhysicalCTEConsumer (0), Columns: ["v2" (11)] [ ]
       Cost Ctxts:
     8: CPhysicalMotionGather(master) [ 0 ]
       Cost Ctxts:
         main ctxt (stage 0)0.0, child ctxts:[1], rows:100656.000000 (group), 
cost: 1094.277565
     9: CPhysicalMotionRandom [ 0 ]
       Cost Ctxts:
         main ctxt (stage 0)2.0, child ctxts:[1], rows:100656.000000 (group), 
cost: 1094.007472
     Grp OptCtxts:
       0 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy 
], req dist: [SINGLETON (master) match: satisfy], req rewind: [], req rewind: 
[NONE NO-MOTION match: satisfy], req partition propagation: [<empty> match: 
satisfy ]) => Best Expr:8
       1 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy 
], req dist: [ANY  EOperatorId: 129  match: satisfy], req rewind: [], req 
rewind: [NONE NO-MOTION match: satisfy], req partition propagation: [<empty> 
match: satisfy ]) => Best Expr:4
       2 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy 
], req dist: [RANDOM match: satisfy], req rewind: [], req rewind: [NONE 
NO-MOTION match: satisfy], req partition propagation: [<empty> match: satisfy 
]) => Best Expr:9
   ",
                                                                                
QUERY PLAN
   
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Finalize Aggregate  (cost=0.00..1093.90 rows=1 width=8) (actual 
time=3749.832..3749.845 rows=1 loops=1)
      ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..1093.90 rows=1 
width=8) (actual time=3744.143..3749.815 rows=3 loops=1)
            ->  Partial Aggregate  (cost=0.00..1093.90 rows=1 width=8) (actual 
time=3748.275..3748.276 rows=1 loops=1)
                  ->  HashAggregate  (cost=0.00..1093.90 rows=33552 width=1) 
(actual time=3728.835..3738.960 rows=33501 loops=1)
                        Group Key: v2
                        ->  Redistribute Motion 3:3  (slice2; segments: 3)  
(cost=0.00..1089.85 rows=33552 width=4) (actual time=3277.019..3706.719 
rows=33501 loops=1)
                              Hash Key: v2
                              ->  Streaming HashAggregate  (cost=0.00..1089.43 
rows=33552 width=4) (actual time=3694.054..3706.532 rows=33462 loops=1)
                                    Group Key: v2
                                    ->  Seq Scan on t1  (cost=0.00..538.95 
rows=4266710 width=4) (actual time=0.175..1717.717 rows=4283136 loops=1)
    Planning Time: 60.859 ms
      (slice0)    Executor memory: 3139K bytes.
      (slice1)    Executor memory: 2617K bytes avg x 3x(0) workers, 2623K bytes 
max (seg2).  Work_mem: 3601K bytes max.
      (slice2)    Executor memory: 2615K bytes avg x 3x(0) workers, 2619K bytes 
max (seg0).  Work_mem: 3601K bytes max.
    Memory used:  128000kB
    Optimizer: Pivotal Optimizer (GPORCA)
    Execution Time: 3762.223 ms
   (17 rows)
   ```
   
   then the lowest cost path is:
   
   ```
   ROOT -> 5 9 * [0] (CPhysicalFilter)
       -> 5 * 0  (CPhysicalScalarAgg)
           -> 13 * 0  (CPhysicalMotionGather)
               -> 0 * 1 (CPhysicalHashAgg)
                   -> 9 * 0 (CPhysicalMotionHashDistribute)
                     -> 9 * 1 (CPhysicalHashAgg)
                       -> 7 * 1 (CPhysicalTableScan)
   ```
   
   let us see the group 7(only one path to do the seq scan):
   
   ```
   Group 7 (#GExprs: 5):
     ...
     1: CPhysicalTableScan "t1" ("t1") [ ]
       Cost Ctxts:
         main ctxt (stage 0)1.0, child ctxts:[], rows:12800128.000000 (group), 
cost: 538.947746
     ...
     1 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy ], 
req dist: [ANY  EOperatorId: 131  match: satisfy], req rewind: [], req rewind: 
[NONE NO-MOTION match: satisfy], req partition propagation: [<empty> match: 
satisfy ]) => Best Expr:1
   
   ```
   
   
   then below path can add `group 7, expr 1` as its child:
   - group 7,expr 2 - CPhysicalMotionHashDistribute
   - group 7,expr 3 - CPhysicalMotionRandom
   - group 7,expr 3 - Sort(high cost)
   - group 9,expr 1 - CPhysicalHashAgg
   
   
   The parent path of path abc `group 9,expr 1` the same as `group 7, expr 1`, 
and only `CPhysicalMotionHashDistribute/CPhysicalMotionRandom/Sort` path 
exists. 
   
   
   So i guess we need add a path in the ORCA...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to