my-ship-it commented on issue #659:
URL: https://github.com/apache/cloudberry/issues/659#issuecomment-2484521402
> my test case is
>
> ```
> create table t1(v1 int, v2 int, v3 int);
> insert into t1 values(generate_series(0, 100000), generate_series(100000,
200000), generate_series(200000, 300000));
> insert into t1 select * from t1; -- do n times
> ```
>
> and the ORCA memo is:
>
> ```
> postgres=# explain analyze WITH UniqueIDs AS (
> SELECT DISTINCT v2
> FROM t1
> )
> SELECT COUNT(*)
> FROM UniqueIDs;
> LOG: statement: explain analyze WITH UniqueIDs AS (
> SELECT DISTINCT v2
> FROM t1
> )
> SELECT COUNT(*)
> FROM UniqueIDs;
> LOG: 2024-11-18 14:15:17:640135 CST,THD000,TRACE,"MEMO after optimization
(stage 0):
> ",
> 2024-11-18 14:15:17:642637 CST,THD000,TRACE,"
> Group 19 ():
> 0: CScalarConst (1) [ ]
>
> Group 18 ():
> 0: CScalarProjectList [ 17 ]
>
> Group 17 ():
> 0: CScalarProjectElement "count" (22) [ 16 ]
>
> Group 16 ():
> 0: CScalarAggFunc (count , Distinct: false , Aggregate Stage: Global) [
15 1 1 1 ]
>
> Group 15 ():
> 0: CScalarValuesList [ 14 ]
>
> Group 14 ():
> 0: CScalarIdent "ColRef_0023" (23) [ ]
>
> Group 13 (#GExprs: 3):
> 0: CLogicalGbAgg( Local ) Grp Cols: [][Local], Minimal Grp Cols: [],
Generates Duplicates :[ 1 ] [ 0 12 ]
> 1: CPhysicalScalarAgg( Local, multi-stage ) Grp Cols: [], Minimal Grp
Cols:[], Generates Duplicates :[ 1 ] [ 0 12 ]
> Cost Ctxts:
> main ctxt (stage 0)1.0, child ctxts:[1], rows:1.000000 (group),
cost: 1093.902454
> main ctxt (stage 0)1.1, child ctxts:[2], rows:1.000000 (group),
cost: 1094.007472
> 2: CPhysicalMotionGather(master) [ 13 ]
> Cost Ctxts:
> main ctxt (stage 0)0.0, child ctxts:[1], rows:1.000000 (group),
cost: 1093.902484
> Grp OptCtxts:
> 0 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match:
satisfy ], req dist: [SINGLETON (master) match: satisfy], req rewind: [], req
rewind: [NONE NO-MOTION match: satisfy], req partition propagation: [<empty>
match: satisfy ]) => Best Expr:2
> 1 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match:
satisfy ], req dist: [ANY EOperatorId: 129 match: satisfy], req rewind: [],
req rewind: [NONE NO-MOTION match: satisfy], req partition propagation:
[<empty> match: satisfy ]) => Best Expr:1
>
> Group 12 ():
> 0: CScalarProjectList [ 11 ]
>
> Group 11 ():
> 0: CScalarProjectElement "ColRef_0023" (23) [ 10 ]
>
> Group 10 ():
> 0: CScalarAggFunc (count , Distinct: false , Aggregate Stage: Local) [ 1
1 1 1 ]
>
> Group 9 (#GExprs: 5):
> 0: CLogicalGbAgg( Local ) Grp Cols: ["v2" (11)][Local], Minimal Grp
Cols: ["v2" (11)], Generates Duplicates :[ 1 ] [ 7 8 ]
> 1: CPhysicalStreamAgg( Local, multi-stage ) Grp Cols: ["v2" (11)],
Minimal Grp Cols:["v2" (11)], Generates Duplicates :[ 1 ] [ 7 8 ]
> Cost Ctxts:
> main ctxt (stage 0)1.0, child ctxts:[4], rows:100656.000000 (group),
cost: 2719.140939
> main ctxt (stage 0)1.1, child ctxts:[3], rows:100656.000000 (group),
cost: 2772.560140
> 2: CPhysicalHashAgg( Local, multi-stage ) Grp Cols: ["v2" (11)], Minimal
Grp Cols:["v2" (11)], Generates Duplicates :[ 1 ] (High) [ 7 8 ]
> Cost Ctxts:
> main ctxt (stage 0)1.0, child ctxts:[1], rows:100656.000000 (group),
cost: 1089.433563
> main ctxt (stage 0)1.1, child ctxts:[2], rows:100656.000000 (group),
cost: 1142.852763
> 3: CPhysicalMotionHashDistribute HASHED: [ CScalarIdent "v2" (11), nulls
colocated ], opfamilies: (1977,1.0), [ 9 ]
> Cost Ctxts:
> main ctxt (stage 0)0.0, child ctxts:[1], rows:100656.000000 (group),
cost: 1089.853634
> 4: CPhysicalSort ( (97,1.0), "v2" (11), NULLsLast ) [ 9 ]
> Cost Ctxts:
> main ctxt (stage 0)2.0, child ctxts:[0], rows:100656.000000 (group),
cost: 1101.293981
> Grp OptCtxts:
> 0 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match:
satisfy ], req dist: [HASHED: [ CScalarIdent "v2" (11), nulls colocated ],
opfamilies: (1977,1.0), match: satisfy], req rewind: [], req rewind: [NONE
NO-MOTION match: satisfy], req partition propagation: [<empty> match: satisfy
]) => Best Expr:3
> 1 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match:
satisfy ], req dist: [ANY EOperatorId: 131 match: satisfy], req rewind: [],
req rewind: [NONE NO-MOTION match: satisfy], req partition propagation:
[<empty> match: satisfy ]) => Best Expr:2
> 2 (stage 0): (req CTEs: [0:p(opt) ], req order: [( (97,1.0), "v2"
(11), NULLsLast ) match: satisfy ], req dist: [HASHED: [ CScalarIdent "v2"
(11), nulls colocated ], opfamilies: (1977,1.0), match: satisfy], req rewind:
[], req rewind: [NONE NO-MOTION match: satisfy], req partition propagation:
[<empty> match: satisfy ]) => Best Expr:4
>
> Group 8 ():
> 0: CScalarProjectList [ ]
>
> Group 7 (#GExprs: 5):
> 0: CLogicalGet "t1" ("t1"), Columns: ["v1" (12), "v2" (11), "v3" (13),
"ctid" (14), "xmin" (15), "cmin" (16), "xmax" (17), "cmax" (18), "tableoid"
(19), "gp_segment_id" (20), "gp_foreign_server" (21)] Key sets: {[3,9]} [ ]
> 1: CPhysicalTableScan "t1" ("t1") [ ]
> Cost Ctxts:
> main ctxt (stage 0)1.0, child ctxts:[], rows:12800128.000000
(group), cost: 538.947746
> 2: CPhysicalMotionHashDistribute HASHED: [ CScalarIdent "v2" (11), nulls
colocated ], opfamilies: (1977,1.0), [ 7 ]
> Cost Ctxts:
> main ctxt (stage 0)0.0, child ctxts:[1], rows:12800128.000000
(group), cost: 624.111264
> 3: CPhysicalMotionRandom [ 7 ]
> Cost Ctxts:
> main ctxt (stage 0)2.0, child ctxts:[1], rows:12800128.000000
(group), cost: 624.111264
> 4: CPhysicalSort ( (97,1.0), "v2" (11), NULLsLast ) [ 7 ]
> Cost Ctxts:
> main ctxt (stage 0)4.0, child ctxts:[1], rows:12800128.000000
(group), cost: 2701.998811
> main ctxt (stage 0)5.0, child ctxts:[0], rows:12800128.000000
(group), cost: 2755.418012
> main ctxt (stage 0)3.0, child ctxts:[2], rows:12800128.000000
(group), cost: 2755.418012
> Grp OptCtxts:
> 0 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match:
satisfy ], req dist: [HASHED: [ CScalarIdent "v2" (11), nulls colocated ],
opfamilies: (1977,1.0), match: satisfy], req rewind: [], req rewind: [NONE
NO-MOTION match: satisfy], req partition propagation: [<empty> match: satisfy
]) => Best Expr:2
> 1 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match:
satisfy ], req dist: [ANY EOperatorId: 131 match: satisfy], req rewind: [],
req rewind: [NONE NO-MOTION match: satisfy], req partition propagation:
[<empty> match: satisfy ]) => Best Expr:1
> 2 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match:
satisfy ], req dist: [RANDOM match: satisfy], req rewind: [], req rewind: [NONE
NO-MOTION match: satisfy], req partition propagation: [<empty> match: satisfy
]) => Best Expr:3
> 3 (stage 0): (req CTEs: [0:p(opt) ], req order: [( (97,1.0), "v2"
(11), NULLsLast ) match: satisfy ], req dist: [RANDOM match: satisfy], req
rewind: [], req rewind: [NONE NO-MOTION match: satisfy], req partition
propagation: [<empty> match: satisfy ]) => Best Expr:4
> 4 (stage 0): (req CTEs: [0:p(opt) ], req order: [( (97,1.0), "v2"
(11), NULLsLast ) match: satisfy ], req dist: [ANY EOperatorId: 136 match:
satisfy], req rewind: [], req rewind: [NONE NO-MOTION match: satisfy], req
partition propagation: [<empty> match: satisfy ]) => Best Expr:4
> 5 (stage 0): (req CTEs: [0:p(opt) ], req order: [( (97,1.0), "v2"
(11), NULLsLast ) match: satisfy ], req dist: [HASHED: [ CScalarIdent "v2"
(11), nulls colocated ], opfamilies: (1977,1.0), match: satisfy], req rewind:
[], req rewind: [NONE NO-MOTION match: satisfy], req partition propagation:
[<empty> match: satisfy ]) => Best Expr:4
>
> ROOT
> Group 6 (#GExprs: 3):
> 0: CLogicalCTEAnchor (0) [ 5 ]
> 1: CLogicalSelect [ 5 19 ]
> 2: CPhysicalFilter [ 5 19 ]
> Cost Ctxts:
> main ctxt (stage 0)0.1, child ctxts:[0], rows:1.000000 (group),
cost: 1093.902485 -- group 5 outer
> Grp OptCtxts:
> 0 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match:
satisfy ], req dist: [SINGLETON (master) match: satisfy], req rewind: [], req
rewind: [NONE NO-MOTION match: satisfy], req partition propagation: [<empty>
match: satisfy ]) => Best Expr:2
>
> Group 5 (#GExprs: 4):
> 0: CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [],
Generates Duplicates :[ 0 ] [ 0 4 ]
> 1: CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [],
Generates Duplicates :[ 1 ] [ 13 18 ]
> 2: CPhysicalScalarAgg( Global, multi-stage ) Grp Cols: [], Minimal Grp
Cols:[], Generates Duplicates :[ 1 ] [ 13 18 ]
> Cost Ctxts:
> main ctxt (stage 0)0.1, child ctxts:[0], rows:1.000000 (group),
cost: 1093.902485
> 3: CPhysicalScalarAgg( Global ) Grp Cols: [], Minimal Grp Cols:[],
Generates Duplicates :[ 0 ] [ 0 4 ]
> Cost Ctxts:
> main ctxt (stage 0)0.1, child ctxts:[0], rows:1.000000 (group),
cost: 1094.277565
> Grp OptCtxts:
> 0 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match:
satisfy ], req dist: [ANY EOperatorId: 101 match: satisfy], req rewind: [],
req rewind: [NONE NO-MOTION match: satisfy], req partition propagation:
[<empty> match: satisfy ]) => Best Expr:2
>
> Group 4 ():
> 0: CScalarProjectList [ 3 ]
>
> Group 3 ():
> 0: CScalarProjectElement "count" (22) [ 2 ]
>
> Group 2 ():
> 0: CScalarAggFunc (count , Distinct: false , Aggregate Stage: Global) [
1 1 1 1 ]
>
> Group 1 ():
> 0: CScalarValuesList [ ]
>
> Group 0 (#GExprs: 10):
> 0: CLogicalCTEConsumer (0), Columns: ["v2" (11)] [ ]
> 1: CLogicalGbAgg( Global ) Grp Cols: ["v2" (11)][Global], Minimal Grp
Cols: ["v2" (11)], Generates Duplicates :[ 0 ] [ 7 8 ]
> 2: CLogicalGbAgg( Global ) Grp Cols: ["v2" (11)][Global], Minimal Grp
Cols: ["v2" (11)], Generates Duplicates :[ 1 ] [ 9 8 ]
> 3: CPhysicalStreamAgg( Global, multi-stage ) Grp Cols: ["v2" (11)],
Minimal Grp Cols:["v2" (11)], Generates Duplicates :[ 1 ] [ 9 8 ]
> Cost Ctxts:
> main ctxt (stage 0)1.1, child ctxts:[2], rows:100656.000000 (group),
cost: 1101.447012
> 4: CPhysicalHashAgg( Global, multi-stage ) Grp Cols: ["v2" (11)],
Minimal Grp Cols:["v2" (11)], Generates Duplicates :[ 1 ] (High) [ 9 8 ]
> Cost Ctxts:
> main ctxt (stage 0)1.1, child ctxts:[0], rows:100656.000000 (group),
cost: 1093.902454
> 5: CPhysicalStreamAgg( Global ) Grp Cols: ["v2" (11)], Minimal Grp
Cols:["v2" (11)], Generates Duplicates :[ 0 ] [ 7 8 ]
> Cost Ctxts:
> main ctxt (stage 0)1.1, child ctxts:[5], rows:100656.000000 (group),
cost: 2772.503672
> 6: CPhysicalHashAgg( Global ) Grp Cols: ["v2" (11)], Minimal Grp
Cols:["v2" (11)], Generates Duplicates :[ 0 ] (High) [ 7 8 ]
> Cost Ctxts:
> main ctxt (stage 0)1.1, child ctxts:[0], rows:100656.000000 (group),
cost: 1136.613079
> 7: CPhysicalCTEConsumer (0), Columns: ["v2" (11)] [ ]
> Cost Ctxts:
> 8: CPhysicalMotionGather(master) [ 0 ]
> Cost Ctxts:
> main ctxt (stage 0)0.0, child ctxts:[1], rows:100656.000000 (group),
cost: 1094.277565
> 9: CPhysicalMotionRandom [ 0 ]
> Cost Ctxts:
> main ctxt (stage 0)2.0, child ctxts:[1], rows:100656.000000 (group),
cost: 1094.007472
> Grp OptCtxts:
> 0 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match:
satisfy ], req dist: [SINGLETON (master) match: satisfy], req rewind: [], req
rewind: [NONE NO-MOTION match: satisfy], req partition propagation: [<empty>
match: satisfy ]) => Best Expr:8
> 1 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match:
satisfy ], req dist: [ANY EOperatorId: 129 match: satisfy], req rewind: [],
req rewind: [NONE NO-MOTION match: satisfy], req partition propagation:
[<empty> match: satisfy ]) => Best Expr:4
> 2 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match:
satisfy ], req dist: [RANDOM match: satisfy], req rewind: [], req rewind: [NONE
NO-MOTION match: satisfy], req partition propagation: [<empty> match: satisfy
]) => Best Expr:9
> ",
>
QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Finalize Aggregate (cost=0.00..1093.90 rows=1 width=8) (actual
time=3749.832..3749.845 rows=1 loops=1)
> -> Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1093.90
rows=1 width=8) (actual time=3744.143..3749.815 rows=3 loops=1)
> -> Partial Aggregate (cost=0.00..1093.90 rows=1 width=8)
(actual time=3748.275..3748.276 rows=1 loops=1)
> -> HashAggregate (cost=0.00..1093.90 rows=33552 width=1)
(actual time=3728.835..3738.960 rows=33501 loops=1)
> Group Key: v2
> -> Redistribute Motion 3:3 (slice2; segments: 3)
(cost=0.00..1089.85 rows=33552 width=4) (actual time=3277.019..3706.719
rows=33501 loops=1)
> Hash Key: v2
> -> Streaming HashAggregate
(cost=0.00..1089.43 rows=33552 width=4) (actual time=3694.054..3706.532
rows=33462 loops=1)
> Group Key: v2
> -> Seq Scan on t1 (cost=0.00..538.95
rows=4266710 width=4) (actual time=0.175..1717.717 rows=4283136 loops=1)
> Planning Time: 60.859 ms
> (slice0) Executor memory: 3139K bytes.
> (slice1) Executor memory: 2617K bytes avg x 3x(0) workers, 2623K
bytes max (seg2). Work_mem: 3601K bytes max.
> (slice2) Executor memory: 2615K bytes avg x 3x(0) workers, 2619K
bytes max (seg0). Work_mem: 3601K bytes max.
> Memory used: 128000kB
> Optimizer: Pivotal Optimizer (GPORCA)
> Execution Time: 3762.223 ms
> (17 rows)
> ```
>
> then the lowest cost path is:
>
> ```
> ROOT -> 5 9 * [0] (CPhysicalFilter)
> -> 5 * 0 (CPhysicalScalarAgg)
> -> 13 * 0 (CPhysicalMotionGather)
> -> 0 * 1 (CPhysicalHashAgg)
> -> 9 * 0 (CPhysicalMotionHashDistribute)
> -> 9 * 1 (CPhysicalHashAgg)
> -> 7 * 1 (CPhysicalTableScan)
> ```
>
> let us see the group 7(only one path to do the seq scan):
>
> ```
> Group 7 (#GExprs: 5):
> ...
> 1: CPhysicalTableScan "t1" ("t1") [ ]
> Cost Ctxts:
> main ctxt (stage 0)1.0, child ctxts:[], rows:12800128.000000
(group), cost: 538.947746
> ...
> 1 (stage 0): (req CTEs: [0:p(opt) ], req order: [<empty> match: satisfy
], req dist: [ANY EOperatorId: 131 match: satisfy], req rewind: [], req
rewind: [NONE NO-MOTION match: satisfy], req partition propagation: [<empty>
match: satisfy ]) => Best Expr:1
> ```
>
> then below path can add `group 7, expr 1` as its child:
>
> * group 7,expr 2 - CPhysicalMotionHashDistribute
> * group 7,expr 3 - CPhysicalMotionRandom
> * group 7,expr 3 - Sort(high cost)
> * group 9,expr 1 - CPhysicalHashAgg
>
> The parent path of path abc `group 9,expr 1` the same as `group 7, expr
1`, and only `CPhysicalMotionHashDistribute/CPhysicalMotionRandom/Sort` path
exists.
>
> So i guess we need add a path in the ORCA...
Good analysis, yes, I think so...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]