aeschbacherEUF commented on issue #659: URL: https://github.com/apache/cloudberry/issues/659#issuecomment-2461465573
Hi Team, Eddie provided us with a test RPM package containing the single fix for issue #676 (already cherry-picked and built into v1.6.0): [Implement 3-phase aggregation with DEDUP HashAgg for DISTINCT. #676](https://github.com/apache/cloudberry/pull/676) We conducted some real-data testing, hoping for better performance with optimizer=on (needed for our typical workload) and gp_eager_distinct_dedup=on. Unfortunately, the PostgreSQL optimizer still outperforms ORCA by a factor of two. We've attached the EXPLAIN ANALYZE output for your reference. [fix_659_distinct_cdb161.txt](https://github.com/user-attachments/files/17657284/fix_659_distinct_cdb161.txt) `set gp_eager_distinct_dedup = on; set optimizer = on; set optimizer_enable_multiple_distinct_aggs = off; explain (analyze, verbose, costs, timing, buffers) select count(distinct gsp_id) from eft_com_call_statistic; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=0.00..1252.01 rows=1 width=8) (actual time=9613.853..9613.854 rows=1 loops=1) Output: count(DISTINCT gsp_id) -> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..1252.01 rows=1 width=8) (actual time=9408.391..9613.841 rows=2 loops=1) Output: (PARTIAL count(DISTINCT gsp_id)) -> Partial Aggregate (cost=0.00..1252.01 rows=1 width=8) (actual time=9612.691..9612.693 rows=1 loops=1) Output: PARTIAL count(DISTINCT gsp_id) -> Seq Scan on ebi_data.eft_com_call_statistic (cost=0.00..1103.70 rows=3958250 width=19) (actual time=0.222..940.759 rows=3958373 loops=1) Output: gsp_id Settings: effective_cache_size = '8GB', optimizer = 'on' Planning: Buffers: shared hit=2 Planning Time: 5.153 ms (slice0) Executor memory: 37K bytes. (slice1) Executor memory: 130705K bytes avg x 2x(0) workers, 130705K bytes max (seg0). Memory used: 131072kB Optimizer: Pivotal Optimizer (GPORCA) Execution Time: 9614.687 ms (17 rows) Time: 9620,598 ms (00:09,621)` `set gp_eager_distinct_dedup = on; set optimizer = on; set optimizer_enable_multiple_distinct_aggs = off; explain (analyze, verbose, costs, timing, buffers) WITH cte_gsp_counts AS ( SELECT DISTINCT gsp_id FROM eft_com_call_statistic ) SELECT COUNT(*) FROM cte_gsp_counts; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=0.00..1721.24 rows=1 width=8) (actual time=3438.953..3438.955 rows=1 loops=1) Output: count() -> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..1721.24 rows=1 width=8) (actual time=3423.515..3438.943 rows=2 loops=1) Output: (PARTIAL count()) -> Partial Aggregate (cost=0.00..1721.24 rows=1 width=8) (actual time=3433.644..3433.645 rows=1 loops=1) Output: PARTIAL count() -> HashAggregate (cost=0.00..1721.24 rows=3958250 width=1) (actual time=1993.244..3310.569 rows=3958365 loops=1) Output: gsp_id Group Key: eft_com_call_statistic.gsp_id Planned Partitions: 4 work_mem: 262433kB Segments: 2 Max: 131217kB (segment 0) Workfile: (2 spilling) -> Seq Scan on ebi_data.eft_com_call_statistic (cost=0.00..1103.70 rows=3958250 width=19) (actual time=0.376..915.290 rows=3958373 loops=1) Output: gsp_id Settings: effective_cache_size = '8GB', optimizer = 'on' Planning: Buffers: shared hit=6 Planning Time: 4.713 ms (slice0) Executor memory: 24626K bytes. (slice1) Executor memory: 123097K bytes avg x 2x(0) workers, 123117K bytes max (seg0). Work_mem: 131217K bytes max. Memory used: 131072kB Optimizer: Pivotal Optimizer (GPORCA) Execution Time: 3448.772 ms (22 rows) Time: 3454,306 ms (00:03,454)` Best regards, Andreas -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
