fanfuxiaoran commented on PR #685:
URL: https://github.com/apache/cloudberry/pull/685#issuecomment-2500089120

   > > I took a look at ORCA; it already optimizes away the `distinct` function.
   > > ```
   > > explain  select  distinct(count(a)) from foo;
   > >                                      QUERY PLAN
   > > ------------------------------------------------------------------------------------
   > >  Finalize Aggregate  (cost=0.00..526.96 rows=1 width=8)
   > >    ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..526.96 rows=1 width=8)
   > >          ->  Partial Aggregate  (cost=0.00..526.96 rows=1 width=8)
   > >                ->  Seq Scan on foo  (cost=0.00..500.67 rows=3333334 width=4)
   > >  Optimizer: Pivotal Optimizer (GPORCA)
   > > (5 rows)
   > > ```
   > >
   > > Even with `group by`, the `distinct` can also be removed
   > > ```
   > > explain  select  distinct(count(a)) from foo group by a ;
   > >                                                        QUERY PLAN
   > > ------------------------------------------------------------------------------------------------------------------------
   > >  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..1395.69 rows=1000 width=8)
   > >    ->  HashAggregate  (cost=0.00..1395.66 rows=334 width=8)
   > >          Group Key: (count(a))
   > >          ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=0.00..1395.62 rows=334 width=8)
   > >                Hash Key: (count(a))
   > >                ->  Streaming HashAggregate  (cost=0.00..1395.61 rows=334 width=8)
   > >                      Group Key: count(a)
   > >                      ->  HashAggregate  (cost=0.00..985.15 rows=3333334 width=8)
   > >                            Group Key: a
   > >                            Planned Partitions: 16
   > >                            ->  Redistribute Motion 3:3  (slice3; segments: 3)  (cost=0.00..567.20 rows=3333334 width=4)
   > >                                  Hash Key: a
   > >                                  ->  Seq Scan on foo  (cost=0.00..500.67 rows=3333334 width=4)
   > >  Optimizer: Pivotal Optimizer (GPORCA)
   > > (14 rows)
   > > ```
   > >
   > > as `distinct` is a function that only works within a group.
   > > The function in ORCA is called `PexprRemoveSuperfluousDistinctInDQA`.
   > 
   > Yeah, see [#677 (reply in thread)](https://github.com/apache/cloudberry/discussions/677#discussioncomment-10966471)
   
   ORCA removes the `distinct` when it is applied to an aggregate expression, even if there is a `group by` clause. Do we need to consider that?
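
For reference, here is a small check of the result semantics only, not of any plan. It assumes SQLite (through Python's `sqlite3`) as a stand-in for Cloudberry, and the rows in `foo` are made up for illustration. It says nothing about what ORCA or `PexprRemoveSuperfluousDistinctInDQA` actually does.

```python
import sqlite3

# Assumption: SQLite stands in for Cloudberry here; only result semantics are compared.
conn = sqlite3.connect(":memory:")
conn.execute("create table foo (a int)")
# Hypothetical data: values 1 and 2 appear twice, value 3 once.
conn.executemany("insert into foo values (?)", [(1,), (1,), (2,), (2,), (3,)])


def rows(sql):
    """Run a query and return its rows in sorted order for easy comparison."""
    return sorted(conn.execute(sql).fetchall())


# Without group by, the aggregate produces exactly one row.
no_group_plain = rows("select count(a) from foo")
no_group_distinct = rows("select distinct count(a) from foo")

# With group by, there is one count per group, and counts may repeat.
grouped_plain = rows("select count(a) from foo group by a")
grouped_distinct = rows("select distinct count(a) from foo group by a")

print("no group by:", no_group_plain, no_group_distinct)
print("group by a :", grouped_plain, grouped_distinct)
```

Without `group by` the aggregate yields a single row, so the `distinct` cannot change the result. With `group by a`, two groups can share the same count, so on this data the query with `distinct` returns fewer rows than the one without.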


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

