Re: [PR] Do not call gporca for simple queries [cloudberry]

via GitHub Tue, 11 Feb 2025 00:28:18 -0800


leborchuk commented on PR #900:
URL: https://github.com/apache/cloudberry/pull/900#issuecomment-2650110476


   > Hi, thanks for your pointing it out, but I have significant concerns about 
this approach and implications.
   > 
   > Defining what constitutes a "simple" query is inherently subjective and 
context-dependent. The current implementation attempts to formalize this, but 
it feels overly simplistic and overlooks critical cases like subqueries.The 
choice of optimizer is fundamentally a tuning decision that should remain in 
the hands of the user or application, not the kernel.
   > 
   > Additionally, this PR lacks sufficient data-driven justification. Without 
detailed performance metrics—such as where ORCA is slower for "simple" queries, 
the magnitude of the performance gap due to the bug below:
   > 
   > ```sql
   > gpadmin=# explain(analyze) insert into test values(1);
   >                                               QUERY PLAN
   > 
------------------------------------------------------------------------------------------------------
   >  Insert on test  (cost=0.00..0.01 rows=1 width=4) (actual 
time=0.298..0.300 rows=0 loops=1)
   >    ->  Result  (cost=0.00..0.00 rows=1 width=8) (actual time=0.017..0.019 
rows=1 loops=1)
   >          ->  Result  (cost=0.00..0.00 rows=1 width=4) (actual 
time=0.016..0.017 rows=1 loops=1)
   >                ->  Result  (cost=0.00..0.00 rows=1 width=1) (actual 
time=0.007..0.008 rows=1 loops=1)
   >  Planning Time: 29.739 ms
   >    (slice0)    Executor memory: 111K bytes (seg1).
   >  Memory used:  128000kB
   >  Optimizer: Pivotal Optimizer (GPORCA)
   >  Execution Time: 2.259 ms
   > (9 rows)
   > 
   > Time: 35.444 ms
   > ```
   > 
   > While PG planner:
   > 
   > ```sql
   > gpadmin=# explain(analyze) insert into test values(1);
   >                                          QUERY PLAN
   > 
--------------------------------------------------------------------------------------------
   >  Insert on test  (cost=0.00..0.03 rows=0 width=0) (actual 
time=0.082..0.083 rows=0 loops=1)
   >    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.004..0.005 
rows=1 loops=1)
   >  Planning Time: 0.473 ms
   >    (slice0)    Executor memory: 110K bytes (seg1).
   >  Memory used:  128000kB
   >  Optimizer: Postgres query optimizer
   >  Execution Time: 1.308 ms
   > (7 rows)
   > 
   > Time: 4.604 ms
   > ```
   > 
   > A simple INSERT, ORCA takes more time to plan, but finally introduces 
additional 2 RESULT nodes, getting half the results with twice the effort.
   > 
   > **Instead of disabling ORCA, we should focus on addressing these 
inefficiencies directly**. This would not only resolve the immediate issue but 
also improve ORCA’s performance for all users, not just those running "simple" 
queries. Rather than sidestepping the problem, we should tackle the root cause 
head-on. By identifying and fixing the specific inefficiencies in ORCA, you 
could deliver a more robust and user-friendly solution that benefits everyone 
in the long term.
   
   Thank you for thoroughly reviewing my idea!
   
   My goal here is to do some "magic" for database users, not for us as 
database developers. 
   
   For that simple query, you can see that we wasted time mainly on planning.: 
`Planning Time: 29.739 ms` for gporca and `Planning Time: 0.473 ms` for 
postgres query optimizer. As you correctly said for queries like that tuning 
decision should be done by users or applications. 
   
   But some of them do not want to do that. What they want is some kind of 
advice - what they should do in order to improve their performance. They may 
even not know how to see the query execution plan. We constantly teach them, 
write documents and give speeches. However, not all of them are advanced enough 
to choose an optimizer.
   
   So, it would be great if we could help them in some way. Create simple and 
understandable rules to automatically switch between optimizers.
   
   As for other considerations - I totally agree. Instead of disabling ORCA, we 
should focus on improving it. I personally have a bunch of ideas what I want 
try to improve in ORCA. But it needs time and while we are working on it, I 
want an instrument to switch between optimizers. I don't want to enable this by 
default, but I want the option to enable it for specific users.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Do not call gporca for simple queries [cloudberry]

Reply via email to