leborchuk commented on PR #900: URL: https://github.com/apache/cloudberry/pull/900#issuecomment-2650110476
> Hi, thanks for your pointing it out, but I have significant concerns about this approach and implications. > > Defining what constitutes a "simple" query is inherently subjective and context-dependent. The current implementation attempts to formalize this, but it feels overly simplistic and overlooks critical cases like subqueries.The choice of optimizer is fundamentally a tuning decision that should remain in the hands of the user or application, not the kernel. > > Additionally, this PR lacks sufficient data-driven justification. Without detailed performance metrics—such as where ORCA is slower for "simple" queries, the magnitude of the performance gap due to the bug below: > > ```sql > gpadmin=# explain(analyze) insert into test values(1); > QUERY PLAN > ------------------------------------------------------------------------------------------------------ > Insert on test (cost=0.00..0.01 rows=1 width=4) (actual time=0.298..0.300 rows=0 loops=1) > -> Result (cost=0.00..0.00 rows=1 width=8) (actual time=0.017..0.019 rows=1 loops=1) > -> Result (cost=0.00..0.00 rows=1 width=4) (actual time=0.016..0.017 rows=1 loops=1) > -> Result (cost=0.00..0.00 rows=1 width=1) (actual time=0.007..0.008 rows=1 loops=1) > Planning Time: 29.739 ms > (slice0) Executor memory: 111K bytes (seg1). > Memory used: 128000kB > Optimizer: Pivotal Optimizer (GPORCA) > Execution Time: 2.259 ms > (9 rows) > > Time: 35.444 ms > ``` > > While PG planner: > > ```sql > gpadmin=# explain(analyze) insert into test values(1); > QUERY PLAN > -------------------------------------------------------------------------------------------- > Insert on test (cost=0.00..0.03 rows=0 width=0) (actual time=0.082..0.083 rows=0 loops=1) > -> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1) > Planning Time: 0.473 ms > (slice0) Executor memory: 110K bytes (seg1). > Memory used: 128000kB > Optimizer: Postgres query optimizer > Execution Time: 1.308 ms > (7 rows) > > Time: 4.604 ms > ``` > > A simple INSERT, ORCA takes more time to plan, but finally introduces additional 2 RESULT nodes, getting half the results with twice the effort. > > **Instead of disabling ORCA, we should focus on addressing these inefficiencies directly**. This would not only resolve the immediate issue but also improve ORCA’s performance for all users, not just those running "simple" queries. Rather than sidestepping the problem, we should tackle the root cause head-on. By identifying and fixing the specific inefficiencies in ORCA, you could deliver a more robust and user-friendly solution that benefits everyone in the long term. Thank you for thoroughly reviewing my idea! My goal here is to do some "magic" for database users, not for us as database developers. For that simple query, you can see that we wasted time mainly on planning.: `Planning Time: 29.739 ms` for gporca and `Planning Time: 0.473 ms` for postgres query optimizer. As you correctly said for queries like that tuning decision should be done by users or applications. But some of them do not want to do that. What they want is some kind of advice - what they should do in order to improve their performance. They may even not know how to see the query execution plan. We constantly teach them, write documents and give speeches. However, not all of them are advanced enough to choose an optimizer. So, it would be great if we could help them in some way. Create simple and understandable rules to automatically switch between optimizers. As for other considerations - I totally agree. Instead of disabling ORCA, we should focus on improving it. I personally have a bunch of ideas what I want try to improve in ORCA. But it needs time and while we are working on it, I want an instrument to switch between optimizers. I don't want to enable this by default, but I want the option to enable it for specific users. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
