iodone opened a new issue #1832: URL: https://github.com/apache/incubator-kyuubi/issues/1832
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before asking

- [X] I have searched in the [issues](https://github.com/apache/incubator-kyuubi/issues?q=is%3Aissue) and found no similar issues.

### Describe the bug

### Describe

After setting `watchdog.forcedMaxOutputRows`, statements with subqueries get the limit wrongly pushed down into the subquery of the final generated logical plan.

SQL:

```
CREATE TABLE spark_catalog.`default`.tmp_table1(KEY INT, VALUE STRING) USING PARQUET;

INSERT INTO TABLE spark_catalog.`default`.tmp_table1 VALUES (1, 'aa'),(2,'bb'),(3, 'cc'),(4,'aa'),(5,'cc'),(6, 'aa');

select count(*) from tmp_table1 where tmp_table1.key in (
  select distinct tmp_table1.key from tmp_table1 where tmp_table1.value = "aa"
);
```

Analyzed logical plan:

```
Aggregate [count(1) AS count(1)#62L]
+- Filter key#56 IN (list#60 [])
   :  +- GlobalLimit 1
   :     +- LocalLimit 1
   :        +- Distinct
   :           +- Project [key#56]
   :              +- Filter (value#57 = aa)
   :                 +- SubqueryAlias spark_catalog.default.tmp_table1
   :                    +- Relation[KEY#56,VALUE#57] parquet
   +- SubqueryAlias spark_catalog.default.tmp_table1
      +- Relation[KEY#56,VALUE#57] parquet
```

The limit is pushed down into the subquery under the Filter, which results in an inaccurate final query result.

### Bug Tracing

The rule is injected in the analyzer phase via `extensions.injectPostHocResolutionRule(ForcedMaxOutputRowsRule)`. Looking at the analyzer batches shows that SQL with subqueries first goes through the ResolveSubquery batch during analysis, and from the implementation of `ResolveSubquery` we can see that the analyzer's `execute` method is called again, so analyzing a statement with subqueries recursively runs the analyzer batches. Since we added `ForcedMaxOutputRowsRule` there, the subqueries are also limited, which eventually leads to the generated logical plan above having semantics inconsistent with what we expect.

### Solutions

1. Place `ForcedMaxOutputRowsRule` in the `projectOptimizerRule` stage to avoid the recursive subquery calls (see the sketch appended at the end of this issue).
2. Remove `extensions.injectResolutionRule(MarkAggregateOrderRule)`.

### Some questions

I don't see any case in the unit tests that hits the Aggregate rule without the Limit restriction. After I removed the MarkAggregateOrderRule extension and put `ForcedMaxOutputRowsRule` in the `projectOptimizerRule` phase, all unit tests still passed.

### Affects Version(s)

master/1.4.0

### Kyuubi Server Log Output

_No response_

### Kyuubi Engine Log Output

_No response_

### Kyuubi Server Configurations

_No response_

### Kyuubi Engine Configurations

_No response_

### Additional context

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!
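### Sketch for solution 1

A minimal sketch, assuming Spark 3.x extension APIs, of what registering the limit rule at the optimizer stage could look like. `SketchMaxOutputRowsRule`, `SketchExtension`, and the hard-coded limit of 1 are hypothetical names for illustration, not Kyuubi's actual classes or configuration handling. The `Subquery` check matters because the optimizer also re-runs its batches on subquery plans, but it wraps them in a root `Subquery` node that a rule can detect and skip.

```scala
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.plans.logical.{GlobalLimit, Limit, LogicalPlan, Subquery}
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical stand-in for ForcedMaxOutputRowsRule, applied at the optimizer stage.
case class SketchMaxOutputRowsRule(maxRows: Int) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan match {
    // Subquery plans reach the optimizer wrapped in a root Subquery node,
    // so they can be skipped instead of being limited.
    case _: Subquery    => plan
    // Already limited (e.g. by an earlier pass of this rule in a fixed-point batch).
    case _: GlobalLimit => plan
    // Wrap only the top-level query in GlobalLimit(LocalLimit(...)).
    case other          => Limit(Literal(maxRows), other)
  }
}

// Hypothetical extension entry point, e.g. registered via spark.sql.extensions.
class SketchExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    // Registered with the optimizer instead of injectPostHocResolutionRule,
    // so the analyzer's recursive ResolveSubquery never applies it.
    extensions.injectOptimizerRule(_ => SketchMaxOutputRowsRule(maxRows = 1))
  }
}
```

A real implementation would read the limit from the watchdog configuration rather than hard-coding it, and may need extra handling for aggregate and sort plans (cf. the question about `MarkAggregateOrderRule` above); this sketch only shows the injection point.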