edespino commented on PR #1401: URL: https://github.com/apache/cloudberry/pull/1401#issuecomment-3414163761
# Pull Request Review: Fix readable CTE with SELECT INTO clause **Commit:** `4d966f954a` **Author:** hanwei <[email protected]> **Status:** ✅ Approved with Recommendations --- ## Summary The fix correctly resolves the issue where readable (non-modifying) CTEs were incorrectly rejected when used with `SELECT INTO` clauses. The core implementation is sound and addresses the root cause effectively. **What Changed:** - Modified `src/backend/parser/analyze.c` to distinguish between readable and writable CTEs - Added validation that only blocks writable CTEs (INSERT/UPDATE/DELETE) with SELECT INTO - Previous behavior incorrectly called `transformWithClause()` prematurely, blocking all CTEs **Impact:** - ✅ Allows readable CTEs with SELECT INTO (bug fix) - ✅ Still blocks writable CTEs with SELECT INTO (maintains safety) - ✅ More efficient validation (no premature transformation) --- ## Testing Results Basic test case passes: ```sql WITH sel AS (SELECT 1 as a) SELECT a INTO t_r_cte FROM sel; ``` -- ✅ Works correctly Edge case validation: - ✅ Multiple readable CTEs with SELECT INTO - works - ✅ Mixed readable/writable CTEs - correctly blocked - ✅ Error messages are clear and actionable --- ## Recommendations ### Issue 1 🔴 High Priority Current test coverage only validates the basic use case. Missing edge cases could lead to regressions. Solution Add comprehensive test cases to src/test/regress/sql/with.sql: ```sql -- -- Multiple readable CTEs with SELECT INTO -- WITH cte1 AS (SELECT 1 as a), cte2 AS (SELECT 2 as b) SELECT a, b INTO t_multi_r_cte FROM cte1, cte2; SELECT * FROM t_multi_r_cte; DROP TABLE t_multi_r_cte; -- -- Nested SELECT in CTE with SELECT INTO -- WITH nested_cte AS ( SELECT * FROM (SELECT 100 as val, 'test'::text as name) subq ) SELECT val, name INTO t_nested_r_cte FROM nested_cte; SELECT * FROM t_nested_r_cte; DROP TABLE t_nested_r_cte; -- -- CTE with JOIN and SELECT INTO -- WITH cte1 AS (SELECT 1 as id, 'foo' as val), cte2 AS (SELECT 1 as id, 'bar' as val) SELECT cte1.id, cte1.val as val1, cte2.val as val2 INTO t_join_r_cte FROM cte1 JOIN cte2 ON cte1.id = cte2.id; SELECT * FROM t_join_r_cte; DROP TABLE t_join_r_cte; -- -- Verify writable CTE still blocked with SELECT INTO (negative test) -- CREATE TABLE t_writeable_test (a int); WITH write_cte AS ( INSERT INTO t_writeable_test VALUES (1) RETURNING * ) SELECT a INTO t_should_fail FROM write_cte; -- should fail DROP TABLE t_writeable_test; -- -- Verify mixed readable and writable CTEs blocked (negative test) -- CREATE TABLE t_mixed_test (a int); WITH read_cte AS (SELECT 1 as x), write_cte AS (INSERT INTO t_mixed_test VALUES (1) RETURNING *) SELECT x INTO t_should_fail2 FROM read_cte; -- should fail DROP TABLE t_mixed_test; ``` Also update: - src/test/regress/expected/with.out - src/test/regress/expected/with_optimizer.out Impact: 🎯 Comprehensive coverage prevents future regressionsEffort: ⏱️ ~2-3 hours ### Issue 2 🟡 Medium Priority The CTE validation logic at analyze.c:346-367 duplicates the type-checking pattern, creating maintenance burden. Solution Step 1: Add declaration to src/include/parser/parse_cte.h: ``` extern List *transformWithClause(ParseState *pstate, WithClause *withClause); extern bool hasModifyingCTE(WithClause *withClause); // ← ADD THIS extern CommonTableExpr *GetCTEForRTE(ParseState *pstate, RangeTblEntry *rte, int rtelevelsup); ``` Step 2: Implement in src/backend/parser/parse_cte.c (before transformWithClause): ``` /* * hasModifyingCTE - check if a WITH clause contains any data-modifying CTEs * * Returns true if any CTE in the clause is INSERT/UPDATE/DELETE. * This is used to enforce Cloudberry's restriction that writable CTEs * cannot be used in contexts that themselves involve writing, due to * the single writer gang limitation. * * Note: This checks the raw parse tree (SelectStmt/InsertStmt/etc), not * the transformed Query tree. */ bool hasModifyingCTE(WithClause *withClause) { ListCell *lc; if (withClause == NULL) return false; foreach(lc, withClause->ctes) { CommonTableExpr *cte = (CommonTableExpr *) lfirst(lc); if (!IsA(cte->ctequery, SelectStmt)) { /* must be a data-modifying statement */ Assert(IsA(cte->ctequery, InsertStmt) || IsA(cte->ctequery, UpdateStmt) || IsA(cte->ctequery, DeleteStmt)); return true; } } return false; } ``` Then refactor existing code at lines 157-178: ``` foreach(lc, withClause->ctes) { CommonTableExpr *cte = (CommonTableExpr *) lfirst(lc); if (isCTEModifying(cte)) // ← Use helper { /* * Since GPDB currently only support a single writer gang, only one * writable clause is permitted per CTE. Once we get flexible gangs * with more than one writer gang we can lift this restriction. */ if (pstate->p_hasModifyingCTE) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("only one modifying WITH clause allowed per query"), errdetail("Apache Cloudberry currently only support CTEs with one writable clause."), errhint("Rewrite the query to only include one writable CTE clause."))); pstate->p_hasModifyingCTE = true; } } ``` Step 3: Refactor src/backend/parser/analyze.c (replace lines 344-369): ``` if (stmt->withClause) { /* * Since Cloudberry currently only support a single writer gang, only one * writable clause is permitted per CTE. Once we get flexible gangs * with more than one writer gang we can lift this restriction. */ if (hasModifyingCTE(stmt->withClause)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("writable CTE queries cannot be used with writable queries"), errdetail("Apache Cloudberry currently only support CTEs with one writable clause, called in a non-writable context."), errhint("Rewrite the query to only include one writable clause."))); } ``` Changes: - src/include/parser/parse_cte.h (+1 line) - src/backend/parser/parse_cte.c (+25 lines) - src/backend/parser/analyze.c (-20 lines) Impact: 🔧 Improves maintainability, reduces duplicationEffort: ⏱️ ~1 hour ### Issue 3 🟢 Low Priority Commit message contains typo and grammar error: ``` Cloudberry currently only support ne WITH clause per query level. ^^^ ``` Correction: before ``` Cloudberry currently only support ne WITH clause per query level. ``` after ``` Cloudberry currently only supports one WITH clause per query level. ``` If not yet merged: ``` git commit --amend # or git rebase -i 4d966f954a^ ``` If already merged: Document in release notes Impact: 📝 Documentation clarityEffort: ⏱️ 5 minutes --- Checklist Before Merge: - Implement Recommendation issue 1 (expanded tests) - HIGH PRIORITY - Consider Recommendation issue 2 (refactor duplication) - RECOMMENDED - Fix Recommendation issue 3 (commit message) - if possible Verification: - Fix solves the reported issue - No regressions introduced - Error messages are appropriate - Basic test coverage included - Comprehensive test coverage (pending) - Code duplication addressed (pending) --- Conclusion Verdict: ✅ Approved with Recommendations The core fix is excellent and ready for merge. However, implementing the test coverage improvements (Recommendation issue 1 is strongly recommended before merge to ensure long-term stability. The refactoring (Recommendation issue 2 would be a nice-to-have for maintainability. Great work on identifying and fixing this issue! 🎉 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
