hw118118 commented on PR #1401:
URL: https://github.com/apache/cloudberry/pull/1401#issuecomment-3420385431

   > # Pull Request Review: Fix readable CTE with SELECT INTO clause
   > **Commit:** `4d966f954a` **Author:** hanwei 
[[email protected]](mailto:[email protected]) **Status:** ✅ Approved with 
Recommendations
   > 
   > ## Summary
   > The fix correctly resolves the issue where readable (non-modifying) CTEs 
were incorrectly rejected when used with `SELECT INTO` clauses. The core 
implementation is sound and addresses the root cause effectively.
   > 
   > **What Changed:**
   > 
   > * Modified `src/backend/parser/analyze.c` to distinguish between readable 
and writable CTEs
   > * Added validation that only blocks writable CTEs (INSERT/UPDATE/DELETE) 
with SELECT INTO
   > * Previous behavior incorrectly called `transformWithClause()` 
prematurely, blocking all CTEs
   > 
   > **Impact:**
   > 
   > * ✅ Allows readable CTEs with SELECT INTO (bug fix)
   > * ✅ Still blocks writable CTEs with SELECT INTO (maintains safety)
   > * ✅ More efficient validation (no premature transformation)
   > 
   > ## Testing Results
   > Basic test case passes:
   > 
   > ```sql
   > WITH sel AS (SELECT 1 as a)
   > SELECT a INTO t_r_cte FROM sel;
   > ```
   > 
   > -- ✅ Works correctly
   > 
   > Edge case validation:
   > 
   > * ✅ Multiple readable CTEs with SELECT INTO - works
   > * ✅ Mixed readable/writable CTEs - correctly blocked
   > * ✅ Error messages are clear and actionable
   > 
   > ## Recommendations
   > ### Issue 1
   > 🔴 High Priority
   > 
   > Current test coverage only validates the basic use case. Missing edge 
cases could lead to regressions.
   > 
   > Solution
   > 
   > Add comprehensive test cases to src/test/regress/sql/with.sql:
   > 
   > ```sql
   > --
   > -- Multiple readable CTEs with SELECT INTO
   > --
   > WITH cte1 AS (SELECT 1 as a),
   >      cte2 AS (SELECT 2 as b)
   > SELECT a, b INTO t_multi_r_cte FROM cte1, cte2;
   > SELECT * FROM t_multi_r_cte;
   > DROP TABLE t_multi_r_cte;
   > 
   > --
   > -- Nested SELECT in CTE with SELECT INTO
   > --
   > WITH nested_cte AS (
   >   SELECT * FROM (SELECT 100 as val, 'test'::text as name) subq
   > )
   > SELECT val, name INTO t_nested_r_cte FROM nested_cte;
   > SELECT * FROM t_nested_r_cte;
   > DROP TABLE t_nested_r_cte;
   > 
   > --
   > -- CTE with JOIN and SELECT INTO
   > --
   > WITH cte1 AS (SELECT 1 as id, 'foo' as val),
   >      cte2 AS (SELECT 1 as id, 'bar' as val)
   > SELECT cte1.id, cte1.val as val1, cte2.val as val2
   > INTO t_join_r_cte
   > FROM cte1 JOIN cte2 ON cte1.id = cte2.id;
   > SELECT * FROM t_join_r_cte;
   > DROP TABLE t_join_r_cte;
   > 
   > --
   > -- Verify writable CTE still blocked with SELECT INTO (negative test)
   > --
   > CREATE TABLE t_writeable_test (a int);
   > WITH write_cte AS (
   >   INSERT INTO t_writeable_test VALUES (1) RETURNING *
   > )
   > SELECT a INTO t_should_fail FROM write_cte;  -- should fail
   > DROP TABLE t_writeable_test;
   > 
   > --
   > -- Verify mixed readable and writable CTEs blocked (negative test)
   > --
   > CREATE TABLE t_mixed_test (a int);
   > WITH read_cte AS (SELECT 1 as x),
   >      write_cte AS (INSERT INTO t_mixed_test VALUES (1) RETURNING *)
   > SELECT x INTO t_should_fail2 FROM read_cte;  -- should fail
   > DROP TABLE t_mixed_test;
   > ```
   > 
   > Also update:
   > 
   > * src/test/regress/expected/with.out
   > * src/test/regress/expected/with_optimizer.out
   > 
   > Impact: 🎯 Comprehensive coverage prevents future regressionsEffort: ⏱️ 
~2-3 hours
   > 
   > ### Issue 2
   > 🟡 Medium Priority
   > 
   > The CTE validation logic at analyze.c:346-367 duplicates the type-checking 
pattern, creating maintenance burden.
   > 
   > Solution
   > 
   > Step 1: Add declaration to src/include/parser/parse_cte.h:
   > 
   > ```
   > extern List *transformWithClause(ParseState *pstate, WithClause 
*withClause);
   > extern bool hasModifyingCTE(WithClause *withClause);  // ← ADD THIS
   > extern CommonTableExpr *GetCTEForRTE(ParseState *pstate, RangeTblEntry 
*rte, int rtelevelsup);
   > ```
   > 
   > Step 2: Implement in src/backend/parser/parse_cte.c (before 
transformWithClause):
   > 
   > ```
   > /*
   >  * hasModifyingCTE - check if a WITH clause contains any data-modifying 
CTEs
   >  *
   >  * Returns true if any CTE in the clause is INSERT/UPDATE/DELETE.
   >  * This is used to enforce Cloudberry's restriction that writable CTEs
   >  * cannot be used in contexts that themselves involve writing, due to
   >  * the single writer gang limitation.
   >  *
   >  * Note: This checks the raw parse tree (SelectStmt/InsertStmt/etc), not
   >  * the transformed Query tree.
   >  */
   > bool
   > hasModifyingCTE(WithClause *withClause)
   > {
   >       ListCell   *lc;
   > 
   >       if (withClause == NULL)
   >               return false;
   > 
   >       foreach(lc, withClause->ctes)
   >       {
   >               CommonTableExpr *cte = (CommonTableExpr *) lfirst(lc);
   > 
   >               if (!IsA(cte->ctequery, SelectStmt))
   >               {
   >                       /* must be a data-modifying statement */
   >                       Assert(IsA(cte->ctequery, InsertStmt) ||
   >                                  IsA(cte->ctequery, UpdateStmt) ||
   >                                  IsA(cte->ctequery, DeleteStmt));
   >                       return true;
   >               }
   >       }
   >       return false;
   > }
   > ```
   > 
   > Then refactor existing code at lines 157-178:
   > 
   > ```
   > foreach(lc, withClause->ctes)
   > {
   >     CommonTableExpr *cte = (CommonTableExpr *) lfirst(lc);
   > 
   >     if (isCTEModifying(cte))  // ← Use helper
   >     {
   >         /*
   >          * Since GPDB currently only support a single writer gang, only one
   >          * writable clause is permitted per CTE. Once we get flexible gangs
   >          * with more than one writer gang we can lift this restriction.
   >          */
   >         if (pstate->p_hasModifyingCTE)
   >             ereport(ERROR,
   >                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
   >                      errmsg("only one modifying WITH clause allowed per 
query"),
   >                      errdetail("Apache Cloudberry currently only support 
CTEs with one writable clause."),
   >                      errhint("Rewrite the query to only include one 
writable CTE clause.")));
   > 
   >         pstate->p_hasModifyingCTE = true;
   >     }
   > }
   > ```
   > 
   > Step 3: Refactor src/backend/parser/analyze.c (replace lines 344-369):
   > 
   > ```
   > if (stmt->withClause)
   > {
   >       /*
   >        * Since Cloudberry currently only support a single writer gang, 
only one
   >        * writable clause is permitted per CTE. Once we get flexible gangs
   >        * with more than one writer gang we can lift this restriction.
   >        */
   >       if (hasModifyingCTE(stmt->withClause))
   >               ereport(ERROR,
   >                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
   >                                errmsg("writable CTE queries cannot be used 
with writable queries"),
   >                                errdetail("Apache Cloudberry currently only 
support CTEs with one writable clause, called in a non-writable context."),
   >                                errhint("Rewrite the query to only include 
one writable clause.")));
   > }
   > ```
   > 
   > Changes:
   > 
   > * src/include/parser/parse_cte.h (+1 line)
   > * src/backend/parser/parse_cte.c (+25 lines)
   > * src/backend/parser/analyze.c (-20 lines)
   > 
   > Impact: 🔧 Improves maintainability, reduces duplicationEffort: ⏱️ ~1 hour
   > 
   > ### Issue 3
   > 🟢 Low Priority
   > 
   > Commit message contains typo and grammar error:
   > 
   > ```
   > Cloudberry currently only support ne WITH clause per query level.
   >                                  ^^^
   > ```
   > 
   > Correction: before
   > 
   > ```
   > Cloudberry currently only support ne WITH clause per query level.
   > ```
   > 
   > after
   > 
   > ```
   > Cloudberry currently only supports one WITH clause per query level.
   > ```
   > 
   > If not yet merged:
   > 
   > ```
   > git commit --amend
   > # or
   > git rebase -i 4d966f954a^
   > ```
   > 
   > If already merged: Document in release notes
   > 
   > Impact: 📝 Documentation clarityEffort: ⏱️ 5 minutes
   > 
   > Checklist
   > 
   > Before Merge:
   > 
   > * Implement Recommendation issue 1 (expanded tests) - HIGH PRIORITY
   > * Consider Recommendation issue 2 (refactor duplication) - RECOMMENDED
   > * Fix Recommendation issue 3 (commit message) - if possible
   > 
   > Verification:
   > 
   > * Fix solves the reported issue
   > * No regressions introduced
   > * Error messages are appropriate
   > * Basic test coverage included
   > * Comprehensive test coverage (pending)
   > * Code duplication addressed (pending)
   > 
   > Conclusion
   > 
   > Verdict: ✅ Approved with Recommendations
   > 
   > The core fix is excellent and ready for merge. However, implementing the 
test coverage improvements (Recommendation issue 1 is strongly recommended 
before merge to ensure long-term stability. The refactoring (Recommendation 
issue 2 would be a nice-to-have for maintainability.
   > 
   > Great work on identifying and fixing this issue! 🎉
   
   Thank you for your advice, and I have updated it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to