Re: [PR] Fix readable CTE with SELECT INTO clause. [cloudberry]

via GitHub Sat, 18 Oct 2025 09:02:41 -0700


edespino commented on PR #1401:
URL: https://github.com/apache/cloudberry/pull/1401#issuecomment-3414163761


   # Pull Request Review: Fix readable CTE with SELECT INTO clause
   
   **Commit:** `4d966f954a`
   **Author:** hanwei <[email protected]>
   **Status:** ✅ Approved with Recommendations
   
   ---
   
   ## Summary
   
   The fix correctly resolves the issue where readable (non-modifying) CTEs 
were incorrectly rejected when used with `SELECT INTO` clauses. The core 
implementation
   is sound and addresses the root cause effectively.
   
   **What Changed:**
   - Modified `src/backend/parser/analyze.c` to distinguish between readable 
and writable CTEs
   - Added validation that only blocks writable CTEs (INSERT/UPDATE/DELETE) 
with SELECT INTO
   - Previous behavior incorrectly called `transformWithClause()` prematurely, 
blocking all CTEs
   
   **Impact:**
   - ✅ Allows readable CTEs with SELECT INTO (bug fix)
   - ✅ Still blocks writable CTEs with SELECT INTO (maintains safety)
   - ✅ More efficient validation (no premature transformation)
   
   ---
   
   ## Testing Results
   
   Basic test case passes:
   ```sql
   WITH sel AS (SELECT 1 as a)
   SELECT a INTO t_r_cte FROM sel;
   ```
   -- ✅ Works correctly
   
   Edge case validation:
   - ✅ Multiple readable CTEs with SELECT INTO - works
   - ✅ Mixed readable/writable CTEs - correctly blocked
   - ✅ Error messages are clear and actionable
   
   ---
   ## Recommendations
   
   ### Issue 1
   🔴 High Priority
   
   Current test coverage only validates the basic use case. Missing edge cases 
could lead to regressions.
   
   Solution
   
   Add comprehensive test cases to src/test/regress/sql/with.sql:
   
   ```sql
   --
   -- Multiple readable CTEs with SELECT INTO
   --
   WITH cte1 AS (SELECT 1 as a),
        cte2 AS (SELECT 2 as b)
   SELECT a, b INTO t_multi_r_cte FROM cte1, cte2;
   SELECT * FROM t_multi_r_cte;
   DROP TABLE t_multi_r_cte;
   
   --
   -- Nested SELECT in CTE with SELECT INTO
   --
   WITH nested_cte AS (
     SELECT * FROM (SELECT 100 as val, 'test'::text as name) subq
   )
   SELECT val, name INTO t_nested_r_cte FROM nested_cte;
   SELECT * FROM t_nested_r_cte;
   DROP TABLE t_nested_r_cte;
   
   --
   -- CTE with JOIN and SELECT INTO
   --
   WITH cte1 AS (SELECT 1 as id, 'foo' as val),
        cte2 AS (SELECT 1 as id, 'bar' as val)
   SELECT cte1.id, cte1.val as val1, cte2.val as val2
   INTO t_join_r_cte
   FROM cte1 JOIN cte2 ON cte1.id = cte2.id;
   SELECT * FROM t_join_r_cte;
   DROP TABLE t_join_r_cte;
   
   --
   -- Verify writable CTE still blocked with SELECT INTO (negative test)
   --
   CREATE TABLE t_writeable_test (a int);
   WITH write_cte AS (
     INSERT INTO t_writeable_test VALUES (1) RETURNING *
   )
   SELECT a INTO t_should_fail FROM write_cte;  -- should fail
   DROP TABLE t_writeable_test;
   
   --
   -- Verify mixed readable and writable CTEs blocked (negative test)
   --
   CREATE TABLE t_mixed_test (a int);
   WITH read_cte AS (SELECT 1 as x),
        write_cte AS (INSERT INTO t_mixed_test VALUES (1) RETURNING *)
   SELECT x INTO t_should_fail2 FROM read_cte;  -- should fail
   DROP TABLE t_mixed_test;
   ```
   
   Also update:
   - src/test/regress/expected/with.out
   - src/test/regress/expected/with_optimizer.out
   
   Impact: 🎯 Comprehensive coverage prevents future regressionsEffort: ⏱️ ~2-3 
hours
   
   ### Issue 2
   🟡 Medium Priority
   
   The CTE validation logic at analyze.c:346-367 duplicates the type-checking 
pattern, creating maintenance burden.
   
   Solution
   
   Step 1: Add declaration to src/include/parser/parse_cte.h:
   
   ```
   extern List *transformWithClause(ParseState *pstate, WithClause *withClause);
   extern bool hasModifyingCTE(WithClause *withClause);  // ← ADD THIS
   extern CommonTableExpr *GetCTEForRTE(ParseState *pstate, RangeTblEntry *rte, 
int rtelevelsup);
   ```
   
   Step 2: Implement in src/backend/parser/parse_cte.c (before 
transformWithClause):
   ```
   /*
    * hasModifyingCTE - check if a WITH clause contains any data-modifying CTEs
    *
    * Returns true if any CTE in the clause is INSERT/UPDATE/DELETE.
    * This is used to enforce Cloudberry's restriction that writable CTEs
    * cannot be used in contexts that themselves involve writing, due to
    * the single writer gang limitation.
    *
    * Note: This checks the raw parse tree (SelectStmt/InsertStmt/etc), not
    * the transformed Query tree.
    */
   bool
   hasModifyingCTE(WithClause *withClause)
   {
         ListCell   *lc;
   
         if (withClause == NULL)
                 return false;
   
         foreach(lc, withClause->ctes)
         {
                 CommonTableExpr *cte = (CommonTableExpr *) lfirst(lc);
   
                 if (!IsA(cte->ctequery, SelectStmt))
                 {
                         /* must be a data-modifying statement */
                         Assert(IsA(cte->ctequery, InsertStmt) ||
                                    IsA(cte->ctequery, UpdateStmt) ||
                                    IsA(cte->ctequery, DeleteStmt));
                         return true;
                 }
         }
         return false;
   }
   ```
   
   Then refactor existing code at lines 157-178:
   
   ```
   foreach(lc, withClause->ctes)
   {
       CommonTableExpr *cte = (CommonTableExpr *) lfirst(lc);
   
       if (isCTEModifying(cte))  // ← Use helper
       {
           /*
            * Since GPDB currently only support a single writer gang, only one
            * writable clause is permitted per CTE. Once we get flexible gangs
            * with more than one writer gang we can lift this restriction.
            */
           if (pstate->p_hasModifyingCTE)
               ereport(ERROR,
                       (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                        errmsg("only one modifying WITH clause allowed per 
query"),
                        errdetail("Apache Cloudberry currently only support 
CTEs with one writable clause."),
                        errhint("Rewrite the query to only include one writable 
CTE clause.")));
   
           pstate->p_hasModifyingCTE = true;
       }
   }
   ```
   
   Step 3: Refactor src/backend/parser/analyze.c (replace lines 344-369):
   
   ```
   if (stmt->withClause)
   {
         /*
          * Since Cloudberry currently only support a single writer gang, only 
one
          * writable clause is permitted per CTE. Once we get flexible gangs
          * with more than one writer gang we can lift this restriction.
          */
         if (hasModifyingCTE(stmt->withClause))
                 ereport(ERROR,
                                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                                  errmsg("writable CTE queries cannot be used 
with writable queries"),
                                  errdetail("Apache Cloudberry currently only 
support CTEs with one writable clause, called in a non-writable context."),
                                  errhint("Rewrite the query to only include 
one writable clause.")));
   }
   ```
   
   Changes:
   - src/include/parser/parse_cte.h (+1 line)
   - src/backend/parser/parse_cte.c (+25 lines)
   - src/backend/parser/analyze.c (-20 lines)
   
   Impact: 🔧 Improves maintainability, reduces duplicationEffort: ⏱️ ~1 hour
   
   ### Issue 3
   🟢 Low Priority
   
   Commit message contains typo and grammar error:
   ```
   Cloudberry currently only support ne WITH clause per query level.
                                    ^^^
   ```
   
   Correction:
   before
   ```
   Cloudberry currently only support ne WITH clause per query level.
   ```
   
   after
   ```
   Cloudberry currently only supports one WITH clause per query level.
   ```
   
   If not yet merged:
   ```
   git commit --amend
   # or
   git rebase -i 4d966f954a^
   ```
   
   If already merged: Document in release notes
   
   Impact: 📝 Documentation clarityEffort: ⏱️ 5 minutes
   
   ---
   Checklist
   
   Before Merge:
   - Implement Recommendation issue 1 (expanded tests) - HIGH PRIORITY
   - Consider Recommendation issue 2 (refactor duplication) - RECOMMENDED
   - Fix Recommendation issue 3 (commit message) - if possible
   
   Verification:
   - Fix solves the reported issue
   - No regressions introduced
   - Error messages are appropriate
   - Basic test coverage included
   - Comprehensive test coverage (pending)
   - Code duplication addressed (pending)
   
   ---
   Conclusion
   
   Verdict: ✅ Approved with Recommendations
   
   The core fix is excellent and ready for merge. However, implementing the 
test coverage improvements (Recommendation issue 1 is strongly recommended 
before merge to ensure long-term stability. The refactoring (Recommendation 
issue 2 would be a nice-to-have for maintainability.
   
   Great work on identifying and fixing this issue! 🎉
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Fix readable CTE with SELECT INTO clause. [cloudberry]

Reply via email to