nuno-faria opened a new issue, #22249:
URL: https://github.com/apache/datafusion/issues/22249

   ### Describe the bug
   
   The logical optimizer rule `optimize_projections` can cause recursive CTEs 
to fail or return incorrect results. This is caused by attempting to remove 
projections based on what the outer query returns independently of what the 
recursive term uses.
   
   This is introduced by PR https://github.com/apache/datafusion/pull/16696 to 
solve https://github.com/apache/datafusion/issues/16684. I think the best 
approach to solve this bug while keeping #16684 fixed is to optimize the 
projections of the static term as if it was a regular query, and not attempt to 
optimize based on the outer query. As far as I can tell PostgreSQL and DuckDB 
work like this. As a bonus, I think it would solve this issue as well: 
https://github.com/apache/datafusion/issues/17853.
   
   If anyone has a better suggestion please let me know.
   
   
   ### To Reproduce
   
   For example, this query should return two rows:
   ```sql
   with recursive t(k, v) as (                                
        select 1 k, 10 v                          
        union all                                     
        select 2, 20 from t where k = 1                                         
   
   )              
   select v from t;
   +----+
   | v  |
   +----+
   | 10 |
   +----+
   
   -- k is removed from the static term, but it is used in the recursive term
   SubqueryAlias: t
     Projection: v AS v
       RecursiveQuery: is_distinct=false
         Projection: Int64(10) AS v
           EmptyRelation: rows=1
         Projection: Int64(20)
           Filter: t.k = Int64(1)
             TableScan: t projection=[k]
   ```
   
   This query fails:
   ```sql
   with recursive t(k, v) as (                                
        select 1 k, 10 v                          
        union all                                     
        select 2, 20 from t where v = 10                                        
    
   )              
   select v from t;
   Arrow error: Schema error: project index 1 out of bounds, max field 1
   ```
   
   ### Expected behavior
   
   Both queries should return 2 rows.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to