korowa opened a new issue, #9680:
URL: https://github.com/apache/arrow-datafusion/issues/9680

   ### Describe the bug
   
   Query with recursive CTE hangs while trying to collect stream originated 
from `WorkTableExec` -- this might occur when recursive part contains 
HashJoin/CrossJoin/NestedLoopJoin, and `WorkTableExec` should be collected in 
memory as a join build side.
   
   ### To Reproduce
   
   Reproducer based on 
[this](https://github.com/apache/arrow-datafusion/blob/4e8ac98fbbebbf965eebba5cc40ecf7c590a6d28/datafusion/sqllogictest/test_files/cte.slt#L354)
 test query:
   
   ```
   WITH RECURSIVE "recursive_cte" AS (
       SELECT 1 as "val"
     UNION ALL (
       WITH "sub_cte" AS (
         SELECT 2 as "val"
       )
       SELECT
         2 as "val"
       FROM "recursive_cte"
         CROSS JOIN "sub_cte"
       WHERE "recursive_cte"."val" < 2
     )
   )
   SELECT * FROM "recursive_cte";
   ```
   
   Produces following plan, and is unable to complete, constantly executing 
`CrossJoinExec`
   
   ```
   physical_plan
   RecursiveQueryExec: name=recursive_cte, is_distinct=false
   --ProjectionExec: expr=[1 as val]
   ----PlaceholderRowExec
   --ProjectionExec: expr=[2 as val]
   ----CrossJoinExec
   ------CoalescePartitionsExec
   --------CoalesceBatchesExec: target_batch_size=8182
   ----------FilterExec: val@0 < 2
   ------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
   --------------WorkTableExec: name=recursive_cte
   ------ProjectionExec: expr=[2 as val]
   --------PlaceholderRowExec
   ```
   
   In the same time, identical query with swapped inputs works as expected
   ```
   WITH RECURSIVE "recursive_cte" AS (
       SELECT 1 as "val"
     UNION ALL (
       WITH "sub_cte" AS (
         SELECT 2 as "val"
       )
       SELECT
         2 as "val"
       FROM "sub_cte"
         CROSS JOIN "recursive_cte"
       WHERE "recursive_cte"."val" < 2
     )
   )
   SELECT * FROM "recursive_cte";
   ```
   
   ### Expected behavior
   
   Ideally, recursive CTE should be able to handle `collect()` stream, created 
by `WorkTableExec`.
   
   Another option might be tracking calls to CTE temp table, and throw runtime 
error in case their number increases some preconfigured threshold value.
   
   Third option is to forbid recursive CTE temp tables to be used as build-side 
join inputs (not preferrable as it looks like too much unnecessary 
modifications optimizer(s) modification, and not reliable -- same issue might 
occur in case of aggregations/sorts/maybe some other "blocking" operators).
   
   
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to