korowa opened a new issue, #9680: URL: https://github.com/apache/arrow-datafusion/issues/9680
### Describe the bug Query with recursive CTE hangs while trying to collect stream originated from `WorkTableExec` -- this might occur when recursive part contains HashJoin/CrossJoin/NestedLoopJoin, and `WorkTableExec` should be collected in memory as a join build side. ### To Reproduce Reproducer based on [this](https://github.com/apache/arrow-datafusion/blob/4e8ac98fbbebbf965eebba5cc40ecf7c590a6d28/datafusion/sqllogictest/test_files/cte.slt#L354) test query: ``` WITH RECURSIVE "recursive_cte" AS ( SELECT 1 as "val" UNION ALL ( WITH "sub_cte" AS ( SELECT 2 as "val" ) SELECT 2 as "val" FROM "recursive_cte" CROSS JOIN "sub_cte" WHERE "recursive_cte"."val" < 2 ) ) SELECT * FROM "recursive_cte"; ``` Produces following plan, and is unable to complete, constantly executing `CrossJoinExec` ``` physical_plan RecursiveQueryExec: name=recursive_cte, is_distinct=false --ProjectionExec: expr=[1 as val] ----PlaceholderRowExec --ProjectionExec: expr=[2 as val] ----CrossJoinExec ------CoalescePartitionsExec --------CoalesceBatchesExec: target_batch_size=8182 ----------FilterExec: val@0 < 2 ------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 --------------WorkTableExec: name=recursive_cte ------ProjectionExec: expr=[2 as val] --------PlaceholderRowExec ``` In the same time, identical query with swapped inputs works as expected ``` WITH RECURSIVE "recursive_cte" AS ( SELECT 1 as "val" UNION ALL ( WITH "sub_cte" AS ( SELECT 2 as "val" ) SELECT 2 as "val" FROM "sub_cte" CROSS JOIN "recursive_cte" WHERE "recursive_cte"."val" < 2 ) ) SELECT * FROM "recursive_cte"; ``` ### Expected behavior Ideally, recursive CTE should be able to handle `collect()` stream, created by `WorkTableExec`. Another option might be tracking calls to CTE temp table, and throw runtime error in case their number increases some preconfigured threshold value. Third option is to forbid recursive CTE temp tables to be used as build-side join inputs (not preferrable as it looks like too much unnecessary modifications optimizer(s) modification, and not reliable -- same issue might occur in case of aggregations/sorts/maybe some other "blocking" operators). ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
