alamb commented on issue #10048: URL: https://github.com/apache/datafusion/issues/10048#issuecomment-2541729713
There is some more discussion about this here: https://github.com/apache/datafusion/issues/13659#issuecomment-2541722186 Summarizing: Postgres appears to plan this type of query as a NestedLoops join: ```sql postgres=# create table unnest_test (c1 int[]); CREATE TABLE postgres=# insert into unnest_test values (ARRAY [1,2,3]); INSERT 0 1 postgres=# select * from unnest_test u, unnest(u.c1); c1 | unnest ---------+-------- {1,2,3} | 1 {1,2,3} | 2 {1,2,3} | 3 (3 rows) ``` ```sql Nested Loop (cost=0.00..295.60 rows=13600 width=36) Output: u.c1, unnest.unnest -> Seq Scan on public.unnest_test u (cost=0.00..23.60 rows=1360 width=32) Output: u.c1 -> Function Scan on pg_catalog.unnest (cost=0.00..0.10 rows=10 width=4) Output: unnest.unnest Function Call: unnest(u.c1) ``` 🤔 I think the postgres join operator re-evaluates the right input (`pg_catalog.unnest`) on each input row to the join which is how the lateral joins can refer to the outer query The current DataFusion join operators don't run the input over and over again -- they normally run each input and then combine them (they don't restart inputs) So in a query like ```sql select * from t1, unnest(t1.c1) ``` I think we would need a (new) type of JoinExec that actually does run the outer query (aka `unnest(t1.c1)` for each input row of `t1`) 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org