Jefffrey commented on issue #4844: URL: https://github.com/apache/arrow-datafusion/issues/4844#issuecomment-1374664702
Looks to be regression introduced by fddb3d3651041f41d66a801f10e27387e84374f7 (https://github.com/apache/arrow-datafusion/pull/4562) On the commit prior to it (27921135e4ff4b644251db6ab42f1a25bd6523cb), I get this explain plan: ```sql +------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | plan_type | plan | +------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | initial_logical_plan | Sort: s.mark DESC NULLS FIRST | | | Projection: s.name, s.mark, g.grade | | | Filter: g.grade > Int64(2) | | | Filter: s.mark BETWEEN g.min AND g.max | | | CrossJoin: | | | SubqueryAlias: s | | | TableScan: students | | | SubqueryAlias: g | | | TableScan: grades | | logical_plan after inline_table_scan | SAME TEXT AS ABOVE | | logical_plan after type_coercion | SAME TEXT AS ABOVE | | logical_plan after simplify_expressions | Sort: s.mark DESC NULLS FIRST | | | Projection: s.name, s.mark, g.grade | | | Filter: g.grade > Int64(2) | | | Filter: s.mark >= g.min AND s.mark <= g.max | | | CrossJoin: | | | SubqueryAlias: s | | | TableScan: students | | | SubqueryAlias: g | | | TableScan: grades | ... | logical_plan after eliminate_cross_join | SAME TEXT AS ABOVE | ... | logical_plan after push_down_filter | Sort: s.mark DESC NULLS FIRST | | | Projection: s.name, s.mark, g.grade | | | Filter: s.mark >= g.min AND s.mark <= g.max | | | CrossJoin: | | | SubqueryAlias: s | | | TableScan: students | | | SubqueryAlias: g | | | Filter: grades.grade > Int64(2) | | | TableScan: grades, partial_filters=[grades.grade > Int64(2)] ... ``` And on commit fddb3d3651041f41d66a801f10e27387e84374f7 I get this plan instead: ```sql +------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ | plan_type | plan | +------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ | initial_logical_plan | Sort: s.mark DESC NULLS FIRST | | | Projection: s.name, s.mark, g.grade | | | Filter: g.grade > Int64(2) | | | Inner Join: Filter: s.mark BETWEEN g.min AND g.max | | | SubqueryAlias: s | | | TableScan: students | | | SubqueryAlias: g | | | TableScan: grades | | logical_plan after inline_table_scan | SAME TEXT AS ABOVE | | logical_plan after type_coercion | SAME TEXT AS ABOVE | | logical_plan after simplify_expressions | Sort: s.mark DESC NULLS FIRST | | | Projection: s.name, s.mark, g.grade | | | Filter: g.grade > Int64(2) | | | Inner Join: Filter: s.mark >= g.min AND s.mark <= g.max AS s.mark BETWEEN g.min AND g.max | | | SubqueryAlias: s | | | TableScan: students | | | SubqueryAlias: g | | | TableScan: grades | ... | logical_plan after eliminate_cross_join | Sort: s.mark DESC NULLS FIRST | | | Projection: s.name, s.mark, g.grade | | | Filter: g.grade > Int64(2) | | | CrossJoin: | | | SubqueryAlias: s | | | TableScan: students | | | SubqueryAlias: g | | | TableScan: grades | ... | logical_plan after push_down_filter | Sort: s.mark DESC NULLS FIRST | | | Projection: s.name, s.mark, g.grade | | | CrossJoin: | | | SubqueryAlias: s | | | TableScan: students | | | SubqueryAlias: g | | | Filter: grades.grade > Int64(2) | | | TableScan: grades, partial_filters=[grades.grade > Int64(2)] ``` The actual regression seems to be caused by the SQL planner generating the initial logical plan with an Inner Join instead of a Cross Join, and this propagates down to cause the bug. However it seems to highlight the actual flaw which is in `eliminate_cross_join` optimizer rule which converts the plan from: ```sql Sort: s.mark DESC NULLS FIRST Projection: s.name, s.mark, g.grade Filter: g.grade > Int64(2) Inner Join: Filter: s.mark >= g.min AND s.mark <= g.max AS s.mark BETWEEN g.min AND g.max SubqueryAlias: s TableScan: students SubqueryAlias: g TableScan: grades ``` to ```sql Sort: s.mark DESC NULLS FIRST Projection: s.name, s.mark, g.grade Filter: g.grade > Int64(2) CrossJoin: SubqueryAlias: s TableScan: students SubqueryAlias: g TableScan: grades ``` Where it completely discards the Filter on the Inner Join when converting it to a Cross Join, causing the buggy behaviour -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
