irenjj commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2894078258
> Thank you everyone for your opinions. It looks like my implementation tries to wrap everything inside a single optimizer, which is hard to follow and leaves little room for collaborative work. Since we can use DuckDB's implementation details, the next steps can be as follows:
>
> * Define a new `DelimGet` LogicalPlan type and implement the existing methods of a standard LogicalPlan. For `DelimJoin` we can reuse the existing `LogicalPlan::Join` with a new join_type, because its only purpose is to detect whether a join comes from a dependentJoin or not
> * Rewrite the existing subqueries into two operators, `DelimJoin` and `DelimGet` (either at the planning stage or the optimizer stage; DuckDB does this at the planning stage)
> * Decorrelation (DuckDB does this at the planning stage)
> * DelimGet removal

Thanks @duongcongtoai! This is a very good idea!

I also think we can start with simple unnest. We may need to introduce some DuckDB structures: new logical plans/exprs (`DelimScan`, ...) and some new structures (`delim_offset`, `has_correlated_expressions`, ...). By refactoring simple unnest first, we can discover limitations of DataFusion early and adjust our follow-up plans in a timely manner.

cc @alamb @jayzhan211 @suibianwanwank @xudong963
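To make the first step of the quoted plan more concrete, here is a minimal, self-contained sketch of a `DelimGet` plan node alongside a join tagged with a new join type. All types here are hypothetical, simplified stand-ins and not DataFusion's real `LogicalPlan` or `JoinType`; the variant name `JoinType::Delim`, the `correlated_columns` field, and the helper functions are assumptions made purely for illustration.

```rust
// Hypothetical, simplified stand-ins. These are NOT DataFusion's real types.

/// Join types, with an assumed extra variant that marks a join
/// produced by rewriting a dependent join.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum JoinType {
    Inner,
    Left,
    /// Assumption: a new variant used only to recognise "delim joins".
    Delim,
}

/// A toy logical plan with just enough nodes to show the idea.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    TableScan { table: String },
    /// Assumption: a new node that reads the correlated columns
    /// made available by the enclosing delim join.
    DelimGet { correlated_columns: Vec<String> },
    Join {
        left: Box<LogicalPlan>,
        right: Box<LogicalPlan>,
        join_type: JoinType,
    },
}

/// Returns true if `plan` is a join that came from a dependent join,
/// detected purely through its join type.
#[allow(dead_code)]
fn is_delim_join(plan: &LogicalPlan) -> bool {
    matches!(
        plan,
        LogicalPlan::Join { join_type: JoinType::Delim, .. }
    )
}

/// Counts the `DelimGet` nodes in a plan tree. A later "DelimGet removal"
/// pass would aim to bring this count down to zero.
#[allow(dead_code)]
fn count_delim_gets(plan: &LogicalPlan) -> usize {
    match plan {
        LogicalPlan::DelimGet { .. } => 1,
        LogicalPlan::Join { left, right, .. } => {
            count_delim_gets(left) + count_delim_gets(right)
        }
        LogicalPlan::TableScan { .. } => 0,
    }
}

/// Builds a toy plan: an outer table joined (as a delim join) with a
/// subquery side that reads the correlated column through `DelimGet`.
#[allow(dead_code)]
fn example_plan() -> LogicalPlan {
    let subquery_side = LogicalPlan::Join {
        left: Box::new(LogicalPlan::TableScan { table: "inner_t".to_string() }),
        right: Box::new(LogicalPlan::DelimGet {
            correlated_columns: vec!["outer_t.id".to_string()],
        }),
        join_type: JoinType::Inner,
    };
    LogicalPlan::Join {
        left: Box::new(LogicalPlan::TableScan { table: "outer_t".to_string() }),
        right: Box::new(subquery_side),
        join_type: JoinType::Delim,
    }
}
```

With this shape, `is_delim_join(&example_plan())` is true for the outer join only, and `count_delim_gets` gives a simple way to check whether a removal pass has finished. Whether the marker should live in the join type or in a separate field is an open design question that this sketch does not settle.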