tokoko opened a new pull request, #12800: URL: https://github.com/apache/datafusion/pull/12800
## Which issue does this PR close?

This is a necessary prerequisite for #12798.

## Rationale for this change

The Substrait consumer should make a best effort at one-to-one translation without invoking the optimizer. Doing otherwise makes round-trip tests too complicated.

## What changes are included in this PR?

When translating Substrait `ReadRel` nodes, the consumer constructs a dataframe first, applies projections with `select`, and hopes that subsequent `into_optimized_plan` calls will push the projections down to `TableScan`. In practice, this sometimes adds unnecessary projection nodes to the plan, as well as unnecessary "pushed down" projections to `TableScan`, even when the Substrait plan doesn't specify any such thing.

This PR:

- removes `into_optimized_plan` calls from the consumer.
- removes unnecessary projection nodes introduced in `ensure_schema_compatability`, instead handcrafting projections that are put into `TableScan` only when necessary (when the `base_schema` provided in `ReadRel` has fewer fields than the actual table schema).
- fixes a bug in the schema comparison in `ensure_schema_compatability`, where the comparison never yielded true because a qualified schema was being compared with an unqualified one. This led the consumer to add unnecessary projections to `TableScan` even when the schemas matched. I had to manually remove these projections from test assertions (including the TPC-H queries).

## Are these changes tested?

This is essentially a refactor and is covered by the existing tests. I had to alter some asserts in the schema compatibility tests, as the expected plans were less than ideal in the first place (duplicated projections).

## Are there any user-facing changes?

Minor: Substrait plans are functionally the same, but may lack unnecessary projections.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
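For illustration, the projection rule described in the PR body could be sketched roughly as follows. This is a simplified, std-only model and not the code from the PR or DataFusion's API: `scan_projection` and `unqualified` are hypothetical helpers, schemas are reduced to lists of field names, and qualifiers are assumed to be separated by a single `.`.

```rust
/// Hypothetical helper: strip a `table.` qualifier from a field name so that
/// qualified and unqualified schemas can be compared on equal footing.
/// Assumes field names themselves contain no dots.
fn unqualified(name: &str) -> &str {
    name.rsplit('.').next().unwrap_or(name)
}

/// Hypothetical helper: given the table's field names (possibly qualified)
/// and the field names listed in the Substrait `base_schema`, return the
/// column indices to put into the scan, or `None` when the schemas already
/// match and no projection is needed.
fn scan_projection(
    table_fields: &[&str],
    base_fields: &[&str],
) -> Result<Option<Vec<usize>>, String> {
    // Compare on unqualified names; comparing "t.a" against "a" directly
    // would never report a match.
    let table: Vec<&str> = table_fields.iter().map(|f| unqualified(f)).collect();
    let base: Vec<&str> = base_fields.iter().map(|f| unqualified(f)).collect();

    if table == base {
        // Schemas match: leave the scan without a projection.
        return Ok(None);
    }

    // Otherwise map every requested field to its index in the table schema.
    let mut indices = Vec::with_capacity(base.len());
    for name in &base {
        match table.iter().position(|f| f == name) {
            Some(i) => indices.push(i),
            None => return Err(format!("field '{name}' not found in table schema")),
        }
    }
    Ok(Some(indices))
}
```

Under these assumptions, a table with fields `t.a`, `t.b`, `t.c` and a `base_schema` of `a`, `b`, `c` yields no projection, while a `base_schema` of `a`, `c` yields the indices `[0, 2]`.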
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
