tokoko opened a new pull request, #12800:
URL: https://github.com/apache/datafusion/pull/12800

   ## Which issue does this PR close?
   This is a necessary prerequisite for #12798
   
   ## Rationale for this change
   
   The Substrait consumer should make a best effort at one-to-one translation 
without invoking the optimizer. Doing otherwise makes round-trip tests too 
complicated. 
   
   ## What changes are included in this PR?
   
   When translating Substrait ReadRel nodes, the consumer constructs a 
DataFrame first, applies projections with `select`, and hopes that a subsequent 
`into_optimized_plan` call will push the projections down to `TableScan`. In 
practice, this sometimes adds unnecessary projection nodes to the plan, and also 
adds unnecessary "pushed down" projections to `TableScan` even when the 
Substrait plan doesn't specify any.
   
   This PR:
   - removes `into_optimized_plan` calls from consumer.
   - removes unnecessary projection nodes introduced in 
`ensure_schema_compatability`, instead opting to handcraft projections that are 
put into TableScan only when necessary (when base_schema provided in ReadRel 
has fewer fields than the actual table schema).
   - fixes a bug in the schema comparison in `ensure_schema_compatability`, 
where the comparison never yielded true because a qualified schema was being 
compared against an unqualified one. This led to the consumer adding 
unnecessary projections to TableScan even when the schemas matched. I had to 
manually remove these projections from test assertions (including the TPC-H 
queries).
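
   The projection rule and the qualifier bug described above can be sketched 
in plain Rust. This is an illustration only, not the code in this PR: the 
`Column` type and the functions `same_fields_ignoring_qualifier` and 
`scan_projection` are hypothetical stand-ins, and matching columns by name 
alone is an assumption made to keep the example short.

```rust
/// Hypothetical stand-in for a schema field: optional table qualifier plus name.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    qualifier: Option<String>,
    name: String,
}

impl Column {
    fn new(qualifier: Option<&str>, name: &str) -> Self {
        Column {
            qualifier: qualifier.map(String::from),
            name: name.to_string(),
        }
    }
}

/// Compares two schemas by field name only, ignoring qualifiers.
/// A plain `==` on the columns would also compare qualifiers, so a qualified
/// table schema would never equal an unqualified base schema.
fn same_fields_ignoring_qualifier(a: &[Column], b: &[Column]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.name == y.name)
}

/// Decides which projection, if any, a scan needs (assumed name-based matching).
/// - `Ok(None)`: the base schema matches the table schema, no projection needed.
/// - `Ok(Some(indices))`: the base schema selects these table column indices.
/// - `Err(..)`: the base schema names a column the table does not have.
fn scan_projection(table: &[Column], base: &[Column]) -> Result<Option<Vec<usize>>, String> {
    if same_fields_ignoring_qualifier(table, base) {
        return Ok(None);
    }
    let mut indices = Vec::with_capacity(base.len());
    for col in base {
        match table.iter().position(|t| t.name == col.name) {
            Some(i) => indices.push(i),
            None => return Err(format!("column '{}' not found in table schema", col.name)),
        }
    }
    Ok(Some(indices))
}
```

   With a table schema of qualified columns `t.a`, `t.b`, `t.c`, an 
unqualified base schema `a`, `b`, `c` yields no projection, while a base schema 
of `a`, `c` yields the index list `[0, 2]`.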
   
   ## Are these changes tested?
   
   This is essentially a refactor and is covered by the existing tests. I had 
to alter some asserts in the schema compatibility tests because the expected 
plans were less than ideal in the first place (duplicated projections).
   
   ## Are there any user-facing changes?
   
   Minor. Plans produced from Substrait are functionally the same, but may no 
longer contain the unnecessary projections.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

