vbarua opened a new pull request, #13127:
URL: https://github.com/apache/datafusion/pull/13127

   ## Which issue does this PR close?
   Follows up on https://github.com/apache/datafusion/pull/12495
   
   Closes https://github.com/apache/datafusion/issues/12347
   
   ## Rationale for this change
   Substrait relations have an [emit_kind](https://github.com/substrait-io/substrait/blob/683f4179a058c2c99c04501b920a48ff372356ff/proto/substrait/algebra.proto#L15-L22) field. It is either Direct, in which case the relation outputs its default fields, or Emit, which gives precise control over which fields are output and in what order.
   
   For example, consider a relation with the following emit:
   ```json
   "emit": {
     "outputMapping": [2, 0, 1]
   }
   ```
   The output mapping indicates that, of the relation's default output columns, only those at (zero-based) indices 2, 0, and 1 should be output, in that order.
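The Direct/Emit distinction can be sketched in Rust. This is a minimal, hypothetical model: `EmitKind` and `output_fields` are illustrative names, not types from the `substrait` crate or DataFusion. It resolves the example mapping `[2, 0, 1]` against default columns `a`, `b`, `c`, and rejects indices that are negative or out of range (Substrait encodes the mapping as 32-bit integers, hence `i32`).

```rust
/// Hypothetical model of a relation's emit_kind (not the `substrait` crate's types).
#[derive(Debug, Clone, PartialEq)]
enum EmitKind {
    /// Output the relation's default fields unchanged.
    Direct,
    /// Output only the listed default-field indices, in the listed order.
    Emit { output_mapping: Vec<i32> },
}

/// Resolve the fields a relation actually outputs, given its default fields.
/// Returns an error if a mapping index is negative or out of range.
fn output_fields(default_fields: &[&str], emit: &EmitKind) -> Result<Vec<String>, String> {
    match emit {
        EmitKind::Direct => Ok(default_fields.iter().map(|f| f.to_string()).collect()),
        EmitKind::Emit { output_mapping } => output_mapping
            .iter()
            .map(|&idx| {
                usize::try_from(idx)
                    .ok()
                    .and_then(|i| default_fields.get(i))
                    .map(|f| f.to_string())
                    .ok_or_else(|| format!("output mapping index {idx} is out of range"))
            })
            .collect(),
    }
}

fn main() {
    let default_fields = ["a", "b", "c"];

    // Direct: the default fields pass through unchanged.
    let direct = output_fields(&default_fields, &EmitKind::Direct).unwrap();
    assert_eq!(direct, vec!["a", "b", "c"]);

    // Emit with mapping [2, 0, 1]: pick columns 2, 0, 1 in that order.
    let emit = EmitKind::Emit { output_mapping: vec![2, 0, 1] };
    let mapped = output_fields(&default_fields, &emit).unwrap();
    assert_eq!(mapped, vec!["c", "a", "b"]);
}
```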
   
   DataFusion currently ignores the emit_kind field entirely when reading Substrait plans.
   
   ## What changes are included in this PR?
   This PR adds support for handling output mappings by treating them as DataFusion Projections that are layered on top of the default translation of the relation.
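A minimal sketch of the layering idea, assuming a toy plan type: `Plan`, `schema`, and `apply_output_mapping` are hypothetical stand-ins for DataFusion's `LogicalPlan` and the consumer code, and mapping indices are taken as `usize` for brevity. Direct emit leaves the plan untouched; Emit wraps it in a single Projection that selects and reorders columns.

```rust
/// Hypothetical, heavily simplified logical plan (not DataFusion's `LogicalPlan`).
#[derive(Debug, PartialEq)]
enum Plan {
    /// A leaf relation producing the named columns.
    Scan { columns: Vec<String> },
    /// Outputs the input's columns at `indices`, in that order.
    Projection { input: Box<Plan>, indices: Vec<usize> },
}

impl Plan {
    /// Column names produced by this plan.
    fn schema(&self) -> Vec<String> {
        match self {
            Plan::Scan { columns } => columns.clone(),
            Plan::Projection { input, indices } => {
                let input_schema = input.schema();
                indices.iter().map(|&i| input_schema[i].clone()).collect()
            }
        }
    }
}

/// Layer an output mapping on top of a relation's default translation.
/// `None` models Direct emit: the plan is returned unchanged.
fn apply_output_mapping(plan: Plan, output_mapping: Option<&[usize]>) -> Result<Plan, String> {
    let Some(mapping) = output_mapping else {
        return Ok(plan);
    };
    // Validate indices against the default output before wrapping.
    let width = plan.schema().len();
    if let Some(bad) = mapping.iter().find(|&&i| i >= width) {
        return Err(format!("output mapping index {bad} is out of range for {width} columns"));
    }
    Ok(Plan::Projection {
        input: Box::new(plan),
        indices: mapping.to_vec(),
    })
}

fn main() {
    let scan = Plan::Scan {
        columns: vec!["a".to_string(), "b".to_string(), "c".to_string()],
    };
    let mapping = [2usize, 0, 1];
    let plan = apply_output_mapping(scan, Some(&mapping[..])).unwrap();
    assert_eq!(plan.schema(), vec!["c", "a", "b"]);
}
```

Modeling the mapping as a separate Projection keeps each relation's translation unchanged; the emit handling is applied uniformly afterwards.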
   
   The one exception is the Substrait Project relation, for which special handling has been added to avoid creating a Projection on top of a Projection.
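One way to avoid the nested Projection, sketched under stated assumptions: a Substrait Project's default output is all of its input fields followed by its expressions, so the output mapping can be resolved against that combined list to produce a single projection list. `Expr` and `project_exprs` are hypothetical names; this illustrates the idea rather than reproducing the PR's actual code.

```rust
/// Hypothetical expression type (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to the input column at this index.
    Column(usize),
    /// Any other expression, represented here by its text.
    Other(String),
}

/// Build one projection list for a Substrait Project.
/// Default output = every input field, then the Project's expressions;
/// an output mapping selects directly from that combined list.
fn project_exprs(
    input_width: usize,
    exprs: &[Expr],
    output_mapping: Option<&[usize]>,
) -> Result<Vec<Expr>, String> {
    let defaults: Vec<Expr> = (0..input_width)
        .map(Expr::Column)
        .chain(exprs.iter().cloned())
        .collect();
    match output_mapping {
        None => Ok(defaults),
        Some(mapping) => mapping
            .iter()
            .map(|&i| {
                defaults
                    .get(i)
                    .cloned()
                    .ok_or_else(|| format!("output mapping index {i} is out of range"))
            })
            .collect(),
    }
}

fn main() {
    // Two input columns plus one expression: default output is
    // [Column(0), Column(1), "a + b"]. Mapping [2, 0] keeps the
    // expression and then column 0.
    let exprs = vec![Expr::Other("a + b".to_string())];
    let mapping = [2usize, 0];
    let out = project_exprs(2, &exprs, Some(&mapping[..])).unwrap();
    assert_eq!(out, vec![Expr::Other("a + b".to_string()), Expr::Column(0)]);
}
```

Because the mapping is folded into the expression list, the result needs only one Projection node.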
   
   ## Are these changes tested?
   Yes. Two new tests have been added to check the remap logic.
   
   Additionally, DataFusion currently includes output mappings when it produces Substrait Projects, so any test that round-trips a Projection also exercises this functionality.
   
   ## Are there any user-facing changes?
   
   Substrait plans generated by DataFusion prior to version 0.42 did not set the output mapping correctly for Substrait Projects (see https://github.com/apache/datafusion/pull/12495 for details).
   
   After these changes, consuming Substrait plans generated by DataFusion versions before 0.42 will not work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org