jorgecarleitao commented on pull request #8839:
URL: https://github.com/apache/arrow/pull/8839#issuecomment-740148943


   I went carefully through this. As I understand this PR, the reason we pass 
`DFSchema` into the `ExecutionPlan` is that we need to pass it to 
`PhysicalExpr.evaluate`, so that we can use `field_with_unqualified_name` on 
the `ColumnExpr`. 95% of the changes on the PR are derived from this change.
   
   IMO this introduces complexity on the physical execution that makes it more 
difficult to understand and use.
   
   IMO the signature `evaluate(&BatchRecord, &DFSchema)` indicates a design 
issue, as the recordBatch has all information required to be evaluated by 
`PhysicalExpr`.
   
   IMO we may be able to avoid this complexity by using 
`field_with_unqualified_name` on the physical planner, to create a `Schema` 
that is passed to the `ExecutionPlan` with the fields re-written, and creating 
`ColumnExpr` using the qualifier names.
   
   Specifically, the suggestion is to have the physical planner convert 
`DFSchema -> Schema` by writing `DFField` `(qual, name)` to `Field` 
`"qual.name"`, and, respectively, pass `"qual.name"` to `ColumnExpr`. IMO this 
would allow to keep all physical planning as it is in master, and IMO would 
make it easier to understand the physical execution and how the logical plan is 
being converted to the physical execution.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to