jorgecarleitao commented on pull request #8839: URL: https://github.com/apache/arrow/pull/8839#issuecomment-740148943
I went carefully through this. As I understand this PR, the reason we pass `DFSchema` into the `ExecutionPlan` is that we need to pass it to `PhysicalExpr.evaluate`, so that we can use `field_with_unqualified_name` on the `ColumnExpr`. 95% of the changes on the PR are derived from this change. IMO this introduces complexity on the physical execution that makes it more difficult to understand and use. IMO the signature `evaluate(&BatchRecord, &DFSchema)` indicates a design issue, as the recordBatch has all information required to be evaluated by `PhysicalExpr`. IMO we may be able to avoid this complexity by using `field_with_unqualified_name` on the physical planner, to create a `Schema` that is passed to the `ExecutionPlan` with the fields re-written, and creating `ColumnExpr` using the qualifier names. Specifically, the suggestion is to have the physical planner convert `DFSchema -> Schema` by writing `DFField` `(qual, name)` to `Field` `"qual.name"`, and, respectively, pass `"qual.name"` to `ColumnExpr`. IMO this would allow to keep all physical planning as it is in master, and IMO would make it easier to understand the physical execution and how the logical plan is being converted to the physical execution. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
