andygrove commented on pull request #8839:
URL: https://github.com/apache/arrow/pull/8839#issuecomment-739528704
@jorgecarleitao @alamb I've been looking at the question of whether the
physical plan should use DFSchema. Here is the current (in master)
implementation of the physical expression for Column:
```rust
impl PhysicalExpr for Column {
    /// Get the data type of this expression, given the schema of the input
    fn data_type(&self, input_schema: &Schema) -> Result<DataType> {
        Ok(input_schema
            .field_with_name(&self.name)?
            .data_type()
            .clone())
    }

    /// Decide whether this expression is nullable, given the schema of the input
    fn nullable(&self, input_schema: &Schema) -> Result<bool> {
        Ok(input_schema.field_with_name(&self.name)?.is_nullable())
    }

    /// Evaluate the expression against a record batch, using the batch's own schema
    fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
        Ok(ColumnarValue::Array(
            batch.column(batch.schema().index_of(&self.name)?).clone(),
        ))
    }
}
```
As you can see, the `data_type` and `nullable` methods resolve the column against the schema from the plan, whereas `evaluate` uses the schema from the record batch, which is a little inconsistent. They should probably all use the same schema.
The bigger issue, though, is that this expression looks up columns by name, so how do we support qualified names here? I see the following choices:
1. Have `ExecutionPlan.schema()` use `DFSchema` as I have done in this PR
2. Use qualified names in the Arrow schema field name e.g. "t1.foo"
3. Change the Column physical expression to refer to columns by index rather than name (sketched below)
Maybe there are other options that I am not seeing?
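To make option 3 concrete, here is a minimal sketch of what an index-based column expression could look like, assuming the same `PhysicalExpr` trait surface as above. The `ColumnByIndex` name and the import paths are placeholders for illustration only, not existing DataFusion API, and the real trait may require additional bounds:

```rust
use arrow::datatypes::{DataType, Schema};
use arrow::record_batch::RecordBatch;
use datafusion::error::Result;
use datafusion::physical_plan::{ColumnarValue, PhysicalExpr};

/// Hypothetical column expression that is resolved to an index at planning
/// time, so the physical layer never needs to interpret (qualified) names.
#[derive(Debug)]
pub struct ColumnByIndex {
    index: usize,
}

impl ColumnByIndex {
    pub fn new(index: usize) -> Self {
        Self { index }
    }
}

impl std::fmt::Display for ColumnByIndex {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "column#{}", self.index)
    }
}

impl PhysicalExpr for ColumnByIndex {
    /// Look up the data type by position in the input schema
    fn data_type(&self, input_schema: &Schema) -> Result<DataType> {
        Ok(input_schema.field(self.index).data_type().clone())
    }

    /// Look up nullability by position in the input schema
    fn nullable(&self, input_schema: &Schema) -> Result<bool> {
        Ok(input_schema.field(self.index).is_nullable())
    }

    /// Evaluate by position; the plan schema and the batch schema agree on
    /// column order, so no name lookup (and no qualifier handling) is needed
    fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
        Ok(ColumnarValue::Array(batch.column(self.index).clone()))
    }
}
```

The trade-off would be that the planner has to resolve names (including any qualifiers) to indices once, when building the physical plan, which would keep `DFSchema` out of the physical layer entirely.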