andygrove commented on pull request #8839:
URL: https://github.com/apache/arrow/pull/8839#issuecomment-739528704
@jorgecarleitao @alamb I've been looking at the question of whether the
physical plan should use DFSchema. Here is the current (in master)
implementation of the physical expression for Column:
```rust
impl PhysicalExpr for Column {
    /// Get the data type of this expression, given the schema of the input
    fn data_type(&self, input_schema: &Schema) -> Result<DataType> {
        Ok(input_schema
            .field_with_name(&self.name)?
            .data_type()
            .clone())
    }

    /// Decide whether this expression is nullable, given the schema of the input
    fn nullable(&self, input_schema: &Schema) -> Result<bool> {
        Ok(input_schema.field_with_name(&self.name)?.is_nullable())
    }

    /// Evaluate the expression against a record batch, using the batch's own schema
    fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
        Ok(ColumnarValue::Array(
            batch.column(batch.schema().index_of(&self.name)?).clone(),
        ))
    }
}
```
As you can see, the `data_type` and `nullable` methods resolve the column against the schema from the plan, whereas `evaluate` uses the schema from the record batch, which is a little inconsistent. They should probably all use the same schema.
The bigger issue, though, is that this expression looks up columns by name, so how do we support qualified names here? I see the following choices:
1. Have `ExecutionPlan.schema()` use `DFSchema` as I have done in this PR
2. Use qualified names in the Arrow schema field name e.g. "t1.foo"
3. Change the Column physical expression to refer to columns by index rather than name (sketched below)
Maybe there are other options that I am not seeing?
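To make option 3 concrete, here is a minimal sketch of what an index-based column expression could look like, assuming the same `PhysicalExpr` trait surface as above. The `ColumnByIndex` name and the import paths are placeholders for illustration only, not existing DataFusion API, and the real trait may require additional bounds:

```rust
use arrow::datatypes::{DataType, Schema};
use arrow::record_batch::RecordBatch;
use datafusion::error::Result;
use datafusion::physical_plan::{ColumnarValue, PhysicalExpr};

/// Hypothetical column expression that is resolved to an index at planning
/// time, so the physical layer never needs to interpret (qualified) names.
#[derive(Debug)]
pub struct ColumnByIndex {
    index: usize,
}

impl ColumnByIndex {
    pub fn new(index: usize) -> Self {
        Self { index }
    }
}

impl std::fmt::Display for ColumnByIndex {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "column#{}", self.index)
    }
}

impl PhysicalExpr for ColumnByIndex {
    /// Look up the data type by position in the input schema
    fn data_type(&self, input_schema: &Schema) -> Result<DataType> {
        Ok(input_schema.field(self.index).data_type().clone())
    }

    /// Look up nullability by position in the input schema
    fn nullable(&self, input_schema: &Schema) -> Result<bool> {
        Ok(input_schema.field(self.index).is_nullable())
    }

    /// Evaluate by position; the plan schema and the batch schema agree on
    /// column order, so no name lookup (and no qualifier handling) is needed
    fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
        Ok(ColumnarValue::Array(batch.column(self.index).clone()))
    }
}
```

The trade-off would be that the planner has to resolve names (including any qualifiers) to indices once, when building the physical plan, which would keep `DFSchema` out of the physical layer entirely.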