alamb commented on issue #11513:
URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2239617843

   Thoughts on the technical details
   
   > This would also mean the introduction of new Field-compatible structure 
(LogicalPhysicalField)
   
   Since `LogicalPlans` already use DFSchema, maybe we could avoid introducing 
a `LogialSchema` at all and instead update `DFSchema` to know how to report 
`LogicalType` 🤔 
   
   > Open question: Should ScalarValue split into a LogicalScalarValue and a 
PhysicalScalarValue? Or should it just provide generic information during 
logical planning (e.g. its logical type, is_null(), is_zero()) without access 
the underlying physical representation?
   
   Another possibility wuld be to migate `ScalarValue` so it was entirely 
logical (the usecase for a Physical scalar value is not large).  We could 
slowly remove the non-logical versions of ScalarValue (e.g. remove 
`ScalarValue::LargeString` and `ScalarValue::Dictionary`) -- this would give us 
some idea of where the boundary would be as well as I think improve the 
usability (we have for example, hit issues where `ScalarValue::StringView` were 
treated differently than `ScalarValue::Utf8`)
   
   I think the biggest challenge of this project will be managing the 
transition from Arrow DataType to LogicalTypes. You have done a great job 
articulating the places that would need to be changed. I think we could manage 
the transition over time by for example deprecating (but leaving UDF APIs in 
terms of DataType)
   
   > ColumnarValue enum could be extended so that functions could choose to 
provide their own optimised implementation for a subset of physical types and 
then fall back to a generic implementation that materialises the argument to 
known physical type.
   
   ANother possibility would be to make a function like 
`ColumnarValue::into_one_of` that ensured (via casting if necessary) the input 
was one of the types supported by the operator. For example
   
   ```rust
   let input: ColumnarValue = &args[0];
   
   // get input as one of the named types, casting if necessary
   let input = input.into_one_of(&[DataType::Utf8View, DataType::Utf8])?;
   
   match input.data_type() {
     DataType::Utf8View => { /*specialized impl for StringViewArray */ },
     DataType::Utf8 =>     { /*specialized impl for StringArray */ },
     _ => unreachable!()
   }  
     ```
   
   > RecordBatches with same logical type but different physical types
   
   Another thing we could do is relax the requirement that the `DataType`s 
matched exactly between physical plans, and instead just require that the 
LogicalTypes matches (in other words an ExecutionPlan could report it generates 
output like `Utf8` but could actually produce output like `Utf8View` 🤔 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to