jayzhan211 commented on issue #12622: URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2661197895
@tobixdev I found LogicalScalar to be a largely duplicated concept with only minor differences from ScalarValue, mainly aimed at reducing complexity. However, the reduction is minimal—for instance, eliminating try_as_str alone doesn’t seem to justify introducing LogicalScalar. Furthermore, if we separate logical and physical types in LogicalPlan and PhysicalPlan, we would need to implement coercion and casting logic at both layers. Since this isn’t currently the case, adding such complexity doesn’t seem worthwhile. As I previously suggested [here](https://github.com/apache/datafusion/issues/14296#issuecomment-2618344472), it may be a better approach to retain the "physical-type"—or more precisely, the "decoding-type" (arrow::DataType)—within the logical layer. We can still use simplified types like NativeType when beneficial, as we already do in function definitions. > Currently the type model in DataFusion, both in logical and physical plans, relies purely on arrow’s [DataType](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html) enum. But whilst this choice makes sense its physical execution engine (DataTypes map 1:1 with the physical array representation, defining implicit rules for storage, access, and retrieval), it has flaws when dealing with its logical plans. This is due to the fact that some logically equivalent array types in the Arrow format have very different DataTypes – for example a logical string array can be represented in the Arrow array of DataType Utf8, Dictionary(Utf8), RunEndEncoded(Utf8), and StringView (without mentioning the different indexes types that can be specified for dictionaries and REE arrays). To solve the the issue mentioned, having function like `try_as_str` or `NativeType` makes more sense to me now. In summary, I think we should keep `ScalarValue` as it is. And utilize `NativeType` or `TypeClass` (similar to TypeSignatureClass but not for Signature only). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org