jorgecarleitao commented on pull request #9454: URL: https://github.com/apache/arrow/pull/9454#issuecomment-778036785
@abreis , I am not convinced that that is sufficient, unfortunately, because it excludes all types that are not Numeric (i.e. all dates and times for primitives, as well as all other logical types). We could of course offer a `DatumTemporal` and a `DatumList` and so one and so forth, but IMO that is introducing more complexity, as we now start having `struct`s for all kinds of logical types. Generally, the logical operation performed on the data depends on its `DataType`. Thus, the scalar needs to contain that logical data type information. In my opinion, the data structure should be something like what DataFusion has, but instead of having one enum variant per logical type, we should have one enum variant per physical type, i.e. ```rust enum ScalarValue { Int32(Option<i32>, DataType), // int32, date32, time32 Int64(Option<i64>, DataType), // int64, date64, time64, timestamp List(Option<Vec<ScalarValue>>, DataType), } ``` The `Option` indicates validity of the type, `DataType` represents the logical type. This corresponds to the notion that each variant needs to be treated fundamentally different because its physical layout (i.e. at the machine level) is different. This is how Rust handles these things with generics, because it relies on type information to compile the instructions. This would allow us to write our numerical operations neatly using a generic over `ArrowNativeType`, as `T::Native` would have a one-to-one correspondence to the physical type in the enum. We still have a problem over which a `PrimitiveArray` currently needs to be downcasted to its logical - not physical - representation, and thus we still have a lot of useless downcastings. Specifically, `PrimitiveArray<Int32Type>` and `PrimitiveArray<Date32Type>` have the same in-memory representation, their only difference is a constant value at `ArrayData::DataType`; there is no reason to have 2 variants here: we just need one variant and two values (the datatypes). [Point 10](https://github.com/jorgecarleitao/arrow2/tree/proposal#10-make-primitivearrayt-nativetype-instead-of-primitivetype) of my proposal addresses this by splitting the logical and physical. I.e. make it `PrimitiveArray<i32>`. Regardless, this is just to explain why imo we should aim at one variant per physical, not logical type. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org