HaoYang670 commented on issue #1799: URL: https://github.com/apache/arrow-rs/issues/1799#issuecomment-1147434811
I prefer removing `ArrayData::data_type` because keeping it makes it possible for `ArrayData::data_type` and `ArrayData::layout` to become inconsistent, which in turn increases the workload of `ArrayData::validate` (lots of pattern matching ...).

> You still need the DataType to roundtrip the actual type, e.g. int32 vs uint32, the Field for nested types, etc...

My first thought was that we could embed the data type into `ArrayDataLayout`. For example:

```rust
pub enum ArrayDataLayout {
    // ...
    // `type` is a reserved keyword in Rust, so the field is named `ty`.
    Primitive { ty: PrimitiveType, values: Buffer },
    Binary { is_large: bool, values: Buffer /* ... */ },
    // ...
}

pub enum PrimitiveType {
    Int32,
    Int64,
    // ...
}
```

But this cannot support nested types well.

My second thought is that we could refactor `DataType` like this:

```rust
enum DataType {
    Primitive(PrimitiveType),
    List(ListType),
    // ...
}

enum PrimitiveType {
    Int32,
    Int64,
    // ...
}

enum ListType {
    List(Box<Field>),
    FixedSizeList(Box<Field>, i32),
    LargeList(Box<Field>),
}
```

I guess this could decrease the workload of `ArrayData::validate`, since it could first match on the coarse category (primitive, list, ...) and only then on the concrete type.
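To illustrate how the second idea might cut down the matching in validation, here is a minimal, self-contained sketch. Everything in it is a simplified, hypothetical stand-in rather than the real arrow-rs definitions: `Field` is reduced to a name and a type, only two primitive types are listed, and `LayoutKind`, `layout_kind`, `byte_width`, and `validate_values_len` are names invented for this example.

```rust
// Hypothetical, simplified stand-ins; these are NOT the arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum PrimitiveType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
enum ListType {
    List(Box<Field>),
    FixedSizeList(Box<Field>, i32),
    LargeList(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Primitive(PrimitiveType),
    List(ListType),
}

impl PrimitiveType {
    /// Width in bytes of one value of this primitive type.
    fn byte_width(self) -> usize {
        match self {
            PrimitiveType::Int32 => 4,
            PrimitiveType::Int64 => 8,
        }
    }
}

/// Coarse layout category (assumed for illustration only).
#[derive(Debug, PartialEq)]
enum LayoutKind {
    Primitive,
    List,
}

/// One arm per category, instead of one arm per concrete type.
fn layout_kind(data_type: &DataType) -> LayoutKind {
    match data_type {
        DataType::Primitive(_) => LayoutKind::Primitive,
        DataType::List(_) => LayoutKind::List,
    }
}

/// Checks that a values buffer of `values_bytes` bytes can hold `len` elements.
/// A single arm covers every primitive type.
fn validate_values_len(
    data_type: &DataType,
    len: usize,
    values_bytes: usize,
) -> Result<(), String> {
    match data_type {
        DataType::Primitive(ty) => {
            let required = len * ty.byte_width();
            if values_bytes >= required {
                Ok(())
            } else {
                Err(format!("need {required} bytes, got {values_bytes}"))
            }
        }
        // Offset and child validation are elided in this sketch.
        DataType::List(_) => Ok(()),
    }
}
```

With this shape, adding another primitive type would only touch `PrimitiveType` and `byte_width`; the category-level matches in `layout_kind` and `validate_values_len` would stay unchanged.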