HaoYang670 commented on issue #1799:
URL: https://github.com/apache/arrow-rs/issues/1799#issuecomment-1147434811

   The reason I prefer removing `ArrayData::data_type` is that it introduces a possible inconsistency between `ArrayData::data_type` and `ArrayData::layout`, which in turn increases the work `ArrayData::validate` has to do (lots of pattern matching ...).
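To make the concern concrete, here is a minimal sketch under simplified assumptions. `DataType`, `Layout`, and `ArrayData` below are hypothetical stand-ins, not the arrow-rs definitions. When the logical type and the layout are stored as two separate fields, nothing prevents them from disagreeing, so `validate` has to cross-check every pair:

```rust
// Hypothetical, simplified stand-ins; these are NOT the arrow-rs types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
}

/// Physical layout: fixed-width values (byte width) or variable-width values.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Layout {
    FixedWidth(usize),
    VariableWidth,
}

/// Type and layout are stored independently, so they can fall out of sync.
struct ArrayData {
    data_type: DataType,
    layout: Layout,
}

impl ArrayData {
    /// Every valid (data_type, layout) pair must be listed explicitly.
    fn validate(&self) -> Result<(), String> {
        let consistent = match (self.data_type, self.layout) {
            (DataType::Int32, Layout::FixedWidth(4)) => true,
            (DataType::Int64, Layout::FixedWidth(8)) => true,
            (DataType::Utf8, Layout::VariableWidth) => true,
            _ => false,
        };
        if consistent {
            Ok(())
        } else {
            Err(format!(
                "{:?} is inconsistent with {:?}",
                self.data_type, self.layout
            ))
        }
    }
}
```

Each new logical type adds another arm to this cross-check, which is the pattern-matching cost mentioned above.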
   
   > You still need the DataType to roundtrip the actual type, e.g. int32 vs 
uint32, the Field for nested types, etc...
   
   My first thought was that we could inject the `DataType` into `ArrayDataLayout`. For example:
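   ```rust
   use arrow::buffer::Buffer;

   // The logical type is stored inside each layout variant.
   pub enum ArrayDataLayout {
       // ...
       Primitive { data_type: PrimitiveType, values: Buffer },
       Binary { is_large: bool, values: Buffer /* , ... */ },
       // ...
   }

   pub enum PrimitiveType {
       Int32,
       Int64,
       // ...
   }
   ```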
   However, this approach does not handle nested types well.
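A minimal, std-only sketch of the first idea for flat types, assuming a byte-vector `Buffer` and hypothetical `type_name`/`validate` methods (none of these are arrow-rs APIs). Because the type lives inside the layout variant, there is no second field that can fall out of sync, and validation only has to check buffer sizes:

```rust
// Hypothetical sketch; `Buffer` is simplified to a byte vector and is NOT
// the real arrow-rs `Buffer`.
type Buffer = Vec<u8>;

#[derive(Debug, Clone, Copy, PartialEq)]
enum PrimitiveType {
    Int32,
    Int64,
}

impl PrimitiveType {
    /// Size of one value in bytes.
    fn byte_width(self) -> usize {
        match self {
            PrimitiveType::Int32 => 4,
            PrimitiveType::Int64 => 8,
        }
    }
}

/// Each layout variant carries exactly the type information it needs.
#[allow(dead_code)]
enum ArrayDataLayout {
    Primitive { data_type: PrimitiveType, values: Buffer },
    Binary { is_large: bool, values: Buffer },
}

impl ArrayDataLayout {
    /// The type name is derived from the layout, so the two cannot disagree.
    fn type_name(&self) -> String {
        match self {
            ArrayDataLayout::Primitive { data_type, .. } => format!("{:?}", data_type),
            ArrayDataLayout::Binary { is_large: false, .. } => "Binary".to_string(),
            ArrayDataLayout::Binary { is_large: true, .. } => "LargeBinary".to_string(),
        }
    }

    /// Only buffer sizes are checked; no (type, layout) pairing is verified.
    fn validate(&self) -> Result<(), String> {
        match self {
            ArrayDataLayout::Primitive { data_type, values } => {
                let width = data_type.byte_width();
                if values.len() % width == 0 {
                    Ok(())
                } else {
                    Err(format!("{} bytes is not a multiple of {}", values.len(), width))
                }
            }
            // Offsets are omitted from this simplified Binary variant.
            ArrayDataLayout::Binary { .. } => Ok(()),
        }
    }
}
```

A nested variant such as a list would still need to carry its child `Field` inside the layout, which is where this approach starts to get awkward.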
   
   My second thought was to refactor `DataType` like this:
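   ```rust
   use arrow::datatypes::Field;

   // Logical types grouped by category, so each category is validated once.
   pub enum DataType {
       Primitive(PrimitiveType),
       List(ListType),
       // ...
   }

   pub enum PrimitiveType {
       Int32,
       Int64,
       // ...
   }

   pub enum ListType {
       List(Box<Field>),
       FixedSizeList(Box<Field>, i32),
       LargeList(Box<Field>),
   }
   ```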
   I think this could reduce the amount of pattern matching needed in `ArrayData::validate`.
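A rough sketch of how validation might look with the grouped `DataType`, assuming a simplified `Field` and a hypothetical `validate` that only checks buffer sizes (validity bitmaps and child data are not modelled). In the Arrow format, List uses 32-bit offsets, LargeList uses 64-bit offsets, and FixedSizeList has no offsets buffer; each category gets a single outer arm:

```rust
// Hypothetical simplified `Field`; the real arrow-rs `Field` differs.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Primitive(PrimitiveType),
    List(ListType),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum PrimitiveType {
    Int32,
    Int64,
}

impl PrimitiveType {
    fn byte_width(self) -> usize {
        match self {
            PrimitiveType::Int32 => 4,
            PrimitiveType::Int64 => 8,
        }
    }
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ListType {
    List(Box<Field>),
    FixedSizeList(Box<Field>, i32),
    LargeList(Box<Field>),
}

/// Minimum byte size of each buffer for an array of `len` slots.
fn expected_buffer_sizes(data_type: &DataType, len: usize) -> Vec<usize> {
    match data_type {
        // One rule for every primitive type: a single values buffer.
        DataType::Primitive(p) => vec![len * p.byte_width()],
        // One rule for every list type, varying only in the offsets buffer.
        DataType::List(l) => match l {
            ListType::List(_) => vec![(len + 1) * 4],
            ListType::LargeList(_) => vec![(len + 1) * 8],
            ListType::FixedSizeList(_, _) => vec![],
        },
    }
}

/// Hypothetical check: buffer count matches and each buffer is large enough.
fn validate(data_type: &DataType, len: usize, buffer_lens: &[usize]) -> Result<(), String> {
    let expected = expected_buffer_sizes(data_type, len);
    if expected.len() != buffer_lens.len() {
        return Err(format!("expected {} buffers, got {}", expected.len(), buffer_lens.len()));
    }
    for (i, (&need, &have)) in expected.iter().zip(buffer_lens).enumerate() {
        if have < need {
            return Err(format!("buffer {} has {} bytes, needs at least {}", i, have, need));
        }
    }
    Ok(())
}
```

Adding another primitive or list variant only touches the inner helper (`byte_width` or the list arm), not the outer match.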

