tustvold commented on issue #4729:
URL: https://github.com/apache/arrow-rs/issues/4729#issuecomment-1690425995

   I'm really not sure about this, for several reasons:
   
   * It would add a huge amount of API complexity, as the arrays would go from having a single well-defined layout to having potentially different layouts; functions such as `PrimitiveArray::values` and `PrimitiveArray::new` would need to change, and so on
   * It would add a branch on every value access, which would break vectorisation of most kernels
   * You would end up with duplicated dictionary logic spread across multiple array types
   * It is unclear how one would type the dictionary keys
   * It is unclear how this model would interact with `ArrayData`
   * AFAICT, this model is not followed by any other Arrow implementation
   * I'm unclear on the benefits of dictionary-encoding primitives; in most cases the representation will be larger and slower to process. A `Dictionary<Int32, Int32Array>` can never be smaller than the corresponding `Int32Array`, since its keys alone take exactly as much space as the original values
   * IMO `DataType` **IS** the physical type: it already has separate physical representations for logical types like decimals, intervals, floats, strings, etc. Query engines then typically implement a logical type system on top of this, DataFusion included
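
To make the vectorisation and size points concrete, here is a minimal std-only Rust sketch. `I32Column`, `dict_encode` and `byte_size` are hypothetical names invented for illustration, not arrow-rs APIs, and the sketch models only the value buffers, ignoring null buffers and array metadata.

```rust
use std::collections::HashMap;

/// Hypothetical column type (NOT an arrow-rs API) whose values may be stored
/// either plainly or dictionary-encoded, illustrating the proposed model.
#[derive(Debug)]
enum I32Column {
    /// One i32 per slot, like the values buffer of an `Int32Array`.
    Plain(Vec<i32>),
    /// One i32 key per slot, indexing into a table of distinct values.
    Dict { keys: Vec<i32>, values: Vec<i32> },
}

impl I32Column {
    /// Every access has to check which layout is in use. Unless the compiler
    /// hoists that check out of the loop, this branch (plus the indirect
    /// `values[keys[i]]` gather) gets in the way of SIMD auto-vectorisation.
    fn get(&self, i: usize) -> i32 {
        match self {
            I32Column::Plain(v) => v[i],
            I32Column::Dict { keys, values } => values[keys[i] as usize],
        }
    }

    fn len(&self) -> usize {
        match self {
            I32Column::Plain(v) => v.len(),
            I32Column::Dict { keys, .. } => keys.len(),
        }
    }

    /// A `values()`-style accessor can no longer return a contiguous slice
    /// for every layout, so its signature has to change (here: `Option`).
    fn as_slice(&self) -> Option<&[i32]> {
        match self {
            I32Column::Plain(v) => Some(v.as_slice()),
            I32Column::Dict { .. } => None,
        }
    }

    /// Bytes used by the value data only (no null buffer, no headers).
    fn byte_size(&self) -> usize {
        match self {
            I32Column::Plain(v) => v.len() * 4,
            I32Column::Dict { keys, values } => (keys.len() + values.len()) * 4,
        }
    }
}

/// Dictionary-encode plain values, assigning keys in first-seen order.
fn dict_encode(data: &[i32]) -> I32Column {
    let mut lookup: HashMap<i32, i32> = HashMap::new();
    let mut values = Vec::new();
    let mut keys = Vec::with_capacity(data.len());
    for &x in data {
        let key = *lookup.entry(x).or_insert_with(|| {
            values.push(x);
            (values.len() - 1) as i32
        });
        keys.push(key);
    }
    I32Column::Dict { keys, values }
}

fn main() {
    let plain = vec![7, 7, 7, 42, 7, 42];
    let dict = dict_encode(&plain);
    let col = I32Column::Plain(plain.clone());

    // Both layouts hold the same logical values.
    for i in 0..col.len() {
        assert_eq!(col.get(i), dict.get(i));
    }

    // The keys alone are as wide as the original values, so the dictionary
    // layout can never be smaller than the plain one.
    println!("plain: {} bytes, dict: {} bytes", col.byte_size(), dict.byte_size());
    assert!(dict.byte_size() >= col.byte_size());
    assert!(dict.as_slice().is_none());
}
```

Running it prints `plain: 24 bytes, dict: 32 bytes`: the six 4-byte keys cost exactly as much as the six original values, and the two-entry dictionary is pure overhead.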
   
   Perhaps you could expand on what you are trying to achieve here?
   

