tustvold commented on issue #1799:
URL: https://github.com/apache/arrow-rs/issues/1799#issuecomment-1472437840

   Having worked through an implementation of this, I found that the additional 
branching on operations that used to be free, e.g. looking up the datatype or 
null buffer, causes some quite serious performance regressions. While it is 
possible to eliminate these, doing so turns into a game of performance 
whack-a-mole. Although I still think the design as articulated here has some 
compelling advantages, pragmatically I don't have the time or the inclination 
to work through every implementation fixing such regressions.
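To make the cost concrete, here is a minimal sketch of why an enumeration turns a previously free lookup into a branch. All types below are hypothetical, simplified stand-ins, not the arrow-rs definitions: with a single untyped struct the data type is a plain field read, whereas with an enum of typed structs every call must first match on the variant.

```rust
// Hypothetical, simplified types for illustration only; not arrow-rs code.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

// Untyped layout: the data type is a plain field, so reading it is a load.
#[allow(dead_code)]
struct UntypedData {
    data_type: DataType,
    len: usize,
}

// Enumeration layout: each variant wraps its own strongly typed struct.
#[allow(dead_code)]
struct PrimitiveData {
    data_type: DataType,
    values: Vec<i32>,
}

#[allow(dead_code)]
struct BytesData {
    data_type: DataType,
    offsets: Vec<i32>,
    values: Vec<u8>,
}

enum EnumData {
    Primitive(PrimitiveData),
    Bytes(BytesData),
}

impl UntypedData {
    fn data_type(&self) -> DataType {
        // No branching: direct field access.
        self.data_type
    }
}

impl EnumData {
    fn data_type(&self) -> DataType {
        // Must inspect the variant first: an extra branch on every call.
        match self {
            EnumData::Primitive(p) => p.data_type,
            EnumData::Bytes(b) => b.data_type,
        }
    }
}
```

A single match is cheap, but when such accessors sit on hot paths across many kernels the extra branches add up, which matches the "whack-a-mole" experience described above.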
   
   Taking a step back, the enumerations are not strictly necessary to achieve 
the goal of a type-safe, low-level data abstraction that can form a common 
basis between arrow and arrow2, as articulated 
[here](https://docs.google.com/presentation/d/1cqQEpC-kJES2Mng152r_qZyaOqHjtb5YFuseSTWyulU/edit#slide=id.p).
   
   Instead, the modified plan is as follows:
   
   * ArrayData remains as is without modification
   * The ArrayData enumerations and associated trait plumbing are removed
   * The strongly typed ArrayData structures (e.g. `PrimitiveArrayData`, 
`BytesArrayData`) are made public
   * Provide `From` conversions between `ArrayData` and `*ArrayData`, etc.
   * Provide `From` conversions between `*Array` and `*ArrayData`
   * Implement `From` conversions between `ArrayData` and `*Array` via 
`*ArrayData`
   
   We can then slowly reduce the usage of `ArrayData` within the codebase, in 
favor of `*Array` and `*ArrayData`. This achieves the stated goals, without 
requiring everything to move at the same time.
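The conversion layering in this plan can be sketched roughly as follows, assuming a single Int32 primitive type for brevity. The types are hypothetical, simplified stand-ins named after the ones in the comment; they are not the actual arrow-rs definitions or signatures. The key point is that `ArrayData` to `*Array` is not implemented directly but routed through the strongly typed `*ArrayData`, so each layer only needs to know about its neighbour.

```rust
// Hypothetical, simplified stand-ins; not the arrow-rs definitions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DataType {
    Int32,
}

/// Untyped, type-erased data (stand-in for `ArrayData`).
#[derive(Debug, Clone, PartialEq)]
pub struct ArrayData {
    pub data_type: DataType,
    pub buffers: Vec<Vec<u8>>,
}

/// Strongly typed data (stand-in for `PrimitiveArrayData`).
#[derive(Debug, Clone, PartialEq)]
pub struct PrimitiveArrayData {
    pub values: Vec<i32>,
}

/// User-facing array (stand-in for `PrimitiveArray`).
#[derive(Debug, Clone, PartialEq)]
pub struct PrimitiveArray {
    data: PrimitiveArrayData,
}

impl PrimitiveArray {
    pub fn values(&self) -> &[i32] {
        &self.data.values
    }
}

// ArrayData -> *ArrayData: check the type once, then decode little-endian i32s.
impl From<ArrayData> for PrimitiveArrayData {
    fn from(data: ArrayData) -> Self {
        assert_eq!(data.data_type, DataType::Int32, "expected Int32 data");
        let values: Vec<i32> = data.buffers[0]
            .chunks_exact(4)
            .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        Self { values }
    }
}

// *ArrayData -> ArrayData: re-encode into an untyped buffer.
impl From<PrimitiveArrayData> for ArrayData {
    fn from(data: PrimitiveArrayData) -> Self {
        let bytes: Vec<u8> = data.values.iter().flat_map(|v| v.to_le_bytes()).collect();
        Self { data_type: DataType::Int32, buffers: vec![bytes] }
    }
}

// *ArrayData <-> *Array: trivial wrapping and unwrapping.
impl From<PrimitiveArrayData> for PrimitiveArray {
    fn from(data: PrimitiveArrayData) -> Self {
        Self { data }
    }
}

impl From<PrimitiveArray> for PrimitiveArrayData {
    fn from(array: PrimitiveArray) -> Self {
        array.data
    }
}

// ArrayData <-> *Array, routed through *ArrayData.
impl From<ArrayData> for PrimitiveArray {
    fn from(data: ArrayData) -> Self {
        PrimitiveArrayData::from(data).into()
    }
}

impl From<PrimitiveArray> for ArrayData {
    fn from(array: PrimitiveArray) -> Self {
        PrimitiveArrayData::from(array).into()
    }
}
```

Because every layer converts to and from its neighbours, code that still produces `ArrayData` can interoperate with code that has already moved to `*Array` or `*ArrayData`, which is what allows the gradual migration described above.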
