tustvold commented on issue #1799: URL: https://github.com/apache/arrow-rs/issues/1799#issuecomment-1472437840
Having worked through an implementation of this, the additional branching on operations that used to be free, e.g. looking up the datatype or null buffer, causes some quite serious performance regressions. Whilst it is possible to eliminate these, it turns into a game of performance whack-a-mole. Whilst I still think the design as articulated here has some compelling advantages, pragmatically I don't have the time or the inclination to work through every implementation fixing such regressions.

Taking a step back, the enumerations are not strictly necessary to achieve the goal of a type-safe, low-level data abstraction that can form a common basis between arrow and arrow2, as articulated [here](https://docs.google.com/presentation/d/1cqQEpC-kJES2Mng152r_qZyaOqHjtb5YFuseSTWyulU/edit#slide=id.p).

Instead the modified plan is as follows:

* `ArrayData` remains as is, without modification
* The `ArrayData` enumerations and associated trait plumbing are removed
* The strongly typed `ArrayData` structures (e.g. `PrimitiveArrayData`, `BytesArrayData`) are made public
* Provide `From` conversions between `ArrayData` and `*ArrayData`, etc...
* Provide `From` conversions between `*Array` and `*ArrayData`
* Implement `From` conversions between `ArrayData` and `*Array` via `*ArrayData`

We can then slowly reduce the usage of `ArrayData` within the codebase, in favor of `*Array` and `*ArrayData`. This achieves the stated goals without requiring everything to move at the same time.
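To make the conversion chain in the plan concrete, here is a rough, self-contained sketch. It does not use the real arrow-rs types: `DataType`, `ArrayData`, `Int32ArrayData`, `Int32Array`, and `round_trip` below are simplified, hypothetical stand-ins (no null buffers, offsets, or child data), and the panic-on-mismatch behaviour of the `From` impl is an assumption made for illustration only.

```rust
// Hypothetical, simplified stand-ins; NOT the actual arrow-rs definitions.

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

/// Untyped representation: the data type is only known at runtime.
#[derive(Debug, Clone, PartialEq)]
struct ArrayData {
    data_type: DataType,
    len: usize,
    buffer: Vec<u8>,
}

/// Strongly typed, low-level data (playing the role of `*ArrayData`).
#[derive(Debug, Clone, PartialEq)]
struct Int32ArrayData {
    values: Vec<i32>,
}

/// User-facing typed array (playing the role of `*Array`).
#[derive(Debug, Clone, PartialEq)]
struct Int32Array {
    data: Int32ArrayData,
}

impl Int32Array {
    fn values(&self) -> &[i32] {
        &self.data.values
    }
}

// `ArrayData` <-> `*ArrayData`
impl From<ArrayData> for Int32ArrayData {
    fn from(data: ArrayData) -> Self {
        // Assumption for this sketch: a type mismatch panics.
        assert_eq!(data.data_type, DataType::Int32, "expected Int32 data");
        assert_eq!(data.buffer.len(), data.len * 4, "buffer length mismatch");
        let values = data
            .buffer
            .chunks_exact(4)
            .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        Self { values }
    }
}

impl From<Int32ArrayData> for ArrayData {
    fn from(data: Int32ArrayData) -> Self {
        let buffer: Vec<u8> = data.values.iter().flat_map(|v| v.to_le_bytes()).collect();
        Self {
            data_type: DataType::Int32,
            len: data.values.len(),
            buffer,
        }
    }
}

// `*Array` <-> `*ArrayData`
impl From<Int32ArrayData> for Int32Array {
    fn from(data: Int32ArrayData) -> Self {
        Self { data }
    }
}

impl From<Int32Array> for Int32ArrayData {
    fn from(array: Int32Array) -> Self {
        array.data
    }
}

// `ArrayData` <-> `*Array`, implemented via `*ArrayData`
impl From<ArrayData> for Int32Array {
    fn from(data: ArrayData) -> Self {
        Int32ArrayData::from(data).into()
    }
}

impl From<Int32Array> for ArrayData {
    fn from(array: Int32Array) -> Self {
        Int32ArrayData::from(array).into()
    }
}

/// Typed -> untyped -> typed, going through every conversion above.
fn round_trip(values: Vec<i32>) -> Vec<i32> {
    let array = Int32Array::from(Int32ArrayData { values });
    let untyped = ArrayData::from(array);
    let back = Int32Array::from(untyped);
    back.values().to_vec()
}
```

In this sketch the runtime type check happens once, at the boundary where `ArrayData` is converted into the typed form. After that, accessors such as `values()` need no branching, which is the kind of cost the enumeration-based design was adding to previously free lookups.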
