[GitHub] [arrow-rs] tustvold commented on issue #1799: ArrayData Layout Enumeration

via GitHub Thu, 16 Mar 2023 10:47:27 -0700


tustvold commented on issue #1799:
URL: https://github.com/apache/arrow-rs/issues/1799#issuecomment-1472437840

Having worked through an implementation of this, I found that the additional
branching on operations that used to be free, e.g. looking up the datatype or
null buffer, causes some quite serious performance regressions. Whilst it is
possible to eliminate these, it turns into a game of performance
whack-a-mole. I still think the design as articulated here has some
compelling advantages, but pragmatically I don't have the time or the
inclination to work through every implementation fixing such regressions.
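
To illustrate where the extra branching comes from, here is a minimal sketch. All types in it (`FlatData`, `EnumData`, `PrimitiveData`, `BytesData`, and the two-variant `DataType`) are hypothetical, simplified stand-ins, not the real arrow-rs definitions. With a flat struct the datatype lookup is a single field load, whereas with an enumerated layout every accessor first has to match on the variant:

```rust
// Hypothetical, simplified stand-ins; NOT the real arrow-rs types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int32,
    Utf8,
}

/// Flat layout: the data type is stored directly on the struct.
pub struct FlatData {
    pub data_type: DataType,
    pub len: usize,
}

impl FlatData {
    /// A plain field load, no branching.
    pub fn data_type(&self) -> DataType {
        self.data_type
    }
}

pub struct PrimitiveData {
    pub data_type: DataType,
    pub len: usize,
}

pub struct BytesData {
    pub data_type: DataType,
    pub len: usize,
}

/// Enumerated layout: one variant per physical layout.
pub enum EnumData {
    Primitive(PrimitiveData),
    Bytes(BytesData),
}

impl EnumData {
    /// The accessor must dispatch on the variant before it can
    /// reach the field, which adds a branch to a formerly free lookup.
    pub fn data_type(&self) -> DataType {
        match self {
            EnumData::Primitive(p) => p.data_type,
            EnumData::Bytes(b) => b.data_type,
        }
    }
}
```

Whether the compiler can optimise such a `match` away depends on the surrounding code, which is part of why the regressions have to be chased one call site at a time.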

Taking a step back, the enumerations are not strictly necessary to achieve
the goal of a type-safe, low-level data abstraction that can form a common
basis between arrow and arrow2, as articulated
[here](https://docs.google.com/presentation/d/1cqQEpC-kJES2Mng152r_qZyaOqHjtb5YFuseSTWyulU/edit#slide=id.p).

Instead the modified plan is as follows:

* ArrayData remains as is without modification
* The ArrayData enumerations and associated trait plumbing are removed
* The strongly typed ArrayData structures (e.g. `PrimitiveArrayData`,
`BytesArrayData`) are made public
* Provide `From` conversions between `ArrayData` and `*ArrayData`, etc...
* Provide `From` conversions between `*Array` and `*ArrayData`
* Implement `From` conversions between `ArrayData` and `*Array` via
`*ArrayData`
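
As a rough sketch of how the conversion chain in the list above could fit together: the types below (`ArrayData`, `Int32ArrayData`, `Int32Array`, and the two-variant `DataType`) are hypothetical, heavily simplified stand-ins, not the actual arrow-rs definitions. Panicking on a datatype mismatch is likewise an assumption made for brevity; a fallible `TryFrom` would be another option.

```rust
// Hypothetical, simplified stand-ins; NOT the real arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Utf8,
}

/// Untyped container, stand-in for `ArrayData`.
#[derive(Debug, Clone, PartialEq)]
pub struct ArrayData {
    pub data_type: DataType,
    pub values: Vec<i32>,
}

/// Strongly typed payload, stand-in for a `*ArrayData`.
#[derive(Debug, Clone, PartialEq)]
pub struct Int32ArrayData {
    pub values: Vec<i32>,
}

/// User-facing array, stand-in for a `*Array`.
#[derive(Debug, Clone, PartialEq)]
pub struct Int32Array {
    data: Int32ArrayData,
}

// `ArrayData` <-> `*ArrayData`
impl From<Int32ArrayData> for ArrayData {
    fn from(d: Int32ArrayData) -> Self {
        ArrayData { data_type: DataType::Int32, values: d.values }
    }
}

impl From<ArrayData> for Int32ArrayData {
    fn from(d: ArrayData) -> Self {
        // Assumption: mismatched types panic rather than return an error.
        assert_eq!(d.data_type, DataType::Int32, "expected Int32 data");
        Int32ArrayData { values: d.values }
    }
}

// `*Array` <-> `*ArrayData`
impl From<Int32ArrayData> for Int32Array {
    fn from(data: Int32ArrayData) -> Self {
        Int32Array { data }
    }
}

impl From<Int32Array> for Int32ArrayData {
    fn from(array: Int32Array) -> Self {
        array.data
    }
}

// `ArrayData` <-> `*Array`, routed via `*ArrayData`
impl From<ArrayData> for Int32Array {
    fn from(d: ArrayData) -> Self {
        Int32ArrayData::from(d).into()
    }
}

impl From<Int32Array> for ArrayData {
    fn from(array: Int32Array) -> Self {
        Int32ArrayData::from(array).into()
    }
}
```

In this sketch the datatype check lives only in the `ArrayData` to `*ArrayData` conversion, so the `*Array` conversions reuse it rather than repeating it.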

We can then slowly reduce the usage of `ArrayData` within the codebase, in
favor of `*Array` and `*ArrayData`. This achieves the stated goals, without
requiring everything to move at the same time.

