jorgecarleitao commented on pull request #9329:
URL: https://github.com/apache/arrow/pull/9329#issuecomment-770440474


   > How cheap cloning of the ArrayData struct is probably depends a lot on the 
datatype. The removed indirection seems to be beneficial, but instead we now 
need to clone all fields, which include
   > 
   >     * the datatype enum, which can include boxes/vecs of nested types and 
the field metadata
   > 
   >     * the vec of buffers, and cloning a buffer would include cloning the 
`Arc<Bytes>`
   > 
   > 
   > So for primitive types, removing the arc should be a benefit because of 
less indirection when accessing it, it would still need to allocates vectors 
and clone the Arc which is inside the Buffer.
   > 
   > For more complex types, cloning the DataType enum itself and multiple 
buffers could be slower than the speedup gained by less indirection.
   
   I am sorry, I am not sure I follow the arguments:
   
   * `DataType`: all our operations already clone `DataType`, since they create 
a new `ArrayData`. Even something as simple as slicing an array requires 
cloning a `DataType`, as we need to increase the `offset`, which requires a new 
`ArrayData`, which requires a new `DataType`.
   * `Arc<Bytes>`: I do not see how using `Arc<Bytes>` is relevant to why we 
should use `Arc<ArrayData>`. Regardless, AFAI understand, its usage is really 
important: `Bytes` is an immutable memory region, and sharing it between arrays 
imo makes a lot of sense: the most used case is when the null buffer does not 
change due to an operation. So, the two natural options are `Rc` and an `Arc`, 
and allowing Arrays to be sharable across thread boundaries seems like a good 
justification for `Arc`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to