andishgar commented on PR #46230:
URL: https://github.com/apache/arrow/pull/46230#issuecomment-2838025640

   > If we're not preallocating buffers which could be preallocated, I think 
it'd be better to correct the other kernels to do so rather than remove an 
optimization from what is probably the most-used kernel
   
   My changes impact 8 kernels. Based on your feedback, I understand I should 
rewrite the entire binary casting to use binary types. Let me explain each 
change in detail.
   
   1- Offset String -> Offset String
   The allocated offset buffer is dropped in zero-copy casting
   
https://github.com/apache/arrow/blob/e6d0edcf24499fdcd7b16a793f1cab4f61754c92/cpp/src/arrow/compute/kernels/scalar_cast_string.cc#L308
   If the input and output offset types differ, a new buffer is allocated
   
https://github.com/apache/arrow/blob/e6d0edcf24499fdcd7b16a793f1cab4f61754c92/cpp/src/arrow/compute/kernels/scalar_cast_string.cc#L251-L253
   
https://github.com/apache/arrow/blob/e6d0edcf24499fdcd7b16a793f1cab4f61754c92/cpp/src/arrow/compute/kernels/scalar_cast_string.cc#L279-L281
   
   2- String View -> Offset String
   The implementation overlooks the allocation of a second buffer
   
https://github.com/apache/arrow/blob/e6d0edcf24499fdcd7b16a793f1cab4f61754c92/cpp/src/arrow/compute/kernels/scalar_cast_string.cc#L341
   Additionally, there’s a risk of generating utf8() and binary() types with 
lengths exceeding 2 billion, as discussed 
[here](https://github.com/apache/arrow/issues/46128#issue-299261058).
   
   3- Offset String -> String View
   This requires updates to the compute infrastructure first 
([issue](https://github.com/apache/arrow/issues/46177#issue-3003593798)). 
Additional challenges with slicing and casting are noted 
[here](https://github.com/apache/arrow/issues/46128#issue-2992610582).
   
   4- String View -> String View
   The code works correctly with the current logic in the compute function., 
however if we resolve 
[this](https://github.com/apache/arrow/issues/46177#issuecomment-2814187762), ` 
MemAllocation::NO_PREALLOCATE ` should be used to prevent extra casting
   
   5- Fixed -> String View
   It has the same issue of Offset string-> String View
   
   6- Fixed -> Offset string
   Is it possible to have a separate path for scalar and array?
   
https://github.com/apache/arrow/blob/e6d0edcf24499fdcd7b16a793f1cab4f61754c92/cpp/src/arrow/compute/kernels/scalar_cast_string.cc#L597-L612
   
   7-Fixed -> Fixed
   The second buffer is allocated, However, there is no need
   
   8- String| String View -> Fixed
   First : The second buffer is allocated but ignored by the implementation.
   Second:  Why the data buffer is copied for casting from offset string to 
FixedSizeString
   
https://github.com/apache/arrow/blob/e6d0edcf24499fdcd7b16a793f1cab4f61754c92/cpp/src/arrow/compute/kernels/scalar_cast_string.cc#L645-L661
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to