[GitHub] [arrow] wesm commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

GitBox Sun, 21 Jun 2020 19:07:31 -0700


wesm commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-647226180



   > There is one loose end, the growth of the string can cause a utf8 array to 
be promoted to a large_utf8.
   
   I'd like to treat in-kernel type promotions as an anti-pattern in general. 
If there is the possibility of overflowing the capacity of a StringArray, then 
it would be better to do the type promotion (if that is really what is desired) 
prior to choosing and invoking a kernel (so you would promote to LARGE_STRING 
and then use the large_utf8 kernel variant). 
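
   As a rough illustration of deciding on the promotion before dispatch, here 
is a minimal sketch that uses only the standard library. `OffsetWidth`, 
`ChooseOffsetWidth` and `max_growth_factor` are hypothetical names for this 
example, not Arrow APIs, and the worst-case growth factor of a case mapping is 
left as an input rather than assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for "utf8 (32-bit offsets)" vs "large_utf8 (64-bit offsets)".
enum class OffsetWidth { k32, k64 };

// Decide, before any kernel is chosen, whether the worst-case output could
// exceed what 32-bit offsets can address. max_growth_factor is an assumed
// upper bound on how much the transformation can grow the data.
OffsetWidth ChooseOffsetWidth(std::int64_t input_bytes,
                              std::int64_t max_growth_factor) {
  assert(input_bytes >= 0 && max_growth_factor >= 1);
  const std::int64_t limit = std::numeric_limits<std::int32_t>::max();
  // Dividing instead of multiplying avoids overflow in the comparison.
  return input_bytes > limit / max_growth_factor ? OffsetWidth::k64
                                                 : OffsetWidth::k32;
}
```

   The caller would cast the input to the wider type only when `k64` comes 
back, and then invoke the matching kernel variant, so the kernel itself never 
changes its output type.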
   
   A better and more efficient strategy would be to break the array into pieces 
with `Slice` (based on some size heuristic, e.g. at most 1MB-8MB of data per 
slice) and process the smaller chunks separately. This also means that you can 
execute the kernel in parallel. The expression execution layer will make this 
decision once it is developed (I plan to work on it after the 1.0.0 release), 
because chunking permits both parallel execution and operator pipelining. 
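
   The size heuristic could be sketched roughly as follows. This is a 
standard-library-only illustration, not Arrow code: `PlanSlices` is a 
hypothetical helper that takes the value offsets of a string array (n + 1 
entries for n values) and a byte budget, and returns row ranges to process 
independently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: split n values into [begin, end) row ranges whose
// string payload is at most max_bytes each. A single value larger than the
// budget still gets a range of its own.
std::vector<std::pair<std::size_t, std::size_t>> PlanSlices(
    const std::vector<std::int64_t>& offsets, std::int64_t max_bytes) {
  assert(!offsets.empty() && max_bytes > 0);
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  const std::size_t n = offsets.size() - 1;
  std::size_t begin = 0;
  while (begin < n) {
    std::size_t end = begin + 1;  // always take at least one value
    // Extend the range while the next value still fits in the budget.
    while (end < n && offsets[end + 1] - offsets[begin] <= max_bytes) {
      ++end;
    }
    ranges.emplace_back(begin, end);
    begin = end;
  }
  return ranges;
}
```

   With a real Arrow array, each range would map to a zero-copy 
`array->Slice(begin, end - begin)`, and the resulting slices could be handed 
to the kernel one at a time or in parallel.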


