wesm commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-647226180
> There is one loose end: the growth of the string can cause a utf8 array to be promoted to a large_utf8.

I'd like to treat in-kernel type promotions as an anti-pattern in general. If there is a possibility of overflowing the capacity of a StringArray, it would be better to do the type promotion (if that is really what is desired) before choosing and invoking a kernel. In other words, you would promote to LARGE_STRING and then use the large_utf8 kernel variant.

A better and more efficient strategy would be to break the array into pieces with `Slice` (based on some size heuristic, e.g. at most 1MB-8MB of data per slice) and process the smaller chunks separately. This also means the kernel can be executed on the chunks in parallel. The expression execution layer will make this decision once it is developed (I plan to work on it after the 1.0.0 release), because this approach permits both parallel execution and operator pipelining.
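To make the slicing strategy concrete, here is a minimal plain-Python sketch. It is not pyarrow's or Arrow C++'s API. Python lists of `str` stand in for a StringArray, and `slice_bounds` and `run_chunked` are hypothetical helpers. Python list slicing copies, whereas Arrow's `Slice` is zero-copy. The input is cut into (offset, length) slices holding at most `max_bytes` of UTF-8 data, the kernel is applied per slice, and the results stay as separate chunks, like a ChunkedArray. Each chunk's output is checked against the 2**31 - 1 byte limit of 32-bit offsets, so no chunk ever needs promotion to large_utf8.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Largest data size addressable with 32-bit offsets (utf8 / StringArray).
INT32_MAX = 2**31 - 1


def slice_bounds(values: List[str], max_bytes: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split into (offset, length) slices, each holding
    at most max_bytes of UTF-8 data. A single value larger than max_bytes
    gets a slice of its own."""
    bounds = []
    start, size = 0, 0
    for i, v in enumerate(values):
        n = len(v.encode("utf-8"))
        # Close the current slice if adding this value would exceed the budget.
        if i > start and size + n > max_bytes:
            bounds.append((start, i - start))
            start, size = i, 0
        size += n
    if start < len(values):
        bounds.append((start, len(values) - start))
    return bounds


def run_chunked(values: List[str], kernel: Callable[[str], str],
                max_bytes: int = 8 * 1024 * 1024,
                workers: int = 4) -> List[List[str]]:
    """Hypothetical driver: apply `kernel` to each slice independently and
    return one output chunk per slice (instead of one promoted array)."""
    def process(bound: Tuple[int, int]) -> List[str]:
        offset, length = bound
        # List slicing copies here; an Arrow Slice would be zero-copy.
        out = [kernel(v) for v in values[offset:offset + length]]
        if sum(len(s.encode("utf-8")) for s in out) > INT32_MAX:
            raise OverflowError("chunk output exceeds 32-bit offset capacity")
        return out

    # Threads only illustrate the structure; CPython's GIL limits real
    # parallelism for pure-Python kernels. pool.map preserves slice order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, slice_bounds(values, max_bytes)))


chunks = run_chunked(["ab", "cd", "éf"], str.upper, max_bytes=4)
# chunks == [["AB", "CD"], ["ÉF"]]
```

Because each slice's input is bounded (e.g. 1MB-8MB), a kernel that grows its strings would need to multiply the data size by several hundred times before a single chunk could overflow 32-bit offsets.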