alamb opened a new issue, #12031:
URL: https://github.com/apache/datafusion/issues/12031

   ### Is your feature request related to a problem or challenge?
   
   In https://github.com/apache/datafusion/pull/12019/files @dmitrybugakov 
added support for StringViewArray in the `substr` function ❤️ 
   
   However, the initial implementation returns a `StringArray` when the 
input is a `StringViewArray`, which means all the strings are copied.
   
   In some functions, such as `substr`, this extra copy is unnecessary: only 
the views (aka the u128s that hold each string's length and its location in 
the data buffers) need to be adjusted. See 
[GenericByteViewArray](https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html)
 for more details.
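
To make "only the views need to change" concrete, here is a minimal sketch of how a view for a string longer than 12 bytes packs its fields into one 128-bit value, following the layout described in the Arrow columnar format. It uses only the standard library; `LongView`, `pack` and `unpack` are hypothetical names for illustration, not arrow-rs APIs.

```rust
/// Fields of a "long" (more than 12 bytes) Arrow string view.
/// Hypothetical illustration type, not part of arrow-rs.
#[derive(Debug, PartialEq)]
struct LongView {
    length: u32,       // string length in bytes
    prefix: [u8; 4],   // first four bytes of the string
    buffer_index: u32, // which data buffer holds the bytes
    offset: u32,       // where the string starts in that buffer
}

/// Pack the fields into a little-endian 128-bit view.
fn pack(v: &LongView) -> u128 {
    (v.length as u128)
        | ((u32::from_le_bytes(v.prefix) as u128) << 32)
        | ((v.buffer_index as u128) << 64)
        | ((v.offset as u128) << 96)
}

/// Recover the fields from a packed view.
fn unpack(raw: u128) -> LongView {
    LongView {
        length: raw as u32,
        prefix: ((raw >> 32) as u32).to_le_bytes(),
        buffer_index: (raw >> 64) as u32,
        offset: (raw >> 96) as u32,
    }
}

fn main() {
    // View of the 23-byte string "hello string view world" stored at offset 0.
    let full = LongView { length: 23, prefix: *b"hell", buffer_index: 0, offset: 0 };
    assert_eq!(unpack(pack(&full)), full);

    // The substring "string view world" needs only new field values;
    // the bytes in the data buffer are untouched.
    let sub = LongView { length: 17, prefix: *b"stri", buffer_index: 0, offset: 6 };
    println!("{:#034x}", pack(&sub));
}
```

Strings of 12 bytes or fewer are stored inline in the view instead, so a real implementation would also need to handle that case.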
   
   
   
   ### Describe the solution you'd like
   
   I think we can avoid the copy when the input is a `StringViewArray` and thus 
make `substr` faster.
   
   ### Describe alternatives you've considered
   
   The idea would be to
   
   1. Create a benchmark for the substring function for StringArray, 
LargeStringArray and StringViewArray
   2. Optimize the implementation of substr
   
   The optimization would likely look like:
   1.  Change the signature of `substr` so it produces a `StringViewArray` when 
its first argument is a `StringViewArray` (at the moment it produces 
`StringArray` when its argument is a `StringViewArray`)
   2. Write a function that takes a `StringViewArray` as input and produces 
another `StringViewArray` as output
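
As a rough sketch of what step 2 might look like, the following models a column of views as a `Vec<u128>` plus shared data buffers and computes substrings by producing new views only. This is a simplified standard-library model, not the arrow-rs or DataFusion API: all function names are hypothetical, offsets are 0-based byte positions (the SQL `substr` is 1-based and character-based), the input is assumed to be ASCII, and nulls are ignored.

```rust
/// Strings up to this many bytes are stored inline in the view.
const MAX_INLINE: usize = 12;

/// Build a view for `data`, which lives at `offset` in buffer `buffer_index`.
fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let len = data.len();
    if len <= MAX_INLINE {
        // Short string: 4 bytes of length followed by the bytes themselves.
        let mut bytes = [0u8; 16];
        bytes[..4].copy_from_slice(&(len as u32).to_le_bytes());
        bytes[4..4 + len].copy_from_slice(data);
        u128::from_le_bytes(bytes)
    } else {
        // Long string: length, 4-byte prefix, buffer index, offset.
        let prefix = u32::from_le_bytes([data[0], data[1], data[2], data[3]]);
        (len as u128)
            | ((prefix as u128) << 32)
            | ((buffer_index as u128) << 64)
            | ((offset as u128) << 96)
    }
}

/// Substring of a single view; `buffers` is read but never copied or modified.
fn substr_view(view: u128, buffers: &[Vec<u8>], start: usize, count: usize) -> u128 {
    let len = (view as u32) as usize;
    let start = start.min(len);
    let end = start.saturating_add(count).min(len);
    if len <= MAX_INLINE {
        let bytes = view.to_le_bytes();
        make_view(&bytes[4 + start..4 + end], 0, 0)
    } else {
        let buffer_index = (view >> 64) as u32;
        let offset = (view >> 96) as u32;
        let begin = offset as usize;
        let data = &buffers[buffer_index as usize][begin..begin + len];
        // Same buffer, shifted offset; becomes inline if the result is short.
        make_view(&data[start..end], buffer_index, offset + start as u32)
    }
}

/// Apply `substr_view` to every view in a column.
fn substr_views(views: &[u128], buffers: &[Vec<u8>], start: usize, count: usize) -> Vec<u128> {
    views.iter().map(|&v| substr_view(v, buffers, start, count)).collect()
}

/// Materialize a view as a `String` (for checking results only).
fn read_view(view: u128, buffers: &[Vec<u8>]) -> String {
    let len = (view as u32) as usize;
    let bytes = if len <= MAX_INLINE {
        view.to_le_bytes()[4..4 + len].to_vec()
    } else {
        let buffer_index = (view >> 64) as u32 as usize;
        let offset = (view >> 96) as u32 as usize;
        buffers[buffer_index][offset..offset + len].to_vec()
    };
    String::from_utf8(bytes).expect("views should reference valid UTF-8")
}

fn main() {
    let buffers = vec![b"hello string view world".to_vec()];
    let views = vec![make_view(&buffers[0], 0, 0), make_view(b"short", 0, 0)];
    for v in substr_views(&views, &buffers, 6, 17) {
        println!("{:?}", read_view(v, &buffers));
    }
}
```

The output column reuses the input's data buffers as-is, which is where the saving over building a new `StringArray` would come from.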
   
   
   
   
   
   ### Additional context
   
   Here is an example benchmark: https://github.com/apache/datafusion/pull/12015
   
   https://docs.rs/arrow/latest/arrow/array/type.StringViewBuilder.html



