thisisnic commented on pull request #10190:
URL: https://github.com/apache/arrow/pull/10190#issuecomment-832547700


   > Originally I thought this should include translations for the other two 
stringr string splitting functions `str_split_fixed()` and `str_split_n()`, but 
looking at those again, I think we don't want to implement `str_split_fixed()` 
and we can't implement `str_split_n()`. Do you agree @thisisnic? 
https://stringr.tidyverse.org/reference/str_split.html
   
   @ianmcook Yep. When I looked at those, my thinking was that 
`str_split_fixed()` returns a matrix, which I don't think makes sense given the 
translation between Arrow and R types. I also thought for a while about 
`str_split_n()` and attempted something using offsets, but couldn't get it 
working within the function-generating function, and then realised it can't be 
done with the current suite of Arrow C++ functions.
   
   Side note - I can see use cases for `str_split_n()` once 
`split_pattern_regex` is implemented at the C++ level and has R bindings, e.g. 
if there are string columns containing email addresses, IP addresses, or 
similar, and a specific set of subcomponents is needed. Would it be worth 
requesting the functionality that would enable us to create `str_split_n()` later?
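
To make the use case concrete, here is a minimal sketch of the "split on a pattern, then keep only the n-th piece" behaviour I have in mind. It is plain Python using only the standard library, not Arrow or stringr code; the helper name `split_nth`, its 1-based `n`, and the choice to return a missing value when a string has fewer than `n` pieces are all assumptions made for illustration.

```python
import re
from typing import List, Optional


def split_nth(values: List[Optional[str]], pattern: str, n: int) -> List[Optional[str]]:
    """Hypothetical helper: split each string on a regex and keep the n-th piece.

    `n` is 1-based. Missing inputs stay missing, and strings with fewer
    than `n` pieces also yield None (an assumption, not a documented rule).
    """
    out: List[Optional[str]] = []
    for value in values:
        if value is None:
            out.append(None)
            continue
        pieces = re.split(pattern, value)
        out.append(pieces[n - 1] if 1 <= n <= len(pieces) else None)
    return out


# Domain part of each email address (second piece after splitting on "@").
emails = ["alice@example.org", "bob@example.com", None, "no-at-sign"]
domains = split_nth(emails, "@", 2)

# Last octet of each IPv4 address (fourth piece after splitting on ".").
ips = ["192.168.0.1", "10.0.0.42"]
last_octets = split_nth(ips, r"\.", 4)
```

The result has one element per input string, which is why this shape maps onto a column more naturally than the matrix returned by `str_split_fixed()`.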



