ianmcook commented on pull request #10190:
URL: https://github.com/apache/arrow/pull/10190#issuecomment-833589732


   > Side note - I can see use cases for `str_split_n` after 
`split_pattern_regex` is implemented at the C++ level and has R bindings - e.g. 
if there are string columns containing email addresses, IP addresses, or 
similar, and a specific set of subcomponents is needed. Would it be worth 
requesting the functionality that would enable us to later create `str_split_n`?
   
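   When we discussed this earlier, we wondered whether the ability to split by 
a regex (ARROW-12608) would enable us to implement `str_split_n()`. On further 
thought, I do not think it would: splitting by a regex still returns a list of 
pieces for each string, and `str_split_n()` has to pick one piece out of each 
list.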
   
   We could implement this if there were a way to take the nth item out of each 
list in an Arrow `ListArray`. If you look at what the C++ string splitting 
compute functions return, it looks like this:
   ```r
   call_function("ascii_split_whitespace", Array$create(c("a b c", "d e f")))
   
   ## ListArray
   ## <list<item: string>>
   ##   [
   ##     [
   ##       "a",
   ##       "b",
   ##       "c"
   ##     ],
   ##     [
   ##       "d",
   ##       "e",
   ##       "f"
   ##     ]
   ##   ]
   ```
   We should check whether there is another compute function we can call to 
take the nth element from each of those lists.
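   
   To make the missing operation concrete, here is a minimal base R sketch of 
the semantics `str_split_n()` would need on top of the split result. It works 
on plain R lists rather than an Arrow `ListArray`, `nth_of_each()` is a 
hypothetical helper name and not an arrow function, and returning `NA` when a 
list has fewer than `n` pieces is an assumption.
   ```r
   # Hypothetical helper (not part of arrow): take the nth piece from each
   # character vector in a list, or NA if that vector is too short.
   nth_of_each <- function(pieces, n) {
     vapply(
       pieces,
       function(x) if (length(x) >= n) x[[n]] else NA_character_,
       character(1)
     )
   }
   
   # Base R stand-in for the ListArray shown above
   pieces <- strsplit(c("a b c", "d e f"), " ", fixed = TRUE)
   nth_of_each(pieces, 2)
   
   ## [1] "b" "e"
   ```
   An Arrow-native version would need an equivalent compute function that 
operates on the `ListArray` directly, so the data does not have to be pulled 
into R first.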

