[
https://issues.apache.org/jira/browse/ARROW-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dragoș Moldovan-Grünfeld updated ARROW-13615:
---------------------------------------------
Parent: ARROW-14865
Issue Type: Sub-task (was: Improvement)
> [R] Bindings for stringr::str_to_sentence
> -----------------------------------------
>
> Key: ARROW-13615
> URL: https://issues.apache.org/jira/browse/ARROW-13615
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: R
> Reporter: Nicola Crane
> Assignee: Dragoș Moldovan-Grünfeld
> Priority: Major
> Labels: good-first-issue, kernel, pull-request-available
> Fix For: 7.0.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> There is more to this issue than meets the eye.
> {{stringr::str_to_sentence()}} does two things:
> * capitalises the first word
> * if multiple sentences are provided as a single string, attempts to
> find sentence breaks and capitalise the first word of each sentence.
> The {{stringr}} implementation wraps {{stringi::stri_trans_totitle()}}, which
> in turn uses ICU’s BreakIterator to locate specific text boundaries. As a
> consequence {{stringr::str_to_sentence()}} is not able to identify a full stop /
> period (".") as a sentence end and does not capitalise words following it.
> Thus, there is a discrepancy between the behaviour of the {{utf8_capitalize}}
> kernel (which capitalises the first word of a string without making any
> attempt to break into sentences) and the behaviour of
> {{stringr::str_to_sentence()}}.
> For more extensive discussions around the {{stringi / stringr}}
> implementation see {{stringr}} issues
> [202|https://github.com/tidyverse/stringr/issues/202] and
> [231|https://github.com/tidyverse/stringr/issues/231].
> Due to the complexity of this issue and the relatively niche use cases, the
> recommendation is to postpone implementation.
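The gap between the two behaviours described above can be sketched in plain Python. This is only an illustration under stated assumptions: {{capitalize_whole}} is a hypothetical stand-in for a capitalise-only kernel such as {{utf8_capitalize}}, and {{capitalize_each_sentence}} uses a naive regex split on ".", "!" and "?". It does not reproduce ICU's BreakIterator rules, so its output should not be read as what {{stringr::str_to_sentence()}} returns.

```python
import re


def capitalize_whole(text: str) -> str:
    """Hypothetical stand-in for a capitalise-only kernel:
    upper-case the first character, lower-case everything else."""
    return text[:1].upper() + text[1:].lower()


def capitalize_each_sentence(text: str) -> str:
    """Naive sentence-aware variant (NOT ICU's algorithm): split after
    '.', '!' or '?' followed by whitespace, then capitalise each piece."""
    # The capture group keeps the whitespace separators in the result,
    # so even indices hold sentences and odd indices hold separators.
    parts = re.split(r"(?<=[.!?])(\s+)", text)
    return "".join(
        capitalize_whole(part) if i % 2 == 0 else part
        for i, part in enumerate(parts)
    )


sample = "hello world. second sentence here."
print(capitalize_whole(sample))          # Hello world. second sentence here.
print(capitalize_each_sentence(sample))  # Hello world. Second sentence here.
```

The first function never looks past the start of the string, which is why a capitalise-only kernel alone cannot match a binding that is expected to locate sentence boundaries.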