[
https://issues.apache.org/jira/browse/ARROW-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422508#comment-17422508
]
Eduardo Ponce edited comment on ARROW-12714 at 9/30/21, 3:11 AM:
-----------------------------------------------------------------
Adding the following notes for reference purposes.
Converting words to title case (capitalization) is not a trivial task
because of the complexities of natural language (acronyms, words that are
always capitalized, special cases, non-alphabetic symbols, etc.). Also, there
is no standard set of rules shared across libraries, so different libraries
can produce different results for certain inputs. In this PR, we chose to
match the behavior of Python's `str.title()` because of its simplicity.
Nevertheless, we note that its behavior differs from that of R's
*stringr* library (`str_to_title`) when a word begins with a digit. This
discrepancy was detected in ARROW-13853.
{code:python}
# R stringr
> str_to_title("1Foo1") # "1foo1"
# Python
>>> "1Foo1".title() # "1Foo1"
{code}
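To make the difference concrete, here is a minimal Python sketch of the two rules, assuming ASCII-only input. `title_python_style` follows the rule documented for `str.title()`: a letter is uppercased when the preceding character is not a letter and lowercased otherwise, so a leading digit does not prevent the following letter from being capitalized. `title_word_start_style` is a hypothetical, simplified rule that treats a run of letters and digits as one word and capitalizes only its first character. It reproduces the R output above for `"1Foo1"`, but it is not stringr's actual algorithm (as far as we know, stringr delegates to stringi/ICU word-boundary rules, which treat cases such as apostrophes differently). Both function names are illustrative only and are not Arrow APIs.

{code:python}
def title_python_style(s: str) -> str:
    """Sketch of the str.title() rule for ASCII input: uppercase a letter
    that follows a non-letter, lowercase a letter that follows a letter."""
    out = []
    prev_cased = False  # was the previous character a (cased) letter?
    for ch in s:
        # upper()/lower() are no-ops on digits and symbols
        out.append(ch.lower() if prev_cased else ch.upper())
        prev_cased = ch.isalpha()
    return "".join(out)


def title_word_start_style(s: str) -> str:
    """Hypothetical word-start rule: a 'word' is a run of letters and digits;
    only its first character is uppercased, the rest are lowercased."""
    out = []
    in_word = False
    for ch in s:
        if ch.isalnum():
            out.append(ch.lower() if in_word else ch.upper())
            in_word = True
        else:
            out.append(ch)  # separators are copied unchanged
            in_word = False
    return "".join(out)


print(title_python_style("1Foo1"))      # 1Foo1  (same as "1Foo1".title())
print(title_word_start_style("1Foo1"))  # 1foo1  (same as the R output above)
print(title_python_style("hello wORLD"))      # Hello World
print(title_word_start_style("hello wORLD"))  # Hello World
{code}

The two rules agree on ordinary words and only diverge when a word starts with a non-letter such as a digit, which is exactly the case reported in ARROW-13853.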
> [C++] String title case kernel
> ------------------------------
>
> Key: ARROW-12714
> URL: https://issues.apache.org/jira/browse/ARROW-12714
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Ian Cook
> Assignee: Eduardo Ponce
> Priority: Major
> Labels: beginner, pull-request-available
> Fix For: 6.0.0
>
> Time Spent: 4h
> Remaining Estimate: 0h
>
> Capitalizes the first character of each word in the string, like SQL
> {{initcap}} or Python {{str.title()}}.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)