Re: [PHP-DEV][VOTE][RFC] mb_ucfirst and mb_lcfirst functions

Tim Starling Tue, 06 Feb 2024 19:56:54 -0800

On 7/2/24 13:43, Ayesh Karunaratne wrote:

> Hi Tim,
>
> Now that the RFC has been restarted, could you mention some examples in Georgian that might be good test cases?
>
> I was thinking there might be some good test cases in Turkish, but couldn't find any. The RFC has examples (https://github.com/php/php-src/pull/13161) in Vietnamese, but they are correct for both "uppercase first character" and titlecase conversions.

Any Georgian word would do. Your ASCII test case is "abc". The Georgian equivalent would be "აბგ" (ani bani gani, U+10D0 U+10D1 U+10D2), which should remain the same after passing through mb_ucfirst(). Compare mb_strtoupper("აბგ") -> "ᲐᲑᲒ" (U+1C90 U+1C91 U+1C92).

On the task, I mentioned that ligatures are also affected. I gave the example mb_ucfirst("ǉ") -> "ǈ", that is, U+01C9 -> U+01C8. You could add a test case for that. Compare mb_strtoupper("ǉ") -> "Ǉ" (U+01C7).
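To double-check the expected values for these test cases, here is a quick sketch that prints the Unicode uppercase and titlecase mappings of each test character. It is written in Python purely for illustration (it is not PHP's mbstring code). It assumes Python 3.7 or later, whose Unicode 11+ data includes the Georgian Mtavruli letters. The helper name code_points is hypothetical.

```python
import unicodedata


def code_points(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Test characters from this thread: ASCII "a", Georgian "ა" (ani), and "ǉ".
for ch in ["a", "\u10d0", "\u01c9"]:
    print(f"{unicodedata.name(ch)}: "
          f"upper {code_points(ch.upper())}, title {code_points(ch.title())}")

# Expected output:
# LATIN SMALL LETTER A: upper U+0041, title U+0041
# GEORGIAN LETTER AN: upper U+1C90, title U+10D0
# LATIN SMALL LETTER LJ: upper U+01C7, title U+01C8
```

The two mappings agree only for "a". This is why ASCII-only test cases cannot tell an uppercase-based mb_ucfirst() apart from a titlecase-based one.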

To repeat my rationale: we can view ucfirst() either through a technical lens (convert the first character of a string to upper case) or through a natural-language lens (convert a string to sentence case, with the initial letter capitalised per local conventions). I am arguing that mb_ucfirst() should be a natural-language extension of ucfirst(), because applying the technical extension would produce results that look quite jarring in natural-language text, such as a Mtavruli initial on a Georgian word, or "Ǉ" where "ǈ" is expected.
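The two lenses can be contrasted with a minimal sketch. Python is used only because its str.upper() and str.title() expose the Unicode uppercase and titlecase mappings directly. The function names ucfirst_technical and ucfirst_natural are hypothetical, and neither one is PHP's actual mb_ucfirst() implementation. Both functions work on code points, not grapheme clusters. Like ucfirst(), they leave everything after the first character unchanged.

```python
def ucfirst_technical(s: str) -> str:
    """Technical lens: uppercase the first code point, leave the rest as-is."""
    return s[:1].upper() + s[1:]


def ucfirst_natural(s: str) -> str:
    """Natural-language lens: titlecase the first code point, leave the rest as-is."""
    # str.title() on a one-character string applies that character's titlecase mapping.
    return s[:1].title() + s[1:]


# "abc", Georgian "აბგ", and the digraph "ǉ"; ascii() shows non-ASCII as escapes.
for word in ["abc", "\u10d0\u10d1\u10d2", "\u01c9"]:
    print(ascii(ucfirst_technical(word)), ascii(ucfirst_natural(word)))

# Expected output:
# 'Abc' 'Abc'
# '\u1c90\u10d1\u10d2' '\u10d0\u10d1\u10d2'
# '\u01c7' '\u01c8'
```

Under the technical lens, the Georgian word gains a Mtavruli initial and "ǉ" becomes "Ǉ". Under the natural-language lens, the Georgian word is unchanged and "ǉ" becomes "ǈ".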

There are some edge cases that are not quite right. To really do a good job, a new case map will be needed. But if we document it as being for natural language and set the right expectations, we can fix the edge cases later.


-- Tim Starling

--
PHP Internals - PHP Runtime Development Mailing List
