Pardon the obvious, but what is missing from methods like
Character.isLowerCase(char)?
https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isLowerCase(char)
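For reference, here is a minimal sketch of how the JDK behaves on the kinds
of cases raised in the quoted message below. It uses only java.lang and
java.util; the class name CaseMappingSketch and the helper upperPerChar are
made up for illustration and are not part of WordUtils or any Commons API.
The per-character methods map one char to one char, whereas
String.toUpperCase(Locale) and String.toLowerCase(Locale) apply the locale-
and context-sensitive rules.

```java
import java.util.Locale;

public class CaseMappingSketch {

    // Hypothetical helper: uppercases one char at a time with
    // Character.toUpperCase(char), i.e. a strictly one-to-one mapping.
    static String upperPerChar(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            out.append(Character.toUpperCase(input.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Locale dependence: in a Turkish locale, 'I' lowercases to
        // dotless i (U+0131) rather than 'i'.
        System.out.println("TITLE".toLowerCase(Locale.ROOT));
        System.out.println("TITLE".toLowerCase(turkish));

        // Context dependence: a capital sigma at the end of a word
        // lowercases to final sigma (U+03C2), not U+03C3.
        String odos = "\u039F\u0394\u039F\u03A3";
        System.out.println(odos.toLowerCase(Locale.ROOT));

        // One-to-many mapping: sharp s (U+00DF) has no single-char
        // uppercase form, so the per-char helper leaves it unchanged,
        // while the String method expands it to "SS".
        String strasse = "stra\u00DFe";
        System.out.println(upperPerChar(strasse));
        System.out.println(strasse.toUpperCase(Locale.ROOT));
    }
}
```

So the String-level methods already cover the Turkish and final-sigma
examples; the gap, if any, would be in word-level operations such as title
casing.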

Gary

On May 21, 2017 5:06 AM, "Duncan Jones" <dun...@wortharead.com> wrote:

> Hi everyone,
>
> I’ve found some time to continue breaking WordUtils into separate classes
> (eschewing the “big collection of static methods” approach). However, as I
> read more about case handling in Unicode, I realise how simplistic the
> WordUtils methods are and how complex a full solution would need to be.
>
> Section 5.18 of the Unicode specification [1] describes these
> complexities. The main ones that bother me are:
>
> 1. Title case conversions vary widely between different locales and
> languages. I’m not clear whether any locale is satisfied by the current
> simplistic implementation in WordUtils.capitalize(str). Supporting this
> correctly would be a serious challenge.
>
> 2. All types of case conversion may vary depending upon context/locale.
> There are examples provided in [1] where the outcome is different in a
> Turkish locale, or depends on whether the letter in question is followed
> by another letter.
>
> Does anyone have a suggestion for how to move forward with this work? I
> see three options: 1] Admit defeat and avoid the case conversion mess
> entirely. 2] Mimic the existing functionality, but document the
> limitations. 3] Attempt to deliver a locale-dependent version, perhaps
> still with limitations (or for certain languages).
>
> I’m leaning towards 2, perhaps even calling the classes “SimpleX…”.
>
> Thanks,
> Duncan
>
>
> [1] http://www.unicode.org/versions/Unicode9.0.0/ch05.pdf