Hi,

> On 21.05.2017 at 08:06, Duncan Jones <dun...@wortharead.com> wrote:
>
> Hi everyone,
>
> I’ve found some time to continue breaking WordUtils into separate classes
> (eschewing the “big collection of static methods” approach). However, as I
> read more about case handling in Unicode, I realise how simplistic the
> WordUtils methods are and how complex a full solution would need to be.
>
> Section 5.18 of the Unicode specification [1] describes these complexities.
> The main ones that bother me are:
>
> 1. Title case conversions vary widely between different locales and
> languages. I’m not clear whether any locale is satisfied by the current
> simplistic implementation in WordUtils.capitalize(str). Supporting this
> correctly would be a serious challenge.
>
> 2. All types of case conversion may vary depending upon context/locale. There
> are examples provided in [1] where the outcome is different in a Turkish
> locale or if the letter in question is followed by another or not.
>
> Does anyone have a suggestion for how to move forward with this work? I see
> three options: 1] Admit defeat and avoid the case conversion mess entirely.
> 2] Mimic the existing functionality, but document the limitations. 3] Attempt
> to deliver a locale-dependent version, perhaps still with limitations (or for
> certain languages).
>
> I’m leaning towards 2, perhaps even calling the classes “SimpleX…”.
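
For what it's worth, point 2 is easy to reproduce with nothing but the JDK. The sketch below is only an illustration, not a proposal for the new classes: the class name LocaleCaseDemo and the helper upper() are made up for this mail, and the only real API used is String.toUpperCase(Locale).

```java
import java.util.Locale;

public class LocaleCaseDemo {

    // Hypothetical helper: thin wrapper around the JDK's locale-aware upper-casing.
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // In the root locale, "i" maps to the plain ASCII "I".
        System.out.println(upper("i", Locale.ROOT).equals("I"));

        // In a Turkish locale, "i" maps to dotted capital I (U+0130).
        System.out.println(upper("i", turkish).equals("\u0130"));

        // Case conversion can change the length: sharp s (U+00DF) becomes "SS".
        System.out.println(upper("\u00df", Locale.ROOT).equals("SS"));
    }
}
```

A per-character approach of the kind WordUtils takes cannot express the last case at all, since one char turns into two, which seems like a fair argument for documenting the limitations rather than hiding them.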
Sounds good to me.

>
> Thanks,
> Duncan
>
>
> [1] http://www.unicode.org/versions/Unicode9.0.0/ch05.pdf

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org