Hi everyone,

I’ve found some time to continue breaking WordUtils into separate classes 
(eschewing the “big collection of static methods” approach). However, as I read 
more about case handling in Unicode, I realise how simplistic the WordUtils 
methods are and how complex a full solution would need to be.

Section 5.18 of the Unicode specification [1] describes these complexities. The 
main ones that bother me are:

1. Title case conversions vary widely between locales and languages. I’m not 
sure whether the current simplistic implementation in WordUtils.capitalize(str) 
is correct for any locale. Supporting this correctly would be a serious 
challenge.

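2. All types of case conversion may vary depending on context and locale. The 
examples in [1] include mappings whose outcome differs in a Turkish locale, and 
mappings that depend on whether or not the letter in question is followed by 
another letter (for example, Greek capital sigma lowercases to final sigma at 
the end of a word).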
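Points 1 and 2 can be illustrated with a small Java demo that uses only JDK methods (String.toUpperCase(Locale), String.toLowerCase(Locale), Character.toUpperCase and Character.toTitleCase). The class name CaseMappingDemo is purely illustrative. It is a sketch, not a test suite: it shows locale dependence (Turkish dotted and dotless i), context dependence (final sigma), a one-to-many mapping (sharp s to "SS") that a char-by-char approach cannot express, and a digraph whose title case differs from its upper case.

```java
import java.util.Locale;

/** Illustrates, with JDK methods only, why case mapping depends on locale and context. */
public class CaseMappingDemo {
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    public static void main(String[] args) {
        // Locale dependence: Turkish keeps dotted and dotless i distinct.
        show("i".toUpperCase(Locale.ROOT));       // I
        show("i".toUpperCase(TURKISH));           // capital I with dot above (U+0130)
        show("I".toLowerCase(TURKISH));           // dotless small i (U+0131)

        // Context dependence: capital sigma at the end of a word becomes final sigma (U+03C2).
        show("\u039F\u0394\u039F\u03A3".toLowerCase(Locale.ROOT));

        // One-to-many mapping: sharp s uppercases to "SS" in a String,
        // but Character has no single-char uppercase for it and returns it unchanged.
        show("stra\u00DFe".toUpperCase(Locale.ROOT));
        show(String.valueOf(Character.toUpperCase('\u00DF')));

        // Title case is not upper case: the dz-with-caron digraph has three distinct forms.
        show(String.valueOf(Character.toUpperCase('\u01C6'))); // U+01C4
        show(String.valueOf(Character.toTitleCase('\u01C6'))); // U+01C5
    }

    // Prints a string alongside its code points so the result is readable on any console.
    static void show(String s) {
        StringBuilder sb = new StringBuilder(s).append("  [");
        s.codePoints().forEach(cp -> sb.append(String.format(" U+%04X", cp)));
        System.out.println(sb.append(" ]"));
    }
}
```

The last few lines are the crux for WordUtils: any method that works one char at a time through Character only ever sees the single-character (simple) mappings, so it cannot produce "SS" or react to the surrounding word.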

Does anyone have a suggestion for how to move forward with this work? I see 
three options:

1] Admit defeat and avoid the case conversion mess entirely.
2] Mimic the existing functionality, but document the limitations.
3] Attempt to deliver a locale-dependent version, perhaps still with 
   limitations (or only for certain languages).

I’m leaning towards 2, perhaps even calling the classes “SimpleX…”.
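A minimal sketch of what option 2 could look like, assuming (as in WordUtils.capitalize(str)) that words are whitespace-delimited and only their first character is title-cased. It iterates by code point rather than by char and spells the limitations out in the Javadoc instead of fixing them. CapitalizeSketch is a hypothetical name used only for this illustration, not a proposed class name.

```java
/**
 * Hypothetical sketch of a "simple" capitalizer in the spirit of option 2.
 *
 * Known limitations, documented rather than fixed:
 *  - not locale-sensitive: Turkish "i" becomes "I", never U+0130;
 *  - one code point at a time: no multi-character mappings such as sharp s to "Ss";
 *  - words are split on whitespace only; digraphs such as Dutch "ij"
 *    (as in "IJsselmeer") are not treated specially.
 */
public final class CapitalizeSketch {
    private CapitalizeSketch() {}

    public static String capitalize(String str) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        StringBuilder out = new StringBuilder(str.length());
        boolean atWordStart = true;
        // Iterate by code point so supplementary characters are never split.
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                atWordStart = true;
                out.appendCodePoint(cp);
            } else if (atWordStart) {
                // Simple (single code point) title-case mapping only.
                out.appendCodePoint(Character.toTitleCase(cp));
                atWordStart = false;
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("hello big world"));
    }
}
```

Note that capitalize("istanbul") returns "Istanbul" whatever the default locale, which is exactly the kind of limitation the Javadoc would need to call out.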

Thanks,
Duncan


[1] http://www.unicode.org/versions/Unicode9.0.0/ch05.pdf
