Hi,

> On 21.05.2017 at 08:06, Duncan Jones <dun...@wortharead.com> wrote:
> 
> Hi everyone,
> 
> I’ve found some time to continue breaking WordUtils into separate classes 
> (eschewing the “big collection of static methods” approach). However, as I 
> read more about case handling in Unicode, I realise how simplistic the 
> WordUtils methods are and how complex a full solution would need to be.
> 
> Section 5.18 of the Unicode specification [1] describes these complexities. 
> The main ones that bother me are:
> 
> 1. Title case conversions vary widely between different locales and 
> languages. I’m not clear whether any locale is satisfied by the current 
> simplistic implementation in WordUtils.capitalize(str). Supporting this 
> correctly would be a serious challenge.
> 
> 2. All types of case conversion may vary depending upon context/locale. There 
> are examples provided in [1] where the outcome is different in a Turkish 
> locale, or depends on whether the letter in question is followed by another.
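
For anyone who wants to see both effects on a stock JDK, here is a minimal sketch. The class and method names (CaseMappingDemo, upper, lower) are made up for illustration and are not part of WordUtils or commons-text; the only real API used is String.toUpperCase(Locale) and String.toLowerCase(Locale).

```java
import java.util.Locale;

// Hypothetical demo class, not part of any Commons component.
final class CaseMappingDemo {

    // Thin wrappers so the locale-dependence is explicit at the call site.
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Locale-dependent: in Turkish, 'i' uppercases to dotted capital I (U+0130).
        System.out.println(upper("title", Locale.ROOT)); // TITLE
        System.out.println(upper("title", turkish));     // TİTLE

        // Context-dependent: capital sigma lowercases to final sigma (U+03C2)
        // at the end of a word and to U+03C3 elsewhere.
        String odos = "\u039F\u0394\u039F\u03A3";         // ΟΔΟΣ
        System.out.println(lower(odos, Locale.ROOT));     // οδος, ending in ς
    }
}
```

So even the plain upper/lower conversions are not a per-character table lookup, which is worth stating in the Javadoc of whatever we ship.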
> 
> Does anyone have a suggestion for how to move forward with this work? I see 
> three options: 1] Admit defeat and avoid the case conversion mess entirely. 
> 2] Mimic the existing functionality, but document the limitations. 3] Attempt 
> to deliver a locale-dependent version, perhaps still with limitations (or for 
> certain languages).
> 
> I’m leaning towards 2, perhaps even calling the classes “SimpleX…”.

Sounds good to me.
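
To make option 2 concrete, something along these lines is what I would picture. This is only a rough sketch under my own assumptions: the name SimpleCapitalizer and its behaviour (whitespace-delimited words, per-code-point title casing) are hypothetical, not the existing WordUtils code and not a proposed final API.

```java
// Hypothetical sketch of a "SimpleX" class; name and behaviour are assumptions.
final class SimpleCapitalizer {

    private SimpleCapitalizer() {
    }

    /**
     * Title-cases the first code point of each whitespace-delimited word.
     *
     * Documented limitations: the mapping is per code point and
     * locale-independent, so no language-specific or context-sensitive
     * rules are applied (e.g. Turkish dotted I, Dutch "IJ").
     */
    static String capitalize(String str) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        StringBuilder out = new StringBuilder(str.length());
        boolean atWordStart = true;
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                atWordStart = true;
                out.appendCodePoint(cp);
            } else if (atWordStart) {
                // toTitleCase differs from toUpperCase for digraphs such as U+01C6.
                out.appendCodePoint(Character.toTitleCase(cp));
                atWordStart = false;
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

With this, capitalize("hello world") gives "Hello World", but capitalize("ijs") gives "Ijs" where Dutch would want "IJs", and capitalize("istanbul") gives "Istanbul" regardless of locale. Those are exactly the cases the Javadoc would have to call out.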

> 
> Thanks,
> Duncan
> 
> 
> [1] http://www.unicode.org/versions/Unicode9.0.0/ch05.pdf
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 

