On May 21, 2017 10:56 PM, "Duncan Jones" <dun...@wortharead.com> wrote:


> On 21 May 2017, at 19:43, Gary Gregory <garydgreg...@gmail.com> wrote:
>
> Pardon the obvious but what is missing from methods like
> https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isLowerCase(char)
>
> Gary


The WordUtils methods turn sentences into title case, which Java’s core
libraries don’t offer. In fact, the core libraries make doing
locale-sensitive title case conversions very difficult (see
http://stackoverflow.com/questions/7360996/unicode-correct-title-case-in-java
for example).

Doing title casing correctly is quite a subtle art. We don’t even do it
correctly for English at the moment: the usual convention gives “The Life
of Reilly”, whereas we produce “The Life Of Reilly”. Other languages have
completely different conventions or additional complexities.
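
To make the English example concrete, here is a minimal sketch in Java. It is
not the WordUtils implementation: the class name TitleCaseSketch, both method
names and the short list of "small words" are assumptions made up for
illustration, and the input is assumed to be separated by single spaces.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of Commons Text.
final class TitleCaseSketch {

    // Assumed, deliberately incomplete list of English "small words".
    private static final Set<String> SMALL_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "on", "the", "to"));

    // Naive approach: capitalise the first letter of every word.
    static String naive(String text) {
        String[] words = text.split(" ");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(capitalise(words[i]));
        }
        return out.toString();
    }

    // Rough English convention: small words stay lower case unless they
    // are the first or last word of the title.
    static String english(String text) {
        String[] words = text.split(" ");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            String lower = words[i].toLowerCase(Locale.ENGLISH);
            boolean edge = i == 0 || i == words.length - 1;
            out.append(!edge && SMALL_WORDS.contains(lower) ? lower : capitalise(words[i]));
        }
        return out.toString();
    }

    // Title-cases the first code point and leaves the rest untouched.
    private static String capitalise(String word) {
        if (word.isEmpty()) {
            return word;
        }
        int first = word.codePointAt(0);
        return new StringBuilder()
                .appendCodePoint(Character.toTitleCase(first))
                .append(word.substring(Character.charCount(first)))
                .toString();
    }
}
```

With the input "the life of reilly", naive() capitalises every word, while
english() keeps "of" in lower case. Real English style guides differ on which
words count as small, so the list above is only a placeholder.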


I see. So the hard part is coming up with the rules.

Aside from that I could see creating an instance of a class
"TitleCaseConverter" or some such with a Locale through a factory method.
The factory can decide whether or not to create a Locale-specific subclass.
Maybe there are general rules that could be implemented in the parent class
or even driven off a locale-specific properties file... TBD ;-)
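
A rough sketch of that shape, in Java, might look like the following. Only
the name TitleCaseConverter comes from the suggestion above; forLocale(),
convert(), titleCaseFirst() and the Turkish subclass are hypothetical names,
and the single Turkish rule shown (a word-initial "i" becomes the dotted
capital U+0130) stands in for whatever locale-specific rules would really be
needed.

```java
import java.util.Locale;

// Hypothetical parent class holding the general rules.
class TitleCaseConverter {

    protected final Locale locale;

    protected TitleCaseConverter(Locale locale) {
        this.locale = locale;
    }

    // Factory method: returns a locale-specific subclass where one exists.
    static TitleCaseConverter forLocale(Locale locale) {
        if ("tr".equals(locale.getLanguage())) {
            return new TurkishTitleCaseConverter(locale);
        }
        return new TitleCaseConverter(locale);
    }

    // General rule: title-case the first code point after whitespace and
    // copy everything else unchanged.
    String convert(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean startOfWord = true;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                startOfWord = true;
                out.appendCodePoint(cp);
            } else if (startOfWord) {
                out.append(titleCaseFirst(cp));
                startOfWord = false;
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Hook that subclasses override for locale-specific mappings.
    protected String titleCaseFirst(int codePoint) {
        return new String(Character.toChars(Character.toTitleCase(codePoint)));
    }
}

// Hypothetical subclass: in Turkish, "i" capitalises to dotted U+0130.
class TurkishTitleCaseConverter extends TitleCaseConverter {

    TurkishTitleCaseConverter(Locale locale) {
        super(locale);
    }

    @Override
    protected String titleCaseFirst(int codePoint) {
        return codePoint == 'i' ? "\u0130" : super.titleCaseFirst(codePoint);
    }
}
```

A caller would write something like
TitleCaseConverter.forLocale(Locale.forLanguageTag("tr")).convert("istanbul")
and never needs to know which subclass was chosen. A properties-driven
variant could replace the subclass with a lookup table loaded per locale.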

Gary



>
> On May 21, 2017 5:06 AM, "Duncan Jones" <dun...@wortharead.com> wrote:
>
>> Hi everyone,
>>
>> I’ve found some time to continue breaking WordUtils into separate classes
>> (eschewing the “big collection of static methods” approach). However, as I
>> read more about case handling in Unicode, I realise how simplistic the
>> WordUtils methods are and how complex a full solution would need to be.
>>
>> Section 5.18 of the Unicode specification [1] describes these
>> complexities. The main ones that bother me are:
>>
>> 1. Title case conversions vary widely between different locales and
>> languages. I’m not clear whether any locale is satisfied by the current
>> simplistic implementation in WordUtils.capitalize(str). Supporting this
>> correctly would be a serious challenge.
>>
>> 2. All types of case conversion may vary depending upon context/locale.
>> There are examples provided in [1] where the outcome is different in a
>> Turkish locale or if the letter in question is followed by another or not.
>>
>> Does anyone have a suggestion for how to move forward with this work? I
>> see three options: 1] Admit defeat and avoid the case conversion mess
>> entirely. 2] Mimic the existing functionality, but document the
>> limitations. 3] Attempt to deliver a locale-dependent version, perhaps
>> still with limitations (or for certain languages).
>>
>> I’m leaning towards 2, perhaps even calling the classes “SimpleX…”.
>>
>> Thanks,
>> Duncan
>>
>>
>> [1] http://www.unicode.org/versions/Unicode9.0.0/ch05.pdf
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>>
>>


