> On May 22, 2017, at 6:04 AM, sebb <seb...@gmail.com> wrote:
>
> On 22 May 2017 at 06:56, Duncan Jones <dun...@wortharead.com> wrote:
>>
>>> On 21 May 2017, at 19:43, Gary Gregory <garydgreg...@gmail.com> wrote:
>>>
>>> Pardon the obvious but what is missing from methods like
>>> https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isLowerCase(char)
>>>
>>> Gary
>>
>>
>> The WordUtils methods turn sentences into title case, which Java’s core
>> libraries don’t offer. In fact, the core libraries make doing
>> locale-sensitive title case conversions very difficult (see
>> http://stackoverflow.com/questions/7360996/unicode-correct-title-case-in-java
>> for example).
>>
>> Doing title casing correctly is quite a subtle art. We don’t even do it
>> correctly for English at the moment, which would normally capitalise “The
>> Life of Reilly” rather than “The Life Of Reilly”. Other languages have
>> completely different conventions or additional complexities.
>>
>
> However the Javadoc does state that the capitalisation is based on
> words, not sentences.
> So I don't know if there is any expectation that it will take account
> of the meaning of the words.
>
> I guess the question is whether that is useful at all?
> If so, we should clarify that the processing takes no account of the
> meaning of the words.
> If not, we should perhaps drop the methods.
>
> I think it will be a huge effort to produce anything that works
> properly even for US English, let alone UK English.
>

I agree about the level of effort: capitalizing anything properly in a
semantic fashion is a large undertaking unless we accept some
approximation. The only clearly deterministic way to capitalize is to
rely on delimiters. Admitting defeat (for Commons) isn’t an unreasonable
option, with the bulk of the work possibly going to OpenNLP instead. That
seems a better venue for such an algorithm, because the mechanics of
language determination are present there and not here.

> Names will be a particular problem: ee cummings, D'Ath, O'Toole, MacDonald
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
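
To make the delimiter-only approach concrete, here is a minimal sketch of
what "rely upon delimiters" could look like. It is not the WordUtils
implementation: the class and method names (DelimiterCapitalizer,
capitalizeWords) are hypothetical, and treating whitespace as the default
delimiter is an assumption made for illustration. It only shows that the
result is deterministic and takes no account of what the words mean.

```java
// Hypothetical illustration only; not the Commons WordUtils code.
final class DelimiterCapitalizer {

    /**
     * Title-cases the first character that follows a delimiter (and the
     * very first character) and leaves every other character untouched.
     * With no delimiters given, whitespace is used (an assumption).
     */
    static String capitalizeWords(String text, char... delimiters) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        StringBuilder out = new StringBuilder(text.length());
        boolean atWordStart = true; // the first character starts a word
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isDelimiter(cp, delimiters)) {
                out.appendCodePoint(cp);
                atWordStart = true;
            } else if (atWordStart) {
                out.appendCodePoint(Character.toTitleCase(cp));
                atWordStart = false;
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    private static boolean isDelimiter(int codePoint, char[] delimiters) {
        if (delimiters.length == 0) {
            return Character.isWhitespace(codePoint);
        }
        for (char d : delimiters) {
            if (d == codePoint) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Every word is capitalized, including "of".
        System.out.println(capitalizeWords("the life of reilly"));
        // Names come out wrong unless the caller picks delimiters by hand.
        System.out.println(capitalizeWords("o'toole and macdonald"));
        System.out.println(capitalizeWords("o'toole", ' ', '\''));
    }
}
```

With whitespace as the only delimiter this yields "The Life Of Reilly" and
"O'toole And Macdonald"; adding the apostrophe as a delimiter gives
"O'Toole", but nothing delimiter-based will produce "MacDonald" or leave
"ee cummings" alone. That is the gap a language-aware component would have
to fill.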