https://bugs.freedesktop.org/show_bug.cgi?id=42893
Bug #: 42893
Summary: [EDITING] [ProposedEasyHack] Improve Autocorrect: Capitalize first letter of sentence
Classification: Unclassified
Product: LibreOffice
Version: LibO 3.4.4 release
Platform: All
OS/Version: All
Status: UNCONFIRMED
Severity: minor
Priority: medium
Component: Writer
AssignedTo: libreoffice-bugs@lists.freedesktop.org
ReportedBy: ryan.jendo...@gmail.com

In addition to the issue identified in https://bugs.freedesktop.org/show_bug.cgi?id=35515, there are other instances where the "Capitalize first letter of every sentence" option is more trouble than it's worth.

The 'start of sentence detection' should be improved to recognise the following for what they are, and therefore not perform any capitalization:

1. Common contractions, e.g. "esp." for "especially", "incl." for "including", "temp." for "temporary", "e.g.", "i.e.", etc.

2. Things which are clearly acronyms, e.g. "U.S.", "Y.M.C.A.", etc. In regex terms I'd imagine the pattern to be /([a-zA-Z]\.){2,}/, i.e., any two or more occurrences of a letter followed by a period. You could make a judgment call about whether you wanted to limit it to capital letters. On the plus side you're more likely to be looking at something really intended as an acronym, but on the negative side I often use lower-case acronyms like "w.r.t." for "with regard to", and suchlike. This might matter more if you thought "e.g." and "i.e." are more accurately classed as acronyms than contractions; I'm not sure the conceptual distinction would make a difference here in practice.

3. Did you spot the 'intentional mistake' in number 1. above? :-) The case where a contraction or acronym falls at the end of a sentence is tricky. Some cursory research ([1],[2]) confirms that in these situations the correct thing to do is to have only one period, which 'does double duty', both indicating the shortening and ending the sentence. Therefore, in these situations LO would probably miss the new sentence and not be able to capitalize.
However, both ending sentences with acronyms and (hopefully) the occurrence of people forgetting to capitalize are pretty rare, so I'd vote to suffer this possible intermittent inconvenience in order to have the benefit which 1. and 2. above would bring.

As a pie-in-the-sky concept, I guess it'd be possible to do some heuristics using the grammar engine to determine if the writer probably intended to finish the sentence at a certain point, but that seems like a disproportionate amount of effort.

Localization issues
-------------------

/[a-zA-Z]/ is Unicode-unfriendly for a start. I can't remember if LO's regex engine supports Unicode-aware character classes like [[:alpha:]]: if it does, we can use them; if it doesn't, that's another bug report :p

In addition, it's likely that all the rules above would have to be language-contingent. The possible scope of this might be taking us outside the realms of an EasyHack, but it should be possible to lay the groundwork easily enough.

[1] http://ethnicity.rutgers.edu/~jlynch/Writing/p.html#periods
[2] http://english.stackexchange.com/search?q=[punctuation]+etc
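
For illustration only, here is a rough Python sketch of rules 1. and 2. above, plus the Unicode concern from "Localization issues". This is not LO code and says nothing about how LO's autocorrect is actually implemented: the names KNOWN_ABBREVIATIONS, ASCII_ACRONYM, UNICODE_ACRONYM and suppresses_capitalization are all made up for the example, and the abbreviation list is a placeholder that would really have to be per-language. Python's re module has no [[:alpha:]], so the sketch assumes [^\W\d_] as an approximation of "any Unicode letter".

```python
import re

# Hypothetical, English-only placeholder list (rule 1 in the report).
# A real list would have to be maintained per language.
KNOWN_ABBREVIATIONS = {"esp.", "incl.", "temp.", "etc."}

# Rule 2 exactly as written in the report: two or more "letter + period" pairs.
ASCII_ACRONYM = re.compile(r"(?:[a-zA-Z]\.){2,}")

# Assumed Unicode-aware variant: [^\W\d_] is "word character that is neither
# a decimal digit nor an underscore", which approximates "any letter".
UNICODE_ACRONYM = re.compile(r"(?:[^\W\d_]\.){2,}")


def suppresses_capitalization(token, pattern=UNICODE_ACRONYM):
    """Return True if `token` (the word just typed, including its trailing
    period) looks like an abbreviation or acronym, in which case the next
    word should NOT be auto-capitalized."""
    if token.lower() in KNOWN_ABBREVIATIONS:
        return True
    # fullmatch: the whole token must consist of letter+period pairs.
    return pattern.fullmatch(token) is not None


if __name__ == "__main__":
    for sample in ["esp.", "e.g.", "U.S.", "w.r.t.", "sentence.", "т.е."]:
        print(sample, suppresses_capitalization(sample))
```

Note that this deliberately accepts the trade-off described in 3.: a token like "etc." or "U.S." at the genuine end of a sentence still suppresses capitalization of the next word.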