Hello Daniel,
Yes, there are lots of words that end with a dot, such as:

- vs.
- exmo.
- etc.
- dr.
- dra.
- prof.

and so on...

Could you explain what I need to do?

PS: I am now the maintainer of the English dictionaries of Apache OpenOffice. I have committed (pressed the "PUBLISH" button in my account) a new version of the dictionaries for AOO (2014.01.01), but it is still pending.

Every month I release a new version of en_GB with around 200-300 new unique words. It seems I will release the Mozilla version more often.

Is there a way for LanguageTool to update the .DIC and .AFF files for en_GB? But please wait until the 1st of February, since that release will have 300+ more unique words than the last version I released.

Thanks!

Kind regards,
Marco A.G.Pinto

------------------------

On 20/01/2014 16:45, Daniel Naber wrote:

Hi,

Portuguese uses the SRXSentenceTokenizer, but there's no mapping for Portuguese in segments.srx. The result is that sentences in Portuguese are not detected, i.e. everything is one long sentence. Thus the UppercaseSentenceStartRule never matches except at the very beginning of a text.

Does Portuguese use abbreviations that end in a dot, like a lot of other languages do? If so, is there a list of such abbreviations? We could then add Portuguese to segments.srx to make the sentence splitter work.

Regards
Daniel
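
To illustrate why a list of abbreviations matters for sentence splitting, here is a minimal sketch in Python. It is not LanguageTool's implementation (LanguageTool uses SRX rules in segments.srx); the function name `split_sentences` and the constant `ABBREVIATIONS` are hypothetical, and the abbreviation set contains only the examples from this thread.

```python
import re

# Example abbreviations from this thread; a real list would be much longer.
ABBREVIATIONS = {"vs.", "exmo.", "etc.", "dr.", "dra.", "prof."}


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace,
    unless the token ending at that point is a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        # Text from the current sentence start up to and including the punctuation.
        candidate = text[start:match.start() + 1]
        last_token = candidate.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            # Treat the dot as part of the abbreviation: no sentence break here.
            continue
        sentences.append(candidate.strip())
        start = match.end()
    # Whatever remains after the last break is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences("O Dr. Silva chegou. A Dra. Costa saiu."):
        print(sentence)
```

Without the abbreviation check, the dot after "Dr." or "Dra." would be taken as a sentence end. One known limitation of this sketch: an abbreviation such as "etc." that really does end a sentence will not trigger a split.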