Hi,

Portuguese uses the SRXSentenceTokenizer, but there's no mapping for Portuguese in segments.srx. As a result, sentences in Portuguese are not detected, i.e. the whole text is treated as one long sentence. Thus the UppercaseSentenceStartRule never matches except at the very beginning of a text.
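To illustrate the effect, here is a minimal Python sketch. It is not LanguageTool's actual code: `split_sentences` and `lowercase_sentence_starts` are hypothetical stand-ins for the tokenizer and the rule, and the break pattern is a deliberately naive assumption rather than anything taken from segments.srx.

```python
import re

def split_sentences(text, break_pattern=None):
    """Split text at each match of break_pattern.

    With no pattern (the situation described above, where the language has
    no segmentation rules), the whole text comes back as a single sentence.
    """
    if break_pattern is None:
        return [text]
    return [part for part in re.split(break_pattern, text) if part]

def lowercase_sentence_starts(sentences):
    """Return the sentences whose first character is a lowercase letter."""
    return [s for s in sentences if s[:1].islower()]

text = "Isto é um teste. esta frase começa com minúscula."

# No segmentation rules: one long sentence, so only the very first
# character of the text is ever checked.
no_rules = split_sentences(text)

# Naive rule (assumption): break at whitespace that follows '.', '!' or '?'.
with_rules = split_sentences(text, r"(?<=[.!?])\s+")

print(lowercase_sentence_starts(no_rules))
print(lowercase_sentence_starts(with_rules))
```

Without a break rule the lowercase start of the second sentence goes unnoticed; with even a naive rule it is found. A naive rule like this one would also split after abbreviations that end in a dot, which is why a real mapping needs exceptions for them.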
Does Portuguese use abbreviations that end in a dot, like a lot of other languages do? If so, is there a list of such abbreviations? We could then add Portuguese to segments.srx to make the sentence splitter work.

Regards
Daniel