Portuguese sentence splitter

Daniel Naber Mon, 20 Jan 2014 08:47:34 -0800

Hi,

Portuguese uses the SRXSentenceTokenizer, but there's no mapping for 
Portuguese in segments.srx. The result is that sentences in Portuguese 
are not detected, i.e. everything is one long sentence. Thus the 
UppercaseSentenceStartRule never matches except at the very beginning of 
a text.


Does Portuguese use abbreviations that end in a dot, like a lot of other 
languages do? If so, is there a list of such abbreviations? We could 
then add Portuguese to segments.srx to make the sentence splitter work.
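
To illustrate why the abbreviation list matters, here is a minimal Python sketch. It is not LanguageTool's SRX-based implementation, only a toy splitter that breaks after ".", "!" or "?" when an uppercase letter follows. The function name split_sentences and the sample abbreviations are assumptions made for this example, not a real list for Portuguese.

```python
import re

# Hypothetical sample abbreviations, for illustration only;
# a real list would have to come from Portuguese speakers.
SAMPLE_ABBREVIATIONS = frozenset({"sr.", "sra.", "dr.", "dra.", "prof."})

# Candidate break: whitespace that follows sentence-ending punctuation
# and is followed by some non-whitespace character.
CANDIDATE_BREAK = re.compile(r"(?<=[.!?])\s+(?=\S)")


def split_sentences(text, abbreviations=SAMPLE_ABBREVIATIONS):
    """Split text at candidate breaks, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in CANDIDATE_BREAK.finditer(text):
        # Only break if the next sentence starts with an uppercase letter.
        if not text[match.end()].isupper():
            continue
        # Look at the last token before the candidate break.
        tokens = text[start:match.start()].split()
        last_token = tokens[-1].lower() if tokens else ""
        if last_token in abbreviations:
            continue  # "Sr. Silva" is not a sentence boundary
        sentences.append(text[start:match.start()])
        start = match.end()
    if start < len(text):
        sentences.append(text[start:])
    return sentences


if __name__ == "__main__":
    sample = "O Sr. Silva chegou. Ele está cansado."
    print(split_sentences(sample))
    # Without an abbreviation list, "Sr." is wrongly treated as a sentence end.
    print(split_sentences(sample, abbreviations=frozenset()))
```

Without the abbreviation list, the splitter would break after "Sr.", which is the kind of false boundary the exceptions in segments.srx are meant to prevent.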

Regards
  Daniel


