Exactly. Without the '.' they become ambiguous. I don't know if it makes sense to add abbreviations that don't need EOS characters to the dictionary we use in the Sentence Detector. We are trying to resolve the EOS ambiguity anyway, and if there is no EOS character, there is no ambiguity.
On Mon, Mar 19, 2012 at 6:11 PM, Jörn Kottmann <[email protected]> wrote:
> On 03/19/2012 09:55 PM, [email protected] wrote:
>
>> I don't know if it is conclusive, but with the changes (case insensitive,
>> remove non word chars) the sentence detector performed worse at least for
>> my Portuguese corpus.
>>
>
> Maybe it is matching now at places where it should not match (and did not
> match before) ?
>
> Jörn
>
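To make both points concrete, here is a minimal sketch. It is not the actual OpenNLP code. The class `AbbreviationSketch`, its dictionary entries, and the `normalize` and `suppressesSplit` helpers are all hypothetical. The idea is that the dictionary only matters for tokens that end in an EOS character. Once entries are lower-cased and stripped of non-word characters, "No." can match an ordinary sentence-final "no.", and "U.S." can match "us.". This is one way the normalized lookup could fire where it did not fire before.

```java
import java.util.Locale;
import java.util.Set;

public class AbbreviationSketch {

    // Hypothetical abbreviation dictionary, for illustration only.
    static final Set<String> DICT = Set.of("Dr.", "No.", "U.S.");

    // Hypothetical normalization modeled on the change discussed in the thread:
    // lower-case the token and strip every non-word character.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("\\W", "");
    }

    // Returns true if the dictionary says the trailing '.' belongs to an
    // abbreviation, i.e. it should NOT be treated as an end of sentence.
    static boolean suppressesSplit(String token, Set<String> dict, boolean normalized) {
        if (!token.endsWith(".")) {
            // No EOS character means no ambiguity, so the dictionary is never consulted.
            return false;
        }
        if (!normalized) {
            return dict.contains(token); // exact, case-sensitive match
        }
        String key = normalize(token);
        for (String abbr : dict) {
            if (normalize(abbr).equals(key)) {
                return true; // a normalized match may be a false positive
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] tokens = {"Dr.", "no.", "us.", "No"};
        for (String t : tokens) {
            System.out.printf("%-4s exact=%b normalized=%b%n",
                    t, suppressesSplit(t, DICT, false), suppressesSplit(t, DICT, true));
        }
    }
}
```

With exact matching, only "Dr." suppresses the split. With normalized matching, "no." and "us." do as well, even though in running text they usually end a sentence. A further, hedged caveat for a Portuguese corpus: in Java, `\W` without `Pattern.UNICODE_CHARACTER_CLASS` also strips accented letters such as "ç" or "ã". If the real change used a similar regex, this could create additional spurious collisions.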
