On Tue, Jun 9, 2009 at 7:44 AM, Mark Heckmann <mark.heckm...@gmx.de> wrote:
> Thanks for your help. Your answers solved the problem I posted and that is
> just when I noticed that I misspecified the problem ;)
> My problem is to separate German texts by sentences. Unfortunately I
> haven't found an R package doing this kind of text separation in German, so
> I try it "manually".
>
> Just using the dot as separator fails in occasions like:
> txt <- "One January 1. I saw Rick. He was born in the 19. century."
>

Sentence boundary disambiguation is a non-trivial problem, as you can see in
your above example (cf. "I arrived on January 1. I saw Rick."). You can get
~95% accuracy fairly straightforwardly, but the last 5% are hard. Take a look
at http://en.wikipedia.org/wiki/Sentence_boundary_disambiguation, which
points to other good resources.

-s

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
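To make the "easy 95% versus hard 5%" point concrete, here is a minimal base-R sketch of a heuristic splitter. The function name split_sentences and the rule it applies are my own assumptions for illustration, not something taken from an existing package: split on whitespace that follows ".", "!" or "?" and precedes an uppercase letter, but never directly after a digit plus period, since that is usually a German ordinal such as "19. Jahrhundert".

```r
# Hypothetical helper: a heuristic sketch, not a complete solution.
split_sentences <- function(txt) {
  # (?<=[.!?])      the whitespace must follow a sentence terminator
  # (?<![0-9][.])   ...but not a digit followed by a period (ordinal or date)
  # \\s+            the whitespace that is consumed by the split
  # (?=[[:upper:]]) the next sentence must start with an uppercase letter
  pattern <- "(?<=[.!?])(?<![0-9][.])\\s+(?=[[:upper:]])"
  strsplit(txt, pattern, perl = TRUE)[[1]]
}

# The ordinal is handled correctly, even though German nouns are capitalized:
split_sentences("Er wurde im 19. Jahrhundert geboren. Das ist lange her.")
# [1] "Er wurde im 19. Jahrhundert geboren." "Das ist lange her."

# The date case is where the heuristic breaks down:
txt <- "On January 1. I saw Rick. He was born in the 19. century."
split_sentences(txt)
# [1] "On January 1. I saw Rick."       "He was born in the 19. century."
```

The second call returns two pieces where a human reader sees three sentences: the rule that protects "19. Jahrhundert" also swallows the real boundary after "January 1.". Telling those two cases apart needs context that a single regular expression does not have, and that is the hard remainder.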