On Tue, Jun 9, 2009 at 7:44 AM, Mark Heckmann <mark.heckm...@gmx.de> wrote:

> Thanks for your help. Your answers solved the problem I posted and that is
> just when I noticed that I misspecified the problem ;)
> My problem is to split German texts into sentences. Unfortunately I
> haven't found an R package that does this kind of text separation for
> German, so I am trying it "manually".
>
> Just using the dot as a separator fails in cases like:
> txt <- "One January 1. I saw Rick. He was born in the 19. century."
>

Sentence boundary disambiguation is a non-trivial problem, as your example
above shows (cf. "I arrived on January 1. I saw Rick."): the period after a
number can mark an ordinal, end a sentence, or do both at once. You can get
~95% accuracy fairly straightforwardly, but the last 5% is hard. Take a
look at http://en.wikipedia.org/wiki/Sentence_boundary_disambiguation, which
points to other good resources.
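
To illustrate the "fairly straightforward" part, here is a minimal sketch in
base R. It assumes that a sentence boundary is a ".", "!" or "?" followed by
whitespace and an uppercase letter. The function name split_sentences is made
up for this example and does not come from any package, and the heuristic is
only one possible starting point, not a complete solution.

```r
# Hypothetical helper: split text at whitespace that follows ., ! or ?
# and precedes an uppercase letter. Assumption: lowercase after a period
# means the sentence continues (as in "19. century").
split_sentences <- function(txt) {
  # perl = TRUE enables the lookbehind (?<=...) and lookahead (?=...)
  strsplit(txt, "(?<=[.!?])\\s+(?=[[:upper:]])", perl = TRUE)[[1]]
}

txt <- "One January 1. I saw Rick. He was born in the 19. century."
sentences <- split_sentences(txt)
print(sentences)

# German capitalizes nouns, so the same rule wrongly splits after "19."
txt_de <- "Er wurde im 19. Jahrhundert geboren. Er lebte in Berlin."
sentences_de <- split_sentences(txt_de)
print(sentences_de)
```

On the English example this should give the three intended sentences. On the
German one it yields three pieces instead of two, because "Jahrhundert" is
capitalized. That is the hard remainder: handling it probably needs at least
a list of known abbreviations and ordinal contexts.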

           -s


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
