> Hm, a dictionary solution, I think.
FYI, I am currently working on a dictionary library
(based on Lucene, of course) for tokenizing and
stemming Chinese. From what has been
mentioned in this thread, it may be useful for
German too.
I must warn you that it is neither complete, bug-tested,
nor full-featured. I don't really have the time to
work on it right now because I'm busy with another
project. Would anyone like to try adapting it for
German or any other language?
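
For anyone who wants to experiment with the idea, dictionary-based
compound splitting can be sketched roughly as follows. This is not the
library mentioned above, only a minimal Python illustration. The toy
lexicon, the list of linking elements, and the names split_compound,
LEXICON, LINKERS and MIN_PART are all assumptions made up for the
example.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real one would hold many thousands of entries.
LEXICON = frozenset({
    "haus", "tür", "schlüssel", "arbeit", "amt", "donau", "dampf", "schiff",
})
# Linking elements that may sit between two parts (assumed, incomplete list).
LINKERS = ("", "s", "es", "n", "en")
# Ignore very short candidate parts to reduce accidental matches.
MIN_PART = 3


def split_compound(word):
    """Return lexicon words that cover `word`, or None if no full split exists."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        # Reached the end of the word: nothing left to split.
        if start == len(word):
            return ()
        # Try the longest candidate head first.
        for end in range(len(word), start + MIN_PART - 1, -1):
            head = word[start:end]
            if head not in LEXICON:
                continue
            for linker in LINKERS:
                if not word.startswith(linker, end):
                    continue
                nxt = end + len(linker)
                # A linking element must be followed by another part.
                if linker and nxt == len(word):
                    continue
                rest = solve(nxt)
                if rest is not None:
                    return (head,) + rest
        return None

    parts = solve(0)
    return list(parts) if parts is not None else None


print(split_compound("Haustürschlüssel"))
print(split_compound("Arbeitsamt"))
print(split_compound("Computer"))
```

The sketch returns only the first split it finds, preferring long
leading parts, and gives up (None) when some piece is missing from the
lexicon. It does nothing about ambiguous words, which is where the
"traps" mentioned in the quoted reply come in.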
--- Gerhard Schwarz <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Dmitry Serebrennikov wrote:
> > Stemming by itself couldn't solve this problem, it
> > seems, because I don't think it is designed for
> > splitting compound words. Yet, this seems like a
> > common issue that people would run into constantly.
> > So I was wondering:
> > - Do German stemmers typically split compound
> > words as well as chopping them down to a root form?
>
> No, not typically, but it can be implemented.
>
> > - Does this processing require dictionary-based
> > approaches, or are there enough clues in the word
> > structure to allow words to be split
> > algorithmically (à la Porter stemmer)?
>
> It's not possible without a dictionary. There are
> some rules for how to form compounds, but no
> common rule that is valid for all compounds.
> And there are many traps.
>
> > - How is this problem typically solved, in terms
> > of smaller search engines and in terms of the
> > Yahoos and Googles of the German landscape?
>
> Hm, a dictionary solution, I think. Or pattern
> matching.
>
> > Thanks very much for any information to help with
> > this!
> > - Dmitry
>
> Greets,
> Gerhard
>
> _______________________________________________
> Lucene-users mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/lucene-users