Greetings, everyone.

We have been using Lucene for English texts for some time, and it works
really well. But I recently spoke with someone from Germany, and they
raised an issue with that language that I wasn't sure Lucene could
tackle. The example they used was "black pen", which in German is
apparently written as a single compound word. The application domain is
an e-commerce catalog. So if the catalog uses the compound word but a
person is looking for any "pen", they will likely find nothing.
Similarly, if the catalog specifies "pen" and its color separately, but
the user searches with the compound word, they will again find nothing.
Stemming by itself doesn't seem able to solve this, because I don't
think it is designed to split compound words. Yet this seems like a
common issue that people would run into constantly. So I was wondering:

- Do German stemmers typically split compound words in addition to
  chopping them down to a root form?
- Does this processing require dictionary-based approaches, or are
  there enough clues in the word structure to split words purely
  algorithmically, à la the Porter stemmer? (See the P.S. below for
  the kind of thing I mean.)
- How is this problem typically solved, both by smaller search engines
  and by the Yahoos and Googles of the German landscape?

Thanks very much for any information that would help with this!

- Dmitry
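
P.S. To make the second question more concrete, here is the sort of
naive dictionary-based splitter I can imagine. This is plain Java of my
own, not anything from Lucene; the three-word dictionary and the
longest-match-first strategy are just my guesses at how such a thing
might work:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Sketch of a dictionary-based decompounder, only to illustrate
    // the kind of processing I am asking about. Not Lucene code; the
    // word list in main() is made up.
    public class CompoundSplitter {

        private final Set<String> dictionary;

        public CompoundSplitter(Set<String> dictionary) {
            this.dictionary = dictionary;
        }

        // Returns the parts if the (lower-cased) word decomposes
        // entirely into dictionary words, or null if no split exists.
        public List<String> split(String word) {
            if (dictionary.contains(word)) {
                List<String> whole = new ArrayList<String>();
                whole.add(word);
                return whole;
            }
            // Longest-prefix-first with backtracking: try each prefix
            // that is itself a dictionary word, recurse on the rest.
            for (int i = word.length() - 1; i > 0; i--) {
                String head = word.substring(0, i);
                if (dictionary.contains(head)) {
                    List<String> tail = split(word.substring(i));
                    if (tail != null) {
                        tail.add(0, head);
                        return tail;
                    }
                }
            }
            return null;
        }

        public static void main(String[] args) {
            // "Staubsaugerbeutel" (vacuum-cleaner bag)
            //   = Staub + Sauger + Beutel
            Set<String> dict = new HashSet<String>(
                    Arrays.asList("staub", "sauger", "beutel"));
            System.out.println(
                    new CompoundSplitter(dict).split("staubsaugerbeutel"));
            // prints: [staub, sauger, beutel]
        }
    }

What I don't know is whether a word list plus backtracking like this is
enough in practice. For instance, German also inserts linking letters
between parts (the "Fugen-s", as in "Arbeitszimmer" = Arbeit + s +
Zimmer), which this sketch ignores entirely.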