>> I didn't get your point here. Are you pro or anti-ngrams?

I am very pro, but we are currently not using n-grams for indexing larger texts, as we expect problems from the order-of-magnitude increase in the number of tokens to index. We did not test this, though, due to time constraints in our development. So if you do search with n-grams, and take some measurements of how the greater number of n-grams affects search speed, that would be nice.

>> If I stem the query and then stem information in the index in
>> realtime, stemming won't take up any extra space? Or?

Stemming vs. n-grams is a topic of its own. Stemmers are usually fast and do not need space on disk. But then, for some languages it is hard to write good stemmers, and stemmers can't handle spelling errors. N-grams work for all languages and can handle spelling variations, umlauts, errors, mixed-language documents, etc.

Have fun with n-grams,

Karsten

-----Original Message-----
From: karl wettin [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 3 February 2004 14:01
To: Lucene Developers List
Subject: Re: AW: AW: N-gram layer and language guessing

On Tue, 3 Feb 2004 13:36:35 +0100
"Karsten Konrad" <[EMAIL PROTECTED]> wrote:

> If you use n-grams consistently, you can leave out stemming and spend
> your time on something different (like buying a bigger hard disc for
> your indexes; you will probably need it then :)

I didn't get your point here. Are you pro or anti-ngrams?

If I stem the query and then stem information in the index in realtime, stemming won't take up any extra space? Or?

I'm quite green when it comes to indexes. It's all trie-patterns to me.

kalle

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
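[Editor's note: the character n-gram tokenization discussed in this thread can be sketched roughly as below. This is a minimal illustration with hypothetical names, not the Lucene API; it shows why n-grams tolerate spelling variants (the variants still share most of their grams) and why they multiply the token count compared to stemming, as Karsten warns.]

```java
import java.util.ArrayList;
import java.util.List;

public class NGrams {
    // Split a term into overlapping character substrings of length n.
    // "search" with n = 3 yields [sea, ear, arc, rch] - four tokens
    // where a stemmer would emit one, hence the index-size concern.
    static List<String> ngrams(String term, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Misspelled "serach" still shares the grams "sea" and "rch"
        // with "search", so an n-gram index can still match it.
        System.out.println(ngrams("search", 3));
        System.out.println(ngrams("serach", 3));
    }
}
```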