>> I didn't get your point here. Are you pro or anti-ngrams?

I am very pro, but we are currently not using n-grams for indexing larger texts, as we expect problems from the order-of-magnitude increase in the number of tokens to index. We did not test this, though, due to time constraints in our development. So if you do search with n-grams, and take some measurements of how the greater number of n-grams affects search speed, that would be nice.

>> If I stem the query and then stem information in the index in
>> realtime, stemming won't take up any extra space? Or?

Stemming vs. n-grams is a topic of its own. Stemmers are usually fast and do not need space on disk. But then, for some languages it is hard to write good stemmers, and stemmers can't handle spelling errors. N-grams work for all languages and can handle spelling variations, umlauts, errors, mixed-language documents, etc.

Have fun with n-grams,

Karsten

-----Original Message-----
From: karl wettin [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 3 February 2004 14:01
To: Lucene Developers List
Subject: Re: AW: AW: N-gram layer and language guessing

On Tue, 3 Feb 2004 13:36:35 +0100
"Karsten Konrad" <[EMAIL PROTECTED]> wrote:

> If you use n-grams consistently, you can leave out stemming and spend
> your time on something different (like buying a bigger hard disc for
> your indexes; you will probably need it then :)

I didn't get your point here. Are you pro or anti-ngrams?

If I stem the query and then stem information in the index in realtime, stemming won't take up any extra space? Or?

I'm quite green when it comes to indexes. It's all trie-patterns to me.

kalle

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
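[Editor's note: the character n-gram tokenization discussed in this thread can be sketched roughly as below. This is a minimal illustration with hypothetical names, not the Lucene API; it shows why n-grams tolerate spelling variants (the variants still share most of their grams) and why they multiply the token count compared to stemming, as Karsten warns.]

```java
import java.util.ArrayList;
import java.util.List;

public class NGrams {
    // Split a term into overlapping character substrings of length n.
    // "search" with n = 3 yields [sea, ear, arc, rch] - four tokens
    // where a stemmer would emit one, hence the index-size concern.
    static List<String> ngrams(String term, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Misspelled "serach" still shares the grams "sea" and "rch"
        // with "search", so an n-gram index can still match it.
        System.out.println(ngrams("search", 3));
        System.out.println(ngrams("serach", 3));
    }
}
```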