Re: Performance issues with ConjunctionScorer

mark harwood Tue, 22 Nov 2005 08:26:38 -0800

The Highlighter in the Lucene "contrib" section has a
class called TokenSources, which tries to find the best
way of getting a TokenStream.
It can build a TokenStream from either:
a) an Analyzer
b) a TermPositionVector (if the field was indexed with
one)


You may find that using TermPositionVectors in your
index gives you a speed-up, but it all depends on the
cost of the processing done by your analyzer. Using a
TermPositionVector incurs extra data reads to get the
list of tokens from disk, whereas using an Analyzer
spends extra CPU re-processing the document text you've
already read from disk.
Both approaches typically need to read the original
document text when highlighting in order to retain the
stop words that make it readable. 
I have noticed before now that the StandardAnalyzer
was quite slow, while other Analyzers are much quicker,
so it can really depend on your choice of analyzer.
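
To make the two paths concrete, here is a rough sketch in plain
Java. It is not Lucene code and does not use the Lucene API: the
class TokenSourceSketch, its Token type and its methods are all
hypothetical names made up for illustration, and the "prefer stored
data, otherwise analyze" rule is my assumption about what "best way"
means here. It only shows where each path spends its time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustration only: plain Java, NOT the Lucene API. All names are hypothetical.
final class TokenSourceSketch {

    /** A term plus the character offset where it starts in the original text. */
    public static final class Token {
        public final String term;
        public final int start;

        public Token(String term, int start) {
            this.term = term;
            this.start = start;
        }
    }

    /** Path (a): re-analyze the text. Costs CPU, needs no extra index data. */
    public static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;            // skip separators
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;            // consume one word
            }
            tokens.add(new Token(text.substring(start, i).toLowerCase(Locale.ROOT), start));
        }
        return tokens;
    }

    /**
     * Path (b): rebuild tokens from stored per-term start offsets.
     * Costs extra reads (modelled here as the map), and the tokens must be
     * sorted back into document order because they are stored per term.
     */
    public static List<Token> fromStoredOffsets(Map<String, int[]> termOffsets) {
        List<Token> tokens = new ArrayList<>();
        for (Map.Entry<String, int[]> entry : termOffsets.entrySet()) {
            for (int start : entry.getValue()) {
                tokens.add(new Token(entry.getKey(), start));
            }
        }
        tokens.sort(Comparator.comparingInt(t -> t.start));
        return tokens;
    }

    /** Assumed policy: use stored offsets when present, otherwise analyze. */
    public static List<Token> anyTokens(Map<String, int[]> termOffsets, String text) {
        return termOffsets != null ? fromStoredOffsets(termOffsets) : analyze(text);
    }
}
```

Note that in this sketch the stored path only returns the terms that
were stored, so the original text is still needed to print a readable
fragment, which matches the point above about stop words.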

Cheers
Mark


                
