Re: Performance implications of unanalyzed content

Incze Lajos Fri, 16 Apr 2004 12:42:52 -0700

On Fri, Apr 16, 2004 at 08:59:42AM +0200, Magnus Johansson wrote:
> Hi
> 
> I'm developing an application using Lucene where I need to
> be able to both search using a stemmer and sometimes using
> "exact" search.
> 
> I see two ways of doing this:
> 
> 1. Use two indexes. One using a stemming analyzer and one using
>    a SimpleAnalyzer
> 
> 2. Using duplicate fields. One field with stemmed content and
>    one with unstemmed content. (Perhaps the field CONTENT, will be
>    CONTENT and CONTENT_RAW)
> 
> I'm leaning towards option 2. However, I'm interested in any performance
> implications. If I understand it correctly, Lucene keeps separate
> term dictionaries for each field. So besides the index growing larger
> (which might affect caching), searching the index with duplicate fields
> won't be any slower when I only query the CONTENT field.
> 
> Is this correct?
> 
> 
> Magnus


I'm in exactly the same situation and use your option 2. There may be some
performance implications, but in my case they are well below anything a
human would notice.
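
For what it's worth, here is a toy sketch of what option 2 amounts to. It is
plain Java, not Lucene API calls: the DualFieldIndex class, the crude stem()
rule and the in-memory maps are all made up for illustration, and only the
field names CONTENT and CONTENT_RAW come from the question. It just shows the
idea that each field has its own term dictionary, so a query on one field
never looks at the other field's terms.

```java
import java.util.*;

// Toy model of option 2: one index, two fields, one term dictionary per field.
// Hypothetical illustration only; a real application would let Lucene do this.
class DualFieldIndex {
    // field name -> (term -> ids of the documents containing that term)
    private final Map<String, Map<String, Set<Integer>>> fields = new HashMap<>();

    // Crude stand-in for a real stemmer: strips a trailing "ing" or "s".
    static String stem(String term) {
        if (term.length() > 5 && term.endsWith("ing")) {
            return term.substring(0, term.length() - 3);
        }
        if (term.length() > 3 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Index the same text twice: stemmed into CONTENT, unstemmed into CONTENT_RAW.
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            post("CONTENT", stem(token), docId);
            post("CONTENT_RAW", token, docId);
        }
    }

    private void post(String field, String term, int docId) {
        fields.computeIfAbsent(field, f -> new TreeMap<>())
              .computeIfAbsent(term, t -> new TreeSet<>())
              .add(docId);
    }

    // A lookup only touches the dictionary of the field being queried.
    Set<Integer> search(String field, String term) {
        return fields.getOrDefault(field, Collections.emptyMap())
                     .getOrDefault(term, Collections.emptySet());
    }
}
```

A stemmed search would stem the query term first and look in CONTENT, e.g.
search("CONTENT", stem("searching")), while an "exact" search passes the term
unchanged to CONTENT_RAW. The cost of the second field in this sketch is extra
memory for the second dictionary, not extra work per CONTENT query.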

incze

