Performance implications of unanalyzed content

Magnus Johansson Fri, 16 Apr 2004 00:00:31 -0700

Hi

I'm developing an application using Lucene where I need to
be able to search both with a stemmer and, sometimes, with
"exact" matching.


I see two ways of doing this:

1. Use two indexes: one built with a stemming analyzer and one
   built with a SimpleAnalyzer.

2. Use duplicate fields: one field with stemmed content and one
   with unstemmed content. (For example, the field CONTENT would
   become CONTENT and CONTENT_RAW.)
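
To make option 2 concrete, here is a minimal sketch in Python. It is a toy
model, not Lucene's API: the ToyIndex class and the toy_stem function are
hypothetical names made up for illustration, and the "stemmer" is deliberately
crude. It only assumes that each field gets its own term dictionary, so a
lookup names one field and never touches the other.

```python
from collections import defaultdict


def toy_stem(token):
    # Hypothetical, deliberately crude stemmer: strips a few English suffixes.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


class ToyIndex:
    """Toy inverted index with one term dictionary per field (an assumption)."""

    def __init__(self):
        # field name -> term -> set of document ids
        self.fields = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id, text):
        # Index the same text twice: stemmed into CONTENT, as-is into CONTENT_RAW.
        for token in text.lower().split():
            self.fields["CONTENT"][toy_stem(token)].add(doc_id)
            self.fields["CONTENT_RAW"][token].add(doc_id)

    def search(self, field, term):
        # Only the named field's dictionary is consulted.
        return sorted(self.fields[field].get(term, set()))


index = ToyIndex()
index.add(1, "searching indexes")
index.add(2, "search index")

# Stemmed search: the query term is stemmed the same way as the content.
stemmed_hits = index.search("CONTENT", toy_stem("searching"))
# "Exact" search: the query term is looked up unchanged in the raw field.
exact_hits = index.search("CONTENT_RAW", "searching")
```

In this sketch the stemmed lookup matches both documents, while the exact
lookup matches only the document that literally contains "searching".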

I'm leaning towards option 2, but I'm interested in any performance
implications. If I understand correctly, Lucene keeps a separate
term dictionary for each field. So apart from the index growing larger
(which might affect caching), searching an index with duplicate fields
should be no slower when I only query the CONTENT field.

Is this correct?


Magnus
