Hi

I'm developing an application using Lucene in which I need to
search sometimes with a stemmer and sometimes with an
"exact" (unstemmed) match.

I see two ways of doing this:

1. Use two indexes: one built with a stemming analyzer and one
   built with a SimpleAnalyzer.

2. Use duplicate fields: one field with stemmed content and
   one with unstemmed content. (For example, the field CONTENT
   would become CONTENT and CONTENT_RAW.)
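
To make option 2 concrete, here is a rough sketch of what I mean. It is
plain Python, not Lucene's API: `tokenize`, `toy_stem` and
`build_document` are hypothetical names, and the suffix stripping is only
a crude stand-in for a real stemmer.

```python
import re

def tokenize(text):
    """Lowercase and split on non-letters, roughly like a simple analyzer."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def toy_stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def build_document(text):
    """Return one document holding the same text analyzed in two ways."""
    raw = tokenize(text)
    return {
        "CONTENT": [toy_stem(t) for t in raw],  # stemmed terms
        "CONTENT_RAW": raw,                     # unstemmed terms
    }

doc = build_document("Searching indexes")
print(doc["CONTENT"])      # terms a stemmed query would match
print(doc["CONTENT_RAW"])  # terms an "exact" query would match
```

The same source text is analyzed twice, so the document carries both the
stemmed and the unstemmed terms under different field names.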

I'm leaning towards option 2. However, I'm interested in any performance
implications. If I understand it correctly, Lucene keeps a separate
term dictionary for each field. So apart from the index growing larger
(which might affect caching), searching an index with duplicate fields
should be no slower when I only query the CONTENT field.

Is this correct?
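
The mental model behind my assumption looks roughly like the toy index
below. This is only a sketch in plain Python of how I picture it, with a
hypothetical `ToyIndex` class; it is not how Lucene actually stores its
terms, which is exactly the part I would like confirmed.

```python
class ToyIndex:
    """Toy inverted index with one term dictionary per field (not Lucene code)."""

    def __init__(self):
        # field name -> term -> set of document ids
        self.fields = {}

    def add(self, doc_id, field, terms):
        dictionary = self.fields.setdefault(field, {})
        for term in terms:
            dictionary.setdefault(term, set()).add(doc_id)

    def search(self, field, term):
        # Only the dictionary of the queried field is consulted;
        # terms stored under other fields are never looked at.
        return sorted(self.fields.get(field, {}).get(term, set()))

index = ToyIndex()
index.add(1, "CONTENT", ["search", "index"])
index.add(1, "CONTENT_RAW", ["searching", "indexes"])
index.add(2, "CONTENT", ["search"])
index.add(2, "CONTENT_RAW", ["searched"])

print(index.search("CONTENT", "search"))         # stemmed lookup
print(index.search("CONTENT_RAW", "searching"))  # "exact" lookup
```

In this model a query on CONTENT never touches the CONTENT_RAW terms, so
the extra field costs space but not lookup work.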


Magnus
