Thank you, this was exactly what I needed. So "tokenizing" really
denotes a more general process that can involve normalizing the case or
whatever else can be done with a filter. This is where I was confused.
Eleanor
Jan Peter Stotz wrote:
Hi Eleanor.
In my Lucene index there's a field that contains the local names of
XML elements, one name per document. Users can enter arbitrary
queries for this field, so I'm using a QueryParser.
From reading around it looks as if the field needs to be tokenized,
but since the field's content is always a single term, is this really
necessary?
You are right that your field is effectively already tokenized, but as far
as I know the main difference is that untokenized fields are not passed
through your selected analyzer when they are added to the index. If your
analyzer incorporates, for example, the LowerCaseFilter, the field will be
converted to lower case before it is indexed. Using the same analyzer for
your QueryParser then lets you perform case-insensitive queries.
If you add the field untokenized and your Analyzer (at query time)
incorporates the LowerCaseFilter, you will be unable to find elements that
contain upper-case characters, because the lower-cased query term never
matches the mixed-case term stored in the index.
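
To make the mismatch concrete, here is a minimal sketch in plain Java. It
does not use Lucene at all: the index is modelled as a set of terms, and
LOWER_CASE and AS_IS are hypothetical stand-ins for an analyzer containing
a LowerCaseFilter and for an untokenized field. The element names are
invented as well, so treat it as an illustration of the idea rather than
of Lucene's actual API.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Plain-Java model of the behaviour described above; no Lucene classes involved.
class AnalyzerMismatchSketch {

    // Hypothetical stand-in for an analyzer whose only filter lower-cases the term.
    static final UnaryOperator<String> LOWER_CASE = s -> s.toLowerCase(Locale.ROOT);

    // Hypothetical stand-in for an untokenized field: the value is stored exactly as given.
    static final UnaryOperator<String> AS_IS = s -> s;

    // Builds a toy "index": the set of terms left after index-time processing.
    static Set<String> index(UnaryOperator<String> indexTime, String... names) {
        Set<String> terms = new HashSet<>();
        for (String name : names) {
            terms.add(indexTime.apply(name));
        }
        return terms;
    }

    // Applies the query-time processing to the query text, then looks for an exact term match.
    static boolean matches(Set<String> terms, UnaryOperator<String> queryTime, String query) {
        return terms.contains(queryTime.apply(query));
    }

    public static void main(String[] args) {
        // Same processing at index time and at query time: the lookup is case insensitive.
        Set<String> analyzed = index(LOWER_CASE, "TableRow", "body");
        System.out.println(matches(analyzed, LOWER_CASE, "TableRow"));    // true
        System.out.println(matches(analyzed, LOWER_CASE, "TABLEROW"));    // true

        // Stored as-is but lower-cased at query time: mixed-case names can no longer be found.
        Set<String> untokenized = index(AS_IS, "TableRow", "body");
        System.out.println(matches(untokenized, LOWER_CASE, "TableRow")); // false
        System.out.println(matches(untokenized, LOWER_CASE, "body"));     // true
    }
}
```

The sketch only models exact term lookup, which is the part that matters
here: whatever is done to the text at query time has to produce the same
term that was stored at index time.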
Jan
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Eleanor Joslin, Software Development DecisionSoft Ltd.
Telephone: +44-1865-203192 http://www.decisionsoft.com