Hi All,

We have an "un_tokenized" field in Lucene that contains string values such as 
"DMSKM_1234" and "rpsla_5678". Searches on this field do not work as expected. 
If we search for "DMSKM_1234" using the standard analyzer, the required 
document is never returned. However, if we search for "rpsla_5678", the results 
are as expected. I believe the problem is that un-tokenized fields are not 
passed through the analyzer and are indexed without any changes (for example, 
"DMSKM_1234" is indexed with its case intact). When we search for 
"DMSKM_1234", however, the query parser object converts the term to lower case, 
"dmskm_1234". Since un-tokenized fields are matched exactly, the expected 
document is never returned. The search works for values such as "rpsla_5678" 
because all of their letters are already lower case.
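
To make the mismatch concrete, here is a minimal sketch in plain Java. It does 
not use any Lucene classes: the set is only a stand-in for the exact-match 
terms of the un-tokenized field, and the names CaseMismatchDemo, 
indexUntokenized and searchLowercased are made up for this illustration. It 
assumes, as described above, that the query text is lower-cased before the 
lookup.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustration only: a model of the behaviour described above, not Lucene code.
final class CaseMismatchDemo {

    // Stand-in for an un-tokenized field: values are stored exactly as given.
    static Set<String> indexUntokenized(String... values) {
        Set<String> terms = new HashSet<>();
        for (String value : values) {
            terms.add(value); // no analysis, no case change
        }
        return terms;
    }

    // Stand-in for the search side (assumption): the query text is
    // lower-cased before the exact-match lookup.
    static boolean searchLowercased(Set<String> terms, String queryText) {
        return terms.contains(queryText.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> terms = indexUntokenized("DMSKM_1234", "rpsla_5678");
        System.out.println(searchLowercased(terms, "DMSKM_1234")); // false
        System.out.println(searchLowercased(terms, "rpsla_5678")); // true
    }
}
```

The first lookup fails because "dmskm_1234" is not among the stored terms, 
while the second succeeds because lower-casing leaves "rpsla_5678" unchanged.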

One possible solution I have found is to use "SetLowercaseExpandedTerms" on the 
QueryParser object and set the value to FALSE, so that the keywords are not 
converted to lower case. However, this makes the searches case-sensitive.
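
The trade-off can be sketched in plain Java as follows. This is only a model of 
the behaviour as I understand it, not Lucene code: the class 
LowercaseToggleDemo and its lowercaseQuery flag are hypothetical, and the 
sketch does not claim that the real QueryParser setting applies to every type 
of query.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustration only: a model of the trade-off, not Lucene code.
final class LowercaseToggleDemo {

    // Hypothetical flag: when false, the query text is used exactly as typed.
    static boolean search(Set<String> terms, String queryText, boolean lowercaseQuery) {
        String term = lowercaseQuery ? queryText.toLowerCase(Locale.ROOT) : queryText;
        return terms.contains(term); // exact match, as for an un-tokenized field
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("DMSKM_1234", "rpsla_5678"));
        // With lower-casing switched off, only the exact stored case matches.
        System.out.println(search(terms, "DMSKM_1234", false)); // true
        System.out.println(search(terms, "dmskm_1234", false)); // false
        System.out.println(search(terms, "RPSLA_5678", false)); // false
    }
}
```

With the flag off, "DMSKM_1234" is found, but a user who types the same value 
in a different case gets no result.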

Questions:


1. Is there a better way to handle the above un-tokenized field so that 
searches on it are case-insensitive?

2. What would be the impact of setting "SetLowercaseExpandedTerms" to FALSE on 
tokenized fields? For example, take the query "title:agreement AND 
dmsid:DMSKM_1234", where "title" is a tokenized field and "dmsid" is an 
un-tokenized field. Would searches on the title field also become 
case-sensitive?

Limitation:

We are trying to avoid rebuilding the index because of its huge size, and would 
like to resolve the above problem on the search side only.


Thanks & regards,

________________________________

Nitin Shiralkar | Engineering Lead

CoreObjects
