Re: [Ferret-talk] How to do case-sensitive searches

Jens Kraemer Sun, 23 Apr 2006 03:42:09 -0700

Hi Carl,

On Tue, Apr 18, 2006 at 11:32:36PM -0600, Carl Youngblood wrote:
> Forgive me if this topic has already been discussed on the list.  I
> googled but couldn't find much.  I'd like to search through text for
> US state abbreviations that are written in capitals.  What is the best
> way to do this?  I read somewhere that tokenized fields are stored in
> the index in lowercase, so I am concerned that I will lose precision. 
> What is the best way to store a field so that normal searches are
> case-insensitive but case-sensitive searches can still be made?


Are you sure this is actually a problem, i.e. do you get wrong hits
because the lowercase form of an abbreviation also occurs as an
ordinary word? I don't know what those abbreviations look like...

To run both case-sensitive and case-insensitive searches you'd need two
fields: a tokenized one for normal case-insensitive searches, and an
untokenized one for looking up the abbreviations.

To reduce overhead in the index, you could filter the text for the
known set of abbreviations at indexing time and store only those
values in the untokenized field. This could possibly be done in a
custom analyzer.
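
A rough sketch of that filtering step in plain Ruby, outside of any
analyzer. It does not call the Ferret API at all; the names
STATE_ABBREVS, extract_abbrevs and build_document, the field names
:content and :abbrevs, and the shortened abbreviation list are made up
for illustration. How the resulting hash gets mapped onto a tokenized
and an untokenized field depends on the Ferret version you use.

```ruby
require 'set'

# Hypothetical subset of the known abbreviations; a real list would
# contain all of them.
STATE_ABBREVS = Set.new(%w[CA IN ME OK OR UT]).freeze

# Return the upper-case state abbreviations found in +text+, in order
# of first appearance and without duplicates. The match is
# case-sensitive, so the ordinary words "or", "in" and "me" are skipped.
def extract_abbrevs(text)
  text.scan(/\b[A-Z]{2}\b/).select { |word| STATE_ABBREVS.include?(word) }.uniq
end

# Build a plain hash holding the values for the two fields:
# :content is the full text (meant for the tokenized field),
# :abbrevs holds only the exact-case abbreviations (meant for the
# untokenized field).
def build_document(text)
  { :content => text, :abbrevs => extract_abbrevs(text) }
end

doc = build_document("Call me in Salem, OR or in Provo, UT.")
p doc[:abbrevs]   # prints ["OR", "UT"]
```

Since only the matched abbreviations end up in the second field, the
extra index size stays small, and a lookup for "OR" there cannot be
confused with the word "or" in the normal text field.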

regards,
Jens


-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       [EMAIL PROTECTED]
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
