Marcus Falck wrote:
> Any good approaches for allowing case sensitive and case insensitive
> searches?
>
> Except adding an additional field and skipping the LowerCaseFilter.
> Since this severely increases the index size (and the index already
> is around 1 TB).

Hi Marcus,

How about a filter that emits two tokens for each token that is not fully lowercase: first the original, then the downcased version, with both placed at the same position? This should minimize index growth, because only tokens that contain uppercase characters are indexed twice.

Something like this (WARNING: Not Tested!!):

-----------begin DualCaseFilter.java-------------
package org.apache.lucene.analysis;

import java.io.IOException;

public final class DualCaseFilter extends TokenFilter {

  // Downcased copy of the previous token, waiting to be emitted
  // on the next call to next(); null when nothing is pending.
  Token downcasedPreviousToken = null;

  public DualCaseFilter(TokenStream input) {
    super(input);
  }

  public final Token next() throws IOException {
    // Emit the pending downcased copy before reading more input.
    if (downcasedPreviousToken != null) {
      Token t = downcasedPreviousToken;
      downcasedPreviousToken = null;
      return t;
    }

    Token t = input.next();
    if (t != null) {
      String downcased = t.termText.toLowerCase();
      if ( ! t.termText.equals(downcased)) {
        // Queue a downcased copy at the same position as the original.
        downcasedPreviousToken = (Token) t.clone();
        downcasedPreviousToken.termText = downcased;
        downcasedPreviousToken.setPositionIncrement(0);
      }
    }
    return t;
  }
}
-----------end DualCaseFilter.java-------------

Hope it helps,
Steve
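
P.S. To make the intended token stream concrete, here is a small standalone sketch that does not depend on Lucene at all. The names DualCaseSketch, Tok, and expand are made up for illustration, and Tok is only a stand-in for Lucene's Token; it shows the emission order and position increments the filter idea above is meant to produce, not the behavior of any particular Lucene version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class DualCaseSketch {

    /** Hypothetical stand-in for a Lucene Token: term text plus position increment. */
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Emits every term unchanged with increment 1. When a term is not
     * already fully lowercase, a downcased copy follows with increment 0,
     * so both variants share the same position.
     */
    static List<Tok> expand(List<String> terms) {
        List<Tok> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Tok(term, 1));
            String downcased = term.toLowerCase(Locale.ROOT);
            if (!term.equals(downcased)) {
                out.add(new Tok(downcased, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Apache(+1), apache(+0), lucene(+1), RAM(+1), ram(+0)]
        System.out.println(expand(Arrays.asList("Apache", "lucene", "RAM")));
    }
}
```

With an index built this way, a case-insensitive search would lowercase the query term before looking it up, while a case-sensitive search would look up the term exactly as typed. One limitation to keep in mind: a case-sensitive query for an all-lowercase term such as "apache" would also match documents containing "Apache", because the downcased copy is indexed under the same term.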