The problem with WhitespaceAnalyzer is that if the text contains, say, the sentence
"Lucene is indexing.", a query for "indexing" will produce no hits, because "." is
not a token delimiter. You would have to search for "indexing*" instead.
For me the solution was to write my own tokenizer/analyzer pair:
--snip
and
--snip
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.analysis.TokenStream;

public final class myTokenizer extends CharTokenizer {
  /** Construct a new myTokenizer. */
  public myTokenizer(Reader in) {
    super(in);
  }

  /** Converts each token character to lower case. */
  protected char normalize(char c) {
    return Character.toLowerCase(c);
  }

  /** Collects only characters which satisfy
   * {@link Character#isLetterOrDigit(char)}. */
  protected boolean isTokenChar(char c) {
    return Character.isLetterOrDigit(c);
  }
}

public final class myAnalyzer extends Analyzer {
  public final TokenStream tokenStream(String fieldName, Reader reader) {
    return new myTokenizer(reader);
  }
}
--snip
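To see the effect of the letter-or-digit rule without pulling in Lucene, here is a
plain-Java sketch of the same splitting logic (this is an illustration of the rule,
not the actual CharTokenizer internals):

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeDemo {
    // Split on any character that is not a letter or digit,
    // lowercasing as we go -- the same rule myTokenizer applies.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The trailing "." is dropped, so "indexing." yields the
        // token "indexing" and a query for "indexing" now matches.
        System.out.println(TokenizeDemo.tokenize("Lucene is indexing."));
        // prints [lucene, is, indexing]
    }
}
```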
regards joe
"RAYMOND Romain" <[EMAIL PROTECTED]> writes on
Tue, 26 Mar 2002 08:53:51 +0100 (MET):
> hello,
>
> The solution we adopted is to use WhitespaceAnalyzer.
> If you print the result of a query after parsing it (with the parse
> method), you can see that the tokenizers used delete the numbers
> from the query.
> But WhitespaceAnalyzer tokenizes based only on ... spaces, so we can
> search on number values ....
>
>
--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>