I have written an Analyzer for Swedish. Compound words are common in
Swedish, so my Analyzer tries to split compound words
into their parts. For example, the Swedish word fotbollsmatch (football game) is split into fotboll and match.
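For readers unfamiliar with compound splitting, one common approach is a dictionary lookup. The sketch below is hypothetical and is not the analyzer described in this post. It assumes a small in-memory word list and handles only the Swedish linking "s" (fotboll + s + match). The class name CompoundSplitter is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical dictionary-based compound splitter (illustration only).
public class CompoundSplitter {

    // Returns the dictionary parts of word, or null if it cannot be fully decomposed.
    public static List<String> split(String word, Set<String> dict) {
        if (dict.contains(word)) {
            // A known word is kept whole rather than split further.
            return new ArrayList<>(Arrays.asList(word));
        }
        for (int i = 1; i < word.length(); i++) {
            String head = word.substring(0, i);
            if (!dict.contains(head)) {
                continue;
            }
            String rest = word.substring(i);
            List<String> tail = split(rest, dict);
            // Swedish often inserts a linking "s" between parts; try dropping it.
            if (tail == null && rest.startsWith("s")) {
                tail = split(rest.substring(1), dict);
            }
            if (tail != null) {
                List<String> parts = new ArrayList<>();
                parts.add(head);
                parts.addAll(tail);
                return parts;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("fotboll", "match"));
        System.out.println(split("fotbollsmatch", dict)); // [fotboll, match]
    }
}
```

A real analyzer would need a much larger word list and rules for other linking forms; this only shows the shape of the search.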
However, when I use my Analyzer with the QueryParser, the query fotbollsmatch is turned into "fotbolls match" (note the quotes, i.e. a phrase query),
when what I really want is the query fotbolls match (with no quotes).
Is this possible? Splitting compound words is
of no real use if I can't get rid of the quotes.
I have attached some sample code that illustrates the problem (using a dummy Analyzer that splits words longer than five characters into two).
/magnus
------------------------------------------------------------------
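import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

import java.io.IOException;
import java.io.Reader;

// Dummy analyzer: tokenizes with StandardTokenizer, then splits long tokens in two.
public class TestAnalyzer extends Analyzer {

    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new SplitStream(new StandardTokenizer(reader));
    }

    public static void main(String[] args) throws Exception {
        QueryParser qp = new QueryParser("fieldname", new TestAnalyzer());
        Query q = qp.parse("queryparser");
        // Prints: Query: "query parser" (a phrase, not two separate terms)
        System.out.println("Query: " + q.toString("fieldname"));
        System.out.println("Correct: query parser");
    }
}

// Splits every token longer than SPLIT_SIZE characters into two tokens:
// the first SPLIT_SIZE characters and the remainder (which is not split again).
class SplitStream extends TokenStream {
    private static final int SPLIT_SIZE = 5;

    private TokenStream tstream;
    private String buffer = null; // pending second half of a split token
    private int start, end;       // offsets of the pending second half

    public SplitStream(TokenStream tstream) {
        this.tstream = tstream;
    }

    public Token next() throws IOException {
        if (buffer == null) {
            Token tok = tstream.next();
            if (tok == null) {
                return null; // end of input
            } else if (tok.termText().length() > SPLIT_SIZE) {
                // Remember the tail for the next call and return the head now.
                buffer = tok.termText().substring(SPLIT_SIZE);
                start = tok.startOffset() + SPLIT_SIZE;
                end = tok.endOffset();
                return new Token(
                        tok.termText().substring(0, SPLIT_SIZE),
                        tok.startOffset(),
                        tok.startOffset() + SPLIT_SIZE);
            } else {
                return tok; // short token: pass through unchanged
            }
        } else {
            // Emit the tail remembered from the previous call.
            Token t = new Token(buffer, start, end);
            buffer = null;
            return t;
        }
    }

    // Close the wrapped tokenizer (and its Reader) when this stream is closed.
    public void close() throws IOException {
        tstream.close();
    }
}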
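The dummy split rule can be checked without Lucene. This is a minimal plain-Java sketch; the class SplitRuleDemo and its methods are hypothetical. It applies the same five-character rule as SplitStream to each whitespace-separated word and joins the parts with spaces, which reproduces the unquoted form printed on the "Correct:" line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java mirror of SplitStream's dummy rule (no Lucene needed).
public class SplitRuleDemo {
    // Same threshold as SplitStream.SPLIT_SIZE.
    static final int SPLIT_SIZE = 5;

    // Words longer than SPLIT_SIZE become the first SPLIT_SIZE characters
    // followed by the remainder; like SplitStream, the remainder is not split again.
    public static List<String> splitWord(String word) {
        List<String> parts = new ArrayList<>();
        if (word.length() > SPLIT_SIZE) {
            parts.add(word.substring(0, SPLIT_SIZE));
            parts.add(word.substring(SPLIT_SIZE));
        } else {
            parts.add(word);
        }
        return parts;
    }

    // Splits every whitespace-separated word and joins all parts with single spaces.
    public static String splitQuery(String query) {
        List<String> all = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                all.addAll(splitWord(word));
            }
        }
        return String.join(" ", all);
    }

    public static void main(String[] args) {
        System.out.println(splitQuery("queryparser")); // query parser
    }
}
```

Note that the two parts come out as two separate tokens from a single query word, which is the situation in which QueryParser produces the quoted form shown above.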
