Hackl, Rene a �crit:
I tried to extend TokenFilter, but all I get is either "oobar" or "obar", depends on when 'return' is called.
How could I add such extra tokens to the tokenStream? Any thoughts on this appreciated.
Adapted from my... arabic Analyzer :
public class ChemicalOrSomethingFilter extends TokenFilter {
private Token receivedToken = null;
private StringBuffer receivedText = new StringBuffer();
public ChemicalOrSomethingFilter(TokenStream input){
super(input);
}
private String getNextTruncation() {
StringBuffer emittedText = new StringBuffer();
//left trim the token
while(true) {
if (receivedText.length() == 0) break;
char c = receivedText.charAt(0);
if (!Character.isWhitespace(c)) break;
receivedText.deleteCharAt(0);
}
//keep the good stuff
while(true) {
if (receivedText.length() == 0) break;
char c = receivedText.charAt(0);
if (Character.isWhitespace(c)) break;
emittedText.append(receivedText.charAt(0));
receivedText.deleteCharAt(0);
}
//right trim the token
while(true) {
if (receivedText.length() == 0) break;
char c = receivedText.charAt(0);
if (!Character.isWhitespace(c)) break;
receivedText.deleteCharAt(0);
}
return emittedText.toString();
}public final Token next() throws IOException {
while (true) {
String emittedText;
int positionIncrement = 0;
//New token ?
if (receivedText.length() == 0) {
receivedToken = input.next();
if (receivedToken == null) return null;
receivedText.append(receivedToken.termText());
positionIncrement = 1;
}
emittedText = getNextPart();
//Warning : all tokens are emitted with the *same* offset
if (emittedText.length() > 0) {
Token emittedToken = new Token(emittedText, receivedToken.startOffset(), receivedToken.endOffset()); emittedToken.setPositionIncrement(positionIncrement);
return emittedToken;
}
}
}
}
Not tested at all : it is a quick copy of my "WhiteSpaceFilter" (that's why triming is so important up there ;-) which is not the best designed class.
This should work for indexing. For querying, it's another matter especially if you want to use the queryParser.
Keep us informed.
Cheers,
-- Pierrick Brihaye, informaticien Service r�gional de l'Inventaire DRAC Bretagne mailto:[EMAIL PROTECTED]
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
