Hi,

Hackl, Rene a �crit:

I tried to extend TokenFilter, but all I get is either "oobar" or "obar", depends on when 'return' is called.

How could I add such extra tokens to the tokenStream? Any thoughts on this
appreciated.

Adapted from my... arabic Analyzer :


public class ChemicalOrSomethingFilter extends TokenFilter {
        
  private Token receivedToken = null;
  private StringBuffer receivedText = new StringBuffer();       
        
  public ChemicalOrSomethingFilter(TokenStream input){
    super(input);
  }
        
  private String getNextTruncation() {          
    StringBuffer emittedText = new StringBuffer();
    //left trim the token
    while(true) {
      if (receivedText.length() == 0) break;
      char c = receivedText.charAt(0);
      if (!Character.isWhitespace(c)) break;
      receivedText.deleteCharAt(0);             
    }                           
    //keep the good stuff
    while(true) {
      if (receivedText.length() == 0) break;
      char c = receivedText.charAt(0);
      if (Character.isWhitespace(c)) break;
      emittedText.append(receivedText.charAt(0));
      receivedText.deleteCharAt(0);                     
    }           
    //right trim the token
    while(true) {
     if (receivedText.length() == 0) break;
     char c = receivedText.charAt(0);
     if (!Character.isWhitespace(c)) break;
     receivedText.deleteCharAt(0);              
    }                           
    return emittedText.toString();
  }

public final Token next() throws IOException {
while (true) {
String emittedText;
int positionIncrement = 0;
//New token ?
if (receivedText.length() == 0) {
receivedToken = input.next();
if (receivedToken == null) return null;
receivedText.append(receivedToken.termText());
positionIncrement = 1;
}
emittedText = getNextPart();
//Warning : all tokens are emitted with the *same* offset
if (emittedText.length() > 0) {
Token emittedToken = new Token(emittedText, receivedToken.startOffset(), receivedToken.endOffset()); emittedToken.setPositionIncrement(positionIncrement);
return emittedToken;
}
}
}
}


Not tested at all : it is a quick copy of my "WhiteSpaceFilter" (that's why triming is so important up there ;-) which is not the best designed class.

This should work for indexing. For querying, it's another matter especially if you want to use the queryParser.

Keep us informed.

Cheers,

--
Pierrick Brihaye, informaticien
Service r�gional de l'Inventaire
DRAC Bretagne
mailto:[EMAIL PROTECTED]


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to