--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> On Monday, September 29, 2003, at 09:19 AM, Hackl, Rene wrote:
> > I'm looking for a way to implement simultaneous left and right
> > truncation.
> >
> > The goal is to enable the user to search for e.g. "*hydronaphth*"
> > and find "hexahydronaphthalene" as well as "heptahydronaphthalin".
>
> WildcardQuery, if used directly rather than through QueryParser,
> allows leading and trailing wildcards like this. Performance is the
> biggest concern, though, as it will have to enumerate all terms in
> the index to look for matches.
>
> > To achieve that functionality, I'd like to index terms in such a
> > way that from a token "foobar" the tokens "oobar" and "obar" (e.g.
> > minimum word length = 4) would be derived and added to the index.
> > I tried to extend TokenFilter, but all I get is either "oobar" or
> > "obar", depending on when 'return' is called.
>
> This is an interesting approach and would certainly give you the
> search results you are looking for, except that you'll be indexing a
> ton of terms, I'd guess. If there is some other way to split these
> words by separating the prefix ("hexa", "hepta") and suffix ("alene",
> "alin"), it would likely be better. But maybe it's not practical to
> do so.
>
> But, to the main question...
>
> > How could I add such extra tokens to the tokenStream? Any thoughts
> > on this appreciated.
>
> Yes, this can be done. Craig Walls did this with an example Analyzer
> in his December '02 Java Developer's Journal article. I adapted his
> example to demonstrate this in presentations. Here is my code:
>
> import java.io.IOException;
> import java.util.HashMap;
> import java.util.Stack;
> import java.util.StringTokenizer;
>
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.TokenFilter;
> import org.apache.lucene.analysis.TokenStream;
>
> public class AliasFilter extends TokenFilter {
>     private static HashMap aliasMap = new HashMap();
>
>     static {
>         aliasMap.put("country", "yeehaw");
>         aliasMap.put("western", "rawhide");
>     }
>
>     private Stack currentTokenAliases;
>
>     public AliasFilter(TokenStream in) {
>         currentTokenAliases = new Stack();
>         input = in;
>     }
>
>     public Token next() throws IOException {
>         // Emit any queued aliases before pulling the next real token.
>         if (currentTokenAliases.size() > 0) {
>             return (Token) currentTokenAliases.pop();
>         }
>
>         Token nextToken = input.next();
>         addAliasesToStack(nextToken, currentTokenAliases);
>
>         return nextToken;
>     }
>
>     private void addAliasesToStack(Token token, Stack aliasStack) {
>         if (token == null) return;
>
>         String aliasString = (String) aliasMap.get(token.termText());
>
>         if (aliasString == null || aliasString.length() < 1) return;
>
>         // Aliases are space-separated; each becomes an extra token.
>         StringTokenizer tokenizer = new StringTokenizer(aliasString, " ");
>         while (tokenizer.hasMoreElements()) {
>             String nextAlias = tokenizer.nextToken();
>             Token nextTokenAlias =
>                 new Token(nextAlias, 0, nextAlias.length());
>             aliasStack.push(nextTokenAlias);
>         }
>     }
> }
>
> Note that this is a braindead example of injecting tokens into the
> stream, so when "country" or "western" is encountered, another token
> is added to the stream. In your case you'll do something with the
> initial token and push all its 4+ length pieces onto the stack.
>
> Erik
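
Following up on the suggestion above to push all the 4+ character
pieces onto a stack, here is a rough, untested sketch of such a filter
(the class name SuffixFilter and the minimum-length constant are made
up for illustration). For each incoming token it queues every suffix
of length 4 or more, so "hexahydronaphthalene" would also be indexed
as "hydronaphthalene", "ydronaphthalene", and so on; at search time a
trailing wildcard or PrefixQuery on "hydronaphth" would then match one
of those stored suffixes.

    import java.io.IOException;
    import java.util.Stack;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    public class SuffixFilter extends TokenFilter {
        private static final int MIN_LENGTH = 4;

        private Stack suffixes = new Stack();

        public SuffixFilter(TokenStream in) {
            input = in;
        }

        public Token next() throws IOException {
            // Emit queued suffixes before pulling the next real token.
            if (!suffixes.empty()) {
                return (Token) suffixes.pop();
            }

            Token token = input.next();
            if (token == null) return null;

            // Queue every suffix of length >= MIN_LENGTH,
            // e.g. "foobar" -> "oobar", "obar".
            String text = token.termText();
            for (int i = 1; i <= text.length() - MIN_LENGTH; i++) {
                String suffix = text.substring(i);
                suffixes.push(new Token(suffix,
                    token.startOffset() + i, token.endOffset()));
            }

            return token;
        }
    }

The original token is returned first and its suffixes follow, exactly
as in the AliasFilter above, so the filter should drop into an
Analyzer chain the same way.
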
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]