> I am attempting to use Token.setPositionIncrement() to provide alternate
> forms of tokens, and I have encountered strange behavior with QueryParser.
> It seems to be constructing phrase queries with the alternate tokens. I
> don't know why the query would be parsed as a phrase.
> 
> For example, consider an Analyzer that adds lowercase tokens to the token
> stream as alternate forms (position increment = 0). Parsing the query "Bush"
> (quotes added for emphasis and not part of the query) results in a query of
> text:"Bush bush" ("text" is the default field), whereas parsing the query
> "bush" results in the query text:bush. Notice the lack of quotes in the
> second case, which has no alternate form appended because the token is
> already lowercase. Is this a bug, or is there some explanation of which I am
> not aware?
> 
> The following two classes provide test code verifying this behavior.
> 
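> 
> import java.io.Reader;
> import java.util.Locale;
> 
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.standard.StandardFilter;
> import org.apache.lucene.analysis.standard.StandardTokenizer;
> import org.apache.lucene.queryParser.ParseException;
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.search.Query;
> 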
> /**
>  * A test analyzer employing a TestLowerCaseFilter to demonstrate problems with
>  * QueryParser when dealing with multiple tokens at the same position.
>  */
> public class TestAnalyzer extends Analyzer {
>       /**
>        * Constructs a {@link StandardTokenizer} filtered by a
>        * {@link StandardFilter} and a {@link TestLowerCaseFilter}.
>        */
>       public final TokenStream tokenStream(String fieldName, Reader reader) {
>               TokenStream result = new StandardTokenizer(reader);
>               result = new StandardFilter(result);
>               result = new TestLowerCaseFilter(result, Locale.getDefault());
>               return result;
>       }
>       
>       public static void main(String[] args) {
>               TestAnalyzer analyzer = new TestAnalyzer();
>               try {
>                       Query lowerCaseQuery = QueryParser.parse("bush", "text", analyzer);
>                       Query upperCaseQuery = QueryParser.parse("Bush", "text", analyzer);
>                       
>                       System.out.println("lower case: " + lowerCaseQuery.toString());
>                       System.out.println("upper case: " + upperCaseQuery.toString());
>               } catch (ParseException e) {
>                       // TODO Auto-generated catch block
>                       e.printStackTrace();
>               }
>               
>       }
> }
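> 
> import java.io.IOException;
> import java.util.Locale;
> 
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.TokenFilter;
> import org.apache.lucene.analysis.TokenStream;
> 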
> /**
>  *
>  * A {@link TokenFilter} that adds alternate forms (lower case) for upper case
>  * tokens to a {@link TokenStream}.
>  */
> public class TestLowerCaseFilter extends TokenFilter {
>       private Locale locale;
>       private Token alternateToken;
> 
>       public TestLowerCaseFilter(TokenStream stream, Locale locale) {
>               super(stream);
>               this.locale = locale;
>               this.alternateToken = null;
>       }
> 
>       /* (non-Javadoc)
>        * @see org.apache.lucene.analysis.TokenStream#next()
>        */
>       public Token next() throws IOException {
>       
>               Token rval = null;
>               if (alternateToken != null) {
>                       rval = alternateToken;
>                       alternateToken = null;
>               } else {
>                       Token nextToken = input.next();
>                       if (nextToken == null) {
>                               return null;
>                       }
>                       String text = nextToken.termText();
>                       String lc = text.toLowerCase(locale);
>                       rval = nextToken;
>                       if (!lc.equals(text)) {
> 
>                               alternateToken =
>                                       new Token(
>                                               lc,
>                                               nextToken.startOffset(),
>                                               nextToken.endOffset());
>                               alternateToken.setPositionIncrement(0);
>                       }
>               }
>               return rval;
>       }
> 
> }  
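
One possible explanation, offered as a guess rather than a reading of the QueryParser source: if the parser decides between a term query and a phrase query purely by counting the tokens the analyzer returns, then any token with an alternate form yields two tokens and gets treated as a phrase, regardless of position increment. The sketch below models that assumption in plain Java with no Lucene dependency. Tok, analyze, countOnlyQuery and positionAwareQuery are hypothetical names invented for illustration, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PositionIncrementSketch {
    // Hypothetical stand-in for a Lucene Token: term text plus position increment.
    static final class Tok {
        final String text;
        final int posIncr;
        Tok(String text, int posIncr) { this.text = text; this.posIncr = posIncr; }
    }

    // Mimics the filter in the post: emit each word, then a lowercase
    // alternate at the same position (increment 0) when it differs.
    static List<Tok> analyze(String input) {
        List<Tok> out = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            out.add(new Tok(word, 1));
            String lc = word.toLowerCase(Locale.ROOT);
            if (!lc.equals(word)) out.add(new Tok(lc, 0));
        }
        return out;
    }

    // ASSUMED parser behavior: only the token count matters,
    // so two tokens at one position still become a phrase.
    static String countOnlyQuery(String field, List<Tok> toks) {
        if (toks.isEmpty()) return "";
        if (toks.size() == 1) return field + ":" + toks.get(0).text;
        StringBuilder sb = new StringBuilder(field).append(":\"");
        for (int i = 0; i < toks.size(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(toks.get(i).text);
        }
        return sb.append('"').toString();
    }

    // Position-aware alternative: tokens that all share one position
    // are treated as optional alternatives instead of a phrase.
    static String positionAwareQuery(String field, List<Tok> toks) {
        int positions = 0;
        for (Tok t : toks) if (t.posIncr > 0) positions++;
        if (positions > 1 || toks.size() <= 1) return countOnlyQuery(field, toks);
        StringBuilder sb = new StringBuilder();
        for (Tok t : toks) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(field).append(':').append(t.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(countOnlyQuery("text", analyze("bush")));     // text:bush
        System.out.println(countOnlyQuery("text", analyze("Bush")));     // text:"Bush bush"
        System.out.println(positionAwareQuery("text", analyze("Bush"))); // text:Bush text:bush
    }
}
```

If that assumption holds, the output reported in the post is what one would expect, and a workaround would be to build the query by hand (for example, a boolean query over the alternates) instead of going through the parser for such fields.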
