[jira] [Commented] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.

Steve Rowe (JIRA) Fri, 12 Apr 2013 10:16:21 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630354#comment-13630354
 ]


Steve Rowe commented on LUCENE-949:
-----------------------------------

Hi Timothy, I agree with Robert, the patch reduces line count while improving 
clarity and adding functionality - sweet!

A few nitpicks (+1 otherwise):

- You don't handle escaped metachars ? and * in the query, though the original 
doesn't either, so this shouldn't stop the patch from being committed.
- The {{(?s)}} has no effect in {{"(?s)([^\\?\\*]+)"}}, since it only affects 
the expansion of the dot metachar, which isn't present (see 
[http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#DOTALL]):
 an inverted charset will always include newlines unless the charset to be 
inverted explicitly includes them.  Of course, this is all moot, since the 
query parser will already have split on whitespace before sending {{termStr}} 
to {{getWildcardQuery()}} :) 
- You can drop the following line in {{normalizeMustBeSingleTerm()}}: 
{{nonWildcardMatcher.reset(termStr);}}, since nonWildcardMatcher is never 
reused.
- The presence of term breaking chars is not the only condition under which 
analysis of a single term can produce multiple terms (e.g. synonyms, multiple 
readings), so the following exception message should be generalized: "There is 
a term breaking character between %s and %s".
- In your new test, the following assertion message seems to have too many 
words? "Testing wildcard with wildcard with initial wildcard not allowed"
                
> AnalyzingQueryParser can't work with leading wildcards.
> -------------------------------------------------------
>
>                 Key: LUCENE-949
>                 URL: https://issues.apache.org/jira/browse/LUCENE-949
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 2.2
>            Reporter: Stefan Klein
>         Attachments: AnalyzingQueryParser.java, LUCENE-949.patch
>
>
> The getWildcardQuery mehtod in AnalyzingQueryParser.java need the following 
> changes to accept leading wildcards:
>       protected Query getWildcardQuery(String field, String termStr) throws 
> ParseException
>       {
>               String useTermStr = termStr;
>               String leadingWildcard = null;
>               if ("*".equals(field))
>               {
>                       if ("*".equals(useTermStr))
>                               return new MatchAllDocsQuery();
>               }
>               boolean hasLeadingWildcard = (useTermStr.startsWith("*") || 
> useTermStr.startsWith("?")) ? true : false;
>               if (!getAllowLeadingWildcard() && hasLeadingWildcard)
>                       throw new ParseException("'*' or '?' not allowed as 
> first character in WildcardQuery");
>               if (getLowercaseExpandedTerms())
>               {
>                       useTermStr = useTermStr.toLowerCase();
>               }
>               if (hasLeadingWildcard)
>               {
>                       leadingWildcard = useTermStr.substring(0, 1);
>                       useTermStr = useTermStr.substring(1);
>               }
>               List tlist = new ArrayList();
>               List wlist = new ArrayList();
>               /*
>                * somewhat a hack: find/store wildcard chars in order to put 
> them back
>                * after analyzing
>                */
>               boolean isWithinToken = (!useTermStr.startsWith("?") && 
> !useTermStr.startsWith("*"));
>               isWithinToken = true;
>               StringBuffer tmpBuffer = new StringBuffer();
>               char[] chars = useTermStr.toCharArray();
>               for (int i = 0; i < useTermStr.length(); i++)
>               {
>                       if (chars[i] == '?' || chars[i] == '*')
>                       {
>                               if (isWithinToken)
>                               {
>                                       tlist.add(tmpBuffer.toString());
>                                       tmpBuffer.setLength(0);
>                               }
>                               isWithinToken = false;
>                       }
>                       else
>                       {
>                               if (!isWithinToken)
>                               {
>                                       wlist.add(tmpBuffer.toString());
>                                       tmpBuffer.setLength(0);
>                               }
>                               isWithinToken = true;
>                       }
>                       tmpBuffer.append(chars[i]);
>               }
>               if (isWithinToken)
>               {
>                       tlist.add(tmpBuffer.toString());
>               }
>               else
>               {
>                       wlist.add(tmpBuffer.toString());
>               }
>               // get Analyzer from superclass and tokenize the term
>               TokenStream source = getAnalyzer().tokenStream(field, new 
> StringReader(useTermStr));
>               org.apache.lucene.analysis.Token t;
>               int countTokens = 0;
>               while (true)
>               {
>                       try
>                       {
>                               t = source.next();
>                       }
>                       catch (IOException e)
>                       {
>                               t = null;
>                       }
>                       if (t == null)
>                       {
>                               break;
>                       }
>                       if (!"".equals(t.termText()))
>                       {
>                               try
>                               {
>                                       tlist.set(countTokens++, t.termText());
>                               }
>                               catch (IndexOutOfBoundsException ioobe)
>                               {
>                                       countTokens = -1;
>                               }
>                       }
>               }
>               try
>               {
>                       source.close();
>               }
>               catch (IOException e)
>               {
>                       // ignore
>               }
>               if (countTokens != tlist.size())
>               {
>                       /*
>                        * this means that the analyzer used either added or 
> consumed
>                        * (common for a stemmer) tokens, and we can't build a 
> WildcardQuery
>                        */
>                       throw new ParseException("Cannot build WildcardQuery 
> with analyzer " + getAnalyzer().getClass()
>                                       + " - tokens added or lost");
>               }
>               if (tlist.size() == 0)
>               {
>                       return null;
>               }
>               else if (tlist.size() == 1)
>               {
>                       if (wlist.size() == 1)
>                       {
>                               /*
>                                * if wlist contains one wildcard, it must be 
> at the end,
>                                * because: 1) wildcards at 1st position of a 
> term by
>                                * QueryParser where truncated 2) if wildcard 
> was *not* in end,
>                                * there would be *two* or more tokens
>                                */
>                               StringBuffer sb = new StringBuffer();
>                               if (hasLeadingWildcard)
>                               {
>                                       // adding leadingWildcard
>                                       sb.append(leadingWildcard);
>                               }
>                               sb.append((String) tlist.get(0));
>                               sb.append(wlist.get(0).toString());
>                               return super.getWildcardQuery(field, 
> sb.toString());
>                       }
>                       else if (wlist.size() == 0 && hasLeadingWildcard)
>                       {
>                               /*
>                                * if wlist contains no wildcard, it must be at 
> 1st position
>                                */
>                               StringBuffer sb = new StringBuffer();
>                               if (hasLeadingWildcard)
>                               {
>                                       // adding leadingWildcard
>                                       sb.append(leadingWildcard);
>                               }
>                               sb.append((String) tlist.get(0));
>                               sb.append(wlist.get(0).toString());
>                               return super.getWildcardQuery(field, 
> sb.toString());
>                       }
>                       else
>                       {
>                               /*
>                                * we should never get here! if so, this method 
> was called with
>                                * a termStr containing no wildcard ...
>                                */
>                               throw new 
> IllegalArgumentException("getWildcardQuery called without wildcard");
>                       }
>               }
>               else
>               {
>                       /*
>                        * the term was tokenized, let's rebuild to one token 
> with wildcards
>                        * put back in postion
>                        */
>                       StringBuffer sb = new StringBuffer();
>                       if (hasLeadingWildcard)
>                       {
>                               // adding leadingWildcard
>                               sb.append(leadingWildcard);
>                       }
>                       for (int i = 0; i < tlist.size(); i++)
>                       {
>                               sb.append((String) tlist.get(i));
>                               if (wlist != null && wlist.size() > i)
>                               {
>                                       sb.append((String) wlist.get(i));
>                               }
>                       }
>                       return super.getWildcardQuery(field, sb.toString());
>               }
>       }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.

Reply via email to