[jira] [Updated] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.

Steve Rowe (JIRA) Mon, 06 May 2013 00:12:19 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Steve Rowe updated LUCENE-949:
------------------------------

    Attachment: LUCENE-949.patch

Hi [~talli...@mitre.org],

Sorry this took so long. I've attached a patch based on your patch, with some 
fixes:

* Removed tabs.
* Restored license header and class javadoc to {{AnalyzingQueryParser.java}} 
(your patch removed them for some reason?).
* Converted all code indentation to 2 spaces per level (you had a lot of 
3-space indentation).
* Changed {{wildcardPattern}} to allow any character to be escaped, not just 
backslashes and the wildcard chars '?' and '*'.  Also removed the optional 
backslashes from group 2 (the actual wildcards): when iterating over 
{{wildcardPattern}} matches, your patch would throw away any real wildcards 
that followed an escaped wildcard.  I added a test for this.
* When multiple output tokens are produced (there should be only one), the 
exception message now reports all of them instead of just the first two.
* Removed all references to "chunklet" in favor of "output token" - this 
non-standard terminology made the code harder to read.
* Reworded descriptions of multiple output tokens so they no longer assume the 
tokens result from splitting (they can also come from synonyms, for example).
* In {{analyzeSingleChunk()}}, moved exception throwing to the point where each 
problem is detected.
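
To illustrate the {{wildcardPattern}} item above, here is a minimal standalone 
sketch. It is not the code from the patch: the class name, the method name and 
the exact regex are assumptions made for illustration. Group 1 consumes a 
backslash plus whatever single character follows it, and group 2 matches a run 
of unescaped wildcards, so real wildcards that follow an escaped wildcard are 
kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene or the patch.
class WildcardSplitSketch {
  // Group 1: a backslash followed by any single character (an escaped char).
  // Group 2: a run of one or more unescaped wildcards.
  private static final Pattern WILDCARD_PATTERN =
      Pattern.compile("(\\\\.)|([?*]+)");

  /** Splits a term into text pieces and wildcard runs, in original order. */
  static List<String> split(String term) {
    List<String> pieces = new ArrayList<>();
    Matcher m = WILDCARD_PATTERN.matcher(term);
    int textStart = 0;
    while (m.find()) {
      if (m.group(1) != null) {
        continue; // escaped character: it stays inside the surrounding text
      }
      if (m.start() > textStart) {
        pieces.add(term.substring(textStart, m.start()));
      }
      pieces.add(m.group(2)); // the unescaped wildcard run
      textStart = m.end();
    }
    if (textStart < term.length()) {
      pieces.add(term.substring(textStart));
    }
    return pieces;
  }

  public static void main(String[] args) {
    // The term is *foo\*?*bar? (the backslash escapes the second '*').
    System.out.println(split("*foo\\*?*bar?"));
  }
}
```

In this sketch only the text pieces would be sent through the analyzer, and 
the wildcard runs would be put back between them unchanged.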

I also added a {{CHANGES.txt}} entry.  

Tim, let me know if you think my changes are okay - if so, I think it's ready 
to commit.
                
> AnalyzingQueryParser can't work with leading wildcards.
> -------------------------------------------------------
>
>                 Key: LUCENE-949
>                 URL: https://issues.apache.org/jira/browse/LUCENE-949
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 2.2
>            Reporter: Stefan Klein
>         Attachments: LUCENE-949.patch, LUCENE-949.patch, LUCENE-949.patch
>
>
> The getWildcardQuery method in AnalyzingQueryParser.java needs the following 
> changes to accept leading wildcards:
>       protected Query getWildcardQuery(String field, String termStr) throws ParseException
>       {
>               String useTermStr = termStr;
>               String leadingWildcard = null;
>               if ("*".equals(field))
>               {
>                       if ("*".equals(useTermStr))
>                               return new MatchAllDocsQuery();
>               }
>               boolean hasLeadingWildcard = useTermStr.startsWith("*") || useTermStr.startsWith("?");
>               if (!getAllowLeadingWildcard() && hasLeadingWildcard)
>                       throw new ParseException("'*' or '?' not allowed as first character in WildcardQuery");
>               if (getLowercaseExpandedTerms())
>               {
>                       useTermStr = useTermStr.toLowerCase();
>               }
>               if (hasLeadingWildcard)
>               {
>                       leadingWildcard = useTermStr.substring(0, 1);
>                       useTermStr = useTermStr.substring(1);
>               }
>               List tlist = new ArrayList();
>               List wlist = new ArrayList();
>               /*
>                * somewhat a hack: find/store wildcard chars in order to put them back
>                * after analyzing
>                */
>               // always start in token state; a wildcard at position 0 then yields an empty first token
>               boolean isWithinToken = true;
>               StringBuffer tmpBuffer = new StringBuffer();
>               char[] chars = useTermStr.toCharArray();
>               for (int i = 0; i < useTermStr.length(); i++)
>               {
>                       if (chars[i] == '?' || chars[i] == '*')
>                       {
>                               if (isWithinToken)
>                               {
>                                       tlist.add(tmpBuffer.toString());
>                                       tmpBuffer.setLength(0);
>                               }
>                               isWithinToken = false;
>                       }
>                       else
>                       {
>                               if (!isWithinToken)
>                               {
>                                       wlist.add(tmpBuffer.toString());
>                                       tmpBuffer.setLength(0);
>                               }
>                               isWithinToken = true;
>                       }
>                       tmpBuffer.append(chars[i]);
>               }
>               if (isWithinToken)
>               {
>                       tlist.add(tmpBuffer.toString());
>               }
>               else
>               {
>                       wlist.add(tmpBuffer.toString());
>               }
>               // get Analyzer from superclass and tokenize the term
>               TokenStream source = getAnalyzer().tokenStream(field, new StringReader(useTermStr));
>               org.apache.lucene.analysis.Token t;
>               int countTokens = 0;
>               while (true)
>               {
>                       try
>                       {
>                               t = source.next();
>                       }
>                       catch (IOException e)
>                       {
>                               t = null;
>                       }
>                       if (t == null)
>                       {
>                               break;
>                       }
>                       if (!"".equals(t.termText()))
>                       {
>                               try
>                               {
>                                       tlist.set(countTokens++, t.termText());
>                               }
>                               catch (IndexOutOfBoundsException ioobe)
>                               {
>                                       countTokens = -1;
>                               }
>                       }
>               }
>               try
>               {
>                       source.close();
>               }
>               catch (IOException e)
>               {
>                       // ignore
>               }
>               if (countTokens != tlist.size())
>               {
>                       /*
>                        * this means that the analyzer used either added or consumed
>                        * (common for a stemmer) tokens, and we can't build a WildcardQuery
>                        */
>                       throw new ParseException("Cannot build WildcardQuery with analyzer " + getAnalyzer().getClass()
>                                       + " - tokens added or lost");
>               }
>               if (tlist.size() == 0)
>               {
>                       return null;
>               }
>               else if (tlist.size() == 1)
>               {
>                       if (wlist.size() == 1)
>                       {
>                               /*
>                                * if wlist contains one wildcard, it must be at the end,
>                                * because: 1) wildcards at the 1st position of a term were
>                                * truncated by QueryParser 2) if the wildcard was *not* at
>                                * the end, there would be *two* or more tokens
>                                */
>                               StringBuffer sb = new StringBuffer();
>                               if (hasLeadingWildcard)
>                               {
>                                       // adding leadingWildcard
>                                       sb.append(leadingWildcard);
>                               }
>                               sb.append((String) tlist.get(0));
>                               sb.append(wlist.get(0).toString());
>                               return super.getWildcardQuery(field, sb.toString());
>                       }
>                       else if (wlist.size() == 0 && hasLeadingWildcard)
>                       {
>                               /*
>                                * wlist is empty, so the only wildcard is the leading one
>                                */
>                               StringBuffer sb = new StringBuffer();
>                               sb.append(leadingWildcard);
>                               sb.append((String) tlist.get(0));
>                               // wlist is empty here, so there is no trailing wildcard to append
>                               return super.getWildcardQuery(field, sb.toString());
>                       }
>                       else
>                       {
>                               /*
>                                * we should never get here! if so, this method was called with
>                                * a termStr containing no wildcard ...
>                                */
>                               throw new IllegalArgumentException("getWildcardQuery called without wildcard");
>                       }
>               }
>               else
>               {
>                       /*
>                        * the term was tokenized, let's rebuild to one token with wildcards
>                        * put back in position
>                        */
>                       StringBuffer sb = new StringBuffer();
>                       if (hasLeadingWildcard)
>                       {
>                               // adding leadingWildcard
>                               sb.append(leadingWildcard);
>                       }
>                       for (int i = 0; i < tlist.size(); i++)
>                       {
>                               sb.append((String) tlist.get(i));
>                               if (wlist != null && wlist.size() > i)
>                               {
>                                       sb.append((String) wlist.get(i));
>                               }
>                       }
>                       return super.getWildcardQuery(field, sb.toString());
>               }
>       }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
