[jira] [Commented] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.

David Herrera (JIRA) Tue, 20 Sep 2011 00:18:36 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108413#comment-13108413 ]


David Herrera commented on LUCENE-949:
--------------------------------------

Hi.

Is there some way to re-open and fix this behavior/bug in AnalyzingQueryParser?
I have found this bug, which was opened and then closed 4 years later. We are 
working with Lucene 3.2, and we use AnalyzingQueryParser because we need every 
query, even wildcard queries, to go through the analyzer.

This works well for most queries. For the ones that don't work (for example, 
when the analyzer adds or removes words and the query has wildcards), we use 
QueryParser, although it doesn't analyze wildcard queries.

In our application there are some cases where we need to allow leading wildcard 
queries, and AnalyzingQueryParser fails even though I set the 
'AllowLeadingWildcard' flag to true. A string like '*ucene' is converted into a 
WildcardQuery like 'ucene*'. This is another strange behavior: the wildcard 
ends up at the end.
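The misplaced wildcard can be reproduced with a standalone sketch of the split-and-rebuild steps in AnalyzingQueryParser.getWildcardQuery. This assumes the unpatched loop starts outside a token when the term begins with '*' or '?' (the isWithinToken line that the quoted patch overrides with `isWithinToken = true`). The class name WildcardSplitDemo is hypothetical, and the analyzer step is left out, so text chunks are used as-is:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: mimics the split-and-rebuild steps of the unpatched
// wildcard handling, without any analyzer (text chunks are kept as-is).
final class WildcardSplitDemo {

    static String rebuild(String termStr) {
        List<String> tokens = new ArrayList<>();    // text between wildcards
        List<String> wildcards = new ArrayList<>(); // runs of '*' / '?'
        // Assumed unpatched behavior: a leading wildcard means we start outside a token.
        boolean isWithinToken = !termStr.startsWith("?") && !termStr.startsWith("*");
        StringBuilder buf = new StringBuilder();
        for (char c : termStr.toCharArray()) {
            boolean isWildcard = c == '?' || c == '*';
            if (isWildcard && isWithinToken) {
                tokens.add(buf.toString());      // a text chunk just ended
                buf.setLength(0);
            } else if (!isWildcard && !isWithinToken) {
                wildcards.add(buf.toString());   // a wildcard run just ended
                buf.setLength(0);
            }
            isWithinToken = !isWildcard;
            buf.append(c);
        }
        if (isWithinToken) {
            tokens.add(buf.toString());
        } else {
            wildcards.add(buf.toString());
        }
        // Rebuild: token i is always followed by wildcard run i, so a run that
        // originally came first ends up after the first token.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            sb.append(tokens.get(i));
            if (i < wildcards.size()) {
                sb.append(wildcards.get(i));
            }
        }
        return sb.toString();
    }
}
```

With this sketch, rebuild("*ucene") returns "ucene*", which matches the behavior described above, while a term whose wildcards are all inside or at the end, such as "luc*ne", comes back unchanged.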

I know QueryParser doesn't have this leading wildcard bug, but I need to 
analyze the query. I am Spanish, and we have special characters (ñ, ü, vowels 
with accents); we analyze the indexed data, so to search we need to analyze the 
query too.
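To illustrate why the query has to go through the same analysis as the indexed text: if the index folds 'canción' to 'cancion', an unanalyzed wildcard term such as 'canción*' cannot match it. The class below is a hypothetical stand-in for such folding (lowercasing, Unicode decomposition with java.text.Normalizer, then dropping combining marks). It is not a Lucene class, and a real analysis chain, for example one using Lucene's ASCIIFoldingFilter, may fold characters differently:

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical stand-in for an accent-folding analyzer; not part of Lucene.
final class AccentFolding {

    static String fold(String text) {
        // Lowercase first so that, e.g., 'Ñ' and 'ñ' fold the same way.
        String lower = text.toLowerCase(Locale.ROOT);
        // NFD splits 'ñ' into 'n' + U+0303 and 'ó' into 'o' + U+0301.
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // Drop the combining marks, leaving only the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

Here fold("Canción") gives "cancion" and fold("Pingüino") gives "pinguino". Note that this also turns 'ñ' into 'n', which a Spanish application may or may not want.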



Thanks in advance. Regards!



> AnalyzingQueryParser can't work with leading wildcards.
> -------------------------------------------------------
>
>                 Key: LUCENE-949
>                 URL: https://issues.apache.org/jira/browse/LUCENE-949
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 2.2
>            Reporter: Stefan Klein
>
> The getWildcardQuery method in AnalyzingQueryParser.java needs the following 
> changes to accept leading wildcards:
>       protected Query getWildcardQuery(String field, String termStr) throws ParseException
>       {
>               String useTermStr = termStr;
>               String leadingWildcard = null;
>               if ("*".equals(field))
>               {
>                       if ("*".equals(useTermStr))
>                               return new MatchAllDocsQuery();
>               }
>               boolean hasLeadingWildcard = useTermStr.startsWith("*") || useTermStr.startsWith("?");
>               if (!getAllowLeadingWildcard() && hasLeadingWildcard)
>                       throw new ParseException("'*' or '?' not allowed as first character in WildcardQuery");
>               if (getLowercaseExpandedTerms())
>               {
>                       useTermStr = useTermStr.toLowerCase();
>               }
>               if (hasLeadingWildcard)
>               {
>                       leadingWildcard = useTermStr.substring(0, 1);
>                       useTermStr = useTermStr.substring(1);
>               }
>               List tlist = new ArrayList();
>               List wlist = new ArrayList();
>               /*
>                * somewhat a hack: find/store wildcard chars in order to put them back
>                * after analyzing
>                */
>               // the leading wildcard (if any) was stripped above, so the
>               // scan starts inside a token
>               boolean isWithinToken = true;
>               StringBuffer tmpBuffer = new StringBuffer();
>               char[] chars = useTermStr.toCharArray();
>               for (int i = 0; i < useTermStr.length(); i++)
>               {
>                       if (chars[i] == '?' || chars[i] == '*')
>                       {
>                               if (isWithinToken)
>                               {
>                                       tlist.add(tmpBuffer.toString());
>                                       tmpBuffer.setLength(0);
>                               }
>                               isWithinToken = false;
>                       }
>                       else
>                       {
>                               if (!isWithinToken)
>                               {
>                                       wlist.add(tmpBuffer.toString());
>                                       tmpBuffer.setLength(0);
>                               }
>                               isWithinToken = true;
>                       }
>                       tmpBuffer.append(chars[i]);
>               }
>               if (isWithinToken)
>               {
>                       tlist.add(tmpBuffer.toString());
>               }
>               else
>               {
>                       wlist.add(tmpBuffer.toString());
>               }
>               // get Analyzer from superclass and tokenize the term
>               TokenStream source = getAnalyzer().tokenStream(field, new StringReader(useTermStr));
>               org.apache.lucene.analysis.Token t;
>               int countTokens = 0;
>               while (true)
>               {
>                       try
>                       {
>                               t = source.next();
>                       }
>                       catch (IOException e)
>                       {
>                               t = null;
>                       }
>                       if (t == null)
>                       {
>                               break;
>                       }
>                       if (!"".equals(t.termText()))
>                       {
>                               try
>                               {
>                                       tlist.set(countTokens++, t.termText());
>                               }
>                               catch (IndexOutOfBoundsException ioobe)
>                               {
>                                       countTokens = -1;
>                               }
>                       }
>               }
>               try
>               {
>                       source.close();
>               }
>               catch (IOException e)
>               {
>                       // ignore
>               }
>               if (countTokens != tlist.size())
>               {
>                       /*
>                        * this means that the analyzer used either added or consumed
>                        * (common for a stemmer) tokens, and we can't build a WildcardQuery
>                        */
>                       throw new ParseException("Cannot build WildcardQuery with analyzer " + getAnalyzer().getClass()
>                                       + " - tokens added or lost");
>               }
>               if (tlist.size() == 0)
>               {
>                       return null;
>               }
>               else if (tlist.size() == 1)
>               {
>                       if (wlist.size() == 1)
>                       {
>                               /*
>                                * if wlist contains one wildcard, it must be at the end,
>                                * because: 1) a wildcard at the 1st position of the term
>                                * was stripped above and is re-added separately, 2) if the
>                                * wildcard was *not* at the end, there would be *two* or
>                                * more tokens
>                                */
>                               StringBuffer sb = new StringBuffer();
>                               if (hasLeadingWildcard)
>                               {
>                                       // adding leadingWildcard
>                                       sb.append(leadingWildcard);
>                               }
>                               sb.append((String) tlist.get(0));
>                               sb.append(wlist.get(0).toString());
>                               return super.getWildcardQuery(field, 
> sb.toString());
>                       }
>                       else if (wlist.size() == 0 && hasLeadingWildcard)
>                       {
>                               /*
>                                * if wlist contains no wildcard, the only wildcard was the
>                                * leading one, stripped above: put it back in front
>                                */
>                               StringBuffer sb = new StringBuffer();
>                               sb.append(leadingWildcard);
>                               sb.append((String) tlist.get(0));
>                               // wlist is empty here, so nothing follows the token
>                               return super.getWildcardQuery(field, sb.toString());
>                       }
>                       else
>                       {
>                               /*
>                                * we should never get here! if so, this method was called with
>                                * a termStr containing no wildcard ...
>                                */
>                               throw new IllegalArgumentException("getWildcardQuery called without wildcard");
>                       }
>               }
>               else
>               {
>                       /*
>                        * the term was tokenized, let's rebuild to one token with wildcards
>                        * put back in position
>                        */
>                       StringBuffer sb = new StringBuffer();
>                       if (hasLeadingWildcard)
>                       {
>                               // adding leadingWildcard
>                               sb.append(leadingWildcard);
>                       }
>                       for (int i = 0; i < tlist.size(); i++)
>                       {
>                               sb.append((String) tlist.get(i));
>                               if (wlist != null && wlist.size() > i)
>                               {
>                                       sb.append((String) wlist.get(i));
>                               }
>                       }
>                       return super.getWildcardQuery(field, sb.toString());
>               }
>       }
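The patched flow can be sketched outside Lucene to check the leading-wildcard case. In this sketch the class LeadingWildcardSketch and its simpleAnalyzer method are hypothetical stand-ins: simpleAnalyzer splits on anything that is not a letter or digit and lowercases, in place of getAnalyzer().tokenStream(); an IllegalArgumentException replaces ParseException; and the method returns the rebuilt pattern string instead of calling super.getWildcardQuery:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch of the patched getWildcardQuery logic, outside Lucene.
final class LeadingWildcardSketch {

    // Hypothetical analyzer stand-in: '*' and '?' act as separators, tokens are lowercased.
    static List<String> simpleAnalyzer(String text) {
        List<String> out = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                out.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Returns the wildcard pattern that would be handed to the base parser.
    static String rebuild(String termStr, Function<String, List<String>> analyzer) {
        // Strip one leading wildcard and remember it, as the patch does.
        boolean hasLeading = termStr.startsWith("*") || termStr.startsWith("?");
        String leading = hasLeading ? termStr.substring(0, 1) : "";
        String rest = hasLeading ? termStr.substring(1) : termStr;

        // Split the remainder into text chunks and wildcard runs.
        List<String> chunks = new ArrayList<>();
        List<String> wildcards = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        boolean inChunk = true;
        for (char c : rest.toCharArray()) {
            boolean wild = c == '*' || c == '?';
            if (wild == inChunk) {
                // Crossing from text to wildcards (or back): store the finished run.
                if (inChunk) {
                    chunks.add(buf.toString());
                } else {
                    wildcards.add(buf.toString());
                }
                buf.setLength(0);
                inChunk = !wild;
            }
            buf.append(c);
        }
        if (inChunk) {
            chunks.add(buf.toString());
        } else {
            wildcards.add(buf.toString());
        }

        // Analyze the remainder; every text chunk must map to exactly one token.
        List<String> tokens = analyzer.apply(rest);
        if (tokens.size() != chunks.size()) {
            throw new IllegalArgumentException("analyzer added or removed tokens");
        }

        // Rebuild: leading wildcard first, then token i followed by wildcard run i.
        StringBuilder sb = new StringBuilder(leading);
        for (int i = 0; i < tokens.size(); i++) {
            sb.append(tokens.get(i));
            if (i < wildcards.size()) {
                sb.append(wildcards.get(i));
            }
        }
        return sb.toString();
    }
}
```

For example, rebuild("*Luc*NE", LeadingWildcardSketch::simpleAnalyzer) gives "*luc*ne": the leading '*' is stripped before splitting and put back in front afterwards. An analyzer that drops a chunk (say, a stopword filter removing "the" from "the*cat") trips the token-count check, mirroring the "tokens added or lost" ParseException.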

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



