[ https://issues.apache.org/jira/browse/LUCENE-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-4382:
----------------------------------

    Fix Version/s:     (was: 4.3)
                   4.4
    
> Unicode escape no longer works for non-suffix-only wildcard terms
> -----------------------------------------------------------------
>
>                 Key: LUCENE-4382
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4382
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 4.0-BETA
>            Reporter: Jack Krupansky
>             Fix For: 4.4
>
>
> LUCENE-588 added support for escaping wildcard characters, but when the 
> de-escaping logic was pushed down from the query parser (QueryParserBase) 
> into WildcardQuery, support for Unicode escapes (a backslash, a "u", and 
> four hex digits, e.g. \u0063 for "c") was not carried over.
> Two possible solutions:
> 1. Do the Unicode de-escaping in the query parser before calling 
> getWildcardQuery.
> 2. Support Unicode de-escaping in WildcardQuery itself.
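A minimal sketch of solution 1, for illustration only. decodeUnicodeEscapes is a hypothetical helper, not an existing Lucene method; the parser would run it on the raw wildcard text before calling getWildcardQuery. It assumes, as described above, that WildcardQuery still treats a backslash as escaping the next character, so it decodes only backslash-u sequences and passes every other escape (\*, \?, \: and an escaped backslash) through untouched. A decoded "*", "?" or backslash is re-escaped so that it stays literal.

```java
// Hypothetical helper sketching solution 1 (not Lucene code): decode
// backslash-u escapes before the pattern reaches WildcardQuery, while keeping
// all other backslash escapes intact so wildcard escaping keeps working.
final class UnicodeUnescape {

    static String decodeUnicodeEscapes(String pattern) {
        StringBuilder out = new StringBuilder(pattern.length());
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            if (c != '\\' || i + 1 >= pattern.length()) {
                // Ordinary character, or a trailing lone backslash: copy as-is.
                out.append(c);
                i++;
                continue;
            }
            char next = pattern.charAt(i + 1);
            if (next == 'u' && i + 6 <= pattern.length() && isHex4(pattern, i + 2)) {
                char decoded = (char) Integer.parseInt(pattern.substring(i + 2, i + 6), 16);
                // The user escaped this char, so a decoded wildcard or backslash
                // must remain literal for WildcardQuery: re-escape it.
                if (decoded == '*' || decoded == '?' || decoded == '\\') {
                    out.append('\\');
                }
                out.append(decoded);
                i += 6;
            } else {
                // Any other escape (including an escaped backslash) passes through.
                out.append(c).append(next);
                i += 2;
            }
        }
        return out.toString();
    }

    // True if the four chars starting at 'start' are all hex digits.
    private static boolean isHex4(String s, int start) {
        for (int k = start; k < start + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

With this in place, literal\:\u0063olo*n would reach WildcardQuery as literal\:colo*n, which its existing escape handling already understands.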
> A suffix-only wildcard (a single trailing "*", e.g. colo*) does not exhibit 
> this problem, because the query parser fully de-escapes the term, Unicode 
> escapes included, before calling getPrefixQuery.
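To illustrate why only the non-suffix case is affected, here is a simplified, hypothetical model of the routing. The real decision is made by the query parser's grammar, and TermRouting, isPrefixTerm, discardEscapes and route are made-up names. A prefix term is fully de-escaped, backslash-u included, before getPrefixQuery; any other wildcard term is handed raw to getWildcardQuery.

```java
// Simplified, hypothetical model of how a bare wildcard/prefix term is routed.
// The real decision is made by the query parser's grammar; names are made up.
final class TermRouting {

    // True if the only unescaped wildcard is a single trailing '*'.
    static boolean isPrefixTerm(String term) {
        boolean sawWildcard = false;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '*' || c == '?') {
                if (c == '*' && i == term.length() - 1 && !sawWildcard) {
                    return true;
                }
                sawWildcard = true;
            }
        }
        return false;
    }

    // Full de-escaping: backslash-u plus four hex digits becomes that char,
    // any other backslash-x becomes x. No error handling for malformed input.
    static String discardEscapes(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c);
            } else if (s.charAt(i + 1) == 'u' && i + 6 <= s.length()) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 5; // with the loop's i++, skips all six characters
            } else {
                out.append(s.charAt(i + 1));
                i++; // with the loop's i++, skips the backslash and escaped char
            }
        }
        return out.toString();
    }

    // Assumes the term contains at least one unescaped wildcard.
    // Prefix terms are fully de-escaped first; other wildcard terms stay raw.
    static String route(String term) {
        if (isPrefixTerm(term)) {
            return "prefix:" + discardEscapes(term.substring(0, term.length() - 1));
        }
        return "wildcard:" + term;
    }
}
```

Under this model, the prefix term literal\:\u0063olon* comes out as prefix:literal:colon, while literal\:\u0063olo*n is passed on unchanged as wildcard:literal\:\u0063olo*n.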
> My test case, added at the beginning of 
> TestExtendedDismaxParser.testFocusQueryParser:
> {code}
>     assertQ("expected doc is missing (using escaped edismax w/field)",
>         req("q", "t_special:literal\\:\\u0063olo*n", 
>             "defType", "edismax"),
>         "//doc[1]/str[@name='id'][.='46']"); 
> {code}
> Note: I used that test case only to step into WildcardQuery with a debugger 
> and confirm that the Unicode escape was not processed correctly. The test 
> itself fails in all cases, but that is because of how the field type is 
> analyzed.
> Here is a Lucene-level test case that can also be debugged to see that 
> WildcardQuery is not processing the Unicode escape properly. I added it at 
> the start of TestMultiAnalyzer.testMultiAnalyzer:
> {code}
>     assertEquals("literal\\:\\u0063olo*n",
>         qp.parse("literal\\:\\u0063olo*n").toString());
> {code}
> Note: This case always passes, because it only checks the pattern string 
> passed to WildcardQuery, not how WildcardQuery de-escapes it.
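A test that exercises the de-escaping has to look at what the pattern matches, not at its string form. The sketch below uses a small hypothetical model of WildcardQuery's pattern semantics; WildcardModel is not Lucene code, and it assumes, as this report describes, that an escaped "u" is currently treated as a literal "u". The decodeUnicode flag switches on the behaviour proposed in solution 2.

```java
import java.util.regex.Pattern;

// Hypothetical model (not Lucene code) of wildcard pattern semantics:
// '*' matches any sequence, '?' any single char, and a backslash makes the
// next char literal. decodeUnicode=true additionally decodes backslash-u
// plus four hex digits into one literal char (the proposed solution 2).
final class WildcardModel {

    static boolean matches(String pattern, String text, boolean decodeUnicode) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else if (c == '\\' && i + 1 < pattern.length()) {
                char next = pattern.charAt(i + 1);
                if (decodeUnicode && next == 'u' && i + 6 <= pattern.length()) {
                    char decoded = (char) Integer.parseInt(pattern.substring(i + 2, i + 6), 16);
                    regex.append(Pattern.quote(String.valueOf(decoded)));
                    i += 5; // with the loop's i++, skips the whole escape
                } else {
                    // Current behaviour: the escaped char (even 'u') is literal.
                    regex.append(Pattern.quote(String.valueOf(next)));
                    i++;
                }
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(text).matches();
    }
}
```

Under the modelled current semantics (decodeUnicode = false), literal\:\u0063olo*n matches literal:u0063olon but not literal:colon; with decodeUnicode = true it matches literal:colon.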

--
This message is automatically generated by JIRA.