[
https://issues.apache.org/jira/browse/LUCENE-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420826#comment-13420826
]
Uwe Schindler commented on LUCENE-4247:
---------------------------------------
Just to add some background information:
For the Solr QueryParser (see SOLR-2921) there is a new marker, "MultiTermAware",
in Solr. The Solr QueryParser can handle that, but Lucene's cannot because it
has no IndexSchema, so it does not analyze any MultiTermQuery such as Wildcard,
Prefix, Fuzzy, or TermRangeQuery.
If we ported the whole analysis factory infrastructure over to Lucene, this
could be fixed, but that is not possible at the moment with what's
available in Lucene.
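In the meantime, one way to cope on the client side is to normalize the term
text yourself before it reaches the parser. The following is only a sketch
under stated assumptions: it uses the JDK's java.text.Normalizer as a stand-in
for ASCIIFoldingFilter (it only handles characters that decompose into a base
letter plus combining marks, which covers the Czech example but not letters
such as "ß" or "ø"), and the class and method names are hypothetical, not part
of Lucene.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not a Lucene class: folds the text of a prefix query
// by hand because the QueryParser does not run the Analyzer on it.
class PrefixTermFolder {

    // Decomposes accented characters (NFD), drops the combining marks,
    // then lowercases. This is an approximation of ASCIIFoldingFilter.
    public static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Folds the term text but keeps a single trailing '*' wildcard intact.
    public static String foldPrefixQuery(String query) {
        if (query.endsWith("*")) {
            return fold(query.substring(0, query.length() - 1)) + "*";
        }
        return fold(query);
    }

    public static void main(String[] args) {
        // Same search text as in the issue, written with Unicode escapes.
        String searchText = "p\u0159\u00edLI\u0161*";
        System.out.println(foldPrefixQuery(searchText));
    }
}
```

The folded string would then be passed to QueryParser.parse() in place of the
raw search text. This only matches if the indexed terms were folded the same
way, so treat it as a stopgap rather than a replacement for real analysis.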
> QueryParser doesn't call Analyzer
> ---------------------------------
>
> Key: LUCENE-4247
> URL: https://issues.apache.org/jira/browse/LUCENE-4247
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/queryparser
> Affects Versions: 3.6
> Reporter: Zied Hamdi
> Assignee: Uwe Schindler
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> I'm trying to fold Czech characters through the ASCIIFoldingFilter. This
> works fine at index time, since I can retrieve the non-diacritic version of the
> content I indexed. But searching with diacritics always returns 0
> results.
> In debug mode I can clearly see that the Analyzer wasn't called (in addition,
> I put a breakpoint in my analyzer to check whether it is called
> later, and it is never hit).
> searchText = "příLIš*";
> Analyzer analyzer = (Analyzer) factory.getBean("analyzer");
> Query q = new QueryParser((Version) factory.getBean("version"),
>         DestinationPlaceProperties.NAME, analyzer).parse(searchText);
> The query q has these values in debug:
> prefix Term (id=90)
> field "name" (id=100)
> text "příliš" (id=101)
> --- more details ----
> q PrefixQuery (id=65)
> boost 1.0
> numberOfTerms 0
> prefix Term (id=90)
> rewriteMethod MultiTermQuery$2 (id=92)
> ---------------------
> My analyser is quite simple; I include its code just for reference:
> public class DestinationAnalyser extends Analyzer {
>     /**
>      *
>      */
>     private final Version luceneVersion;
>
>     public DestinationAnalyser(Version lucene_version) {
>         super();
>         this.luceneVersion = lucene_version;
>     }
>
>     /*
>      * (non-Javadoc)
>      *
>      * @see org.apache.lucene.analysis.Analyzer#tokenStream(java.lang.String,
>      * java.io.Reader)
>      */
>     @Override
>     public TokenStream tokenStream(String fieldName, Reader reader) {
>         TokenStream result = new StandardTokenizer(luceneVersion, reader);
>         result = new StandardFilter(luceneVersion, result);
>         result = new LowerCaseFilter(luceneVersion, result);
>         result = new ASCIIFoldingFilter(result);
>         return result;
>     }
> }
> --------- WORKAROUND ---------
> To avoid the problem, I'm currently using this method to transform the search
> text:
> /**
>  * Uses {@link ASCIIFoldingFilter} to transform diacritical text to its
>  * ASCII counterpart.
>  *
>  * @param text
>  *            to transform
>  * @return ascii text
>  */
> public static String foldToASCII(String text) {
>     int length = text.length();
>     // One input char can fold to several ASCII chars, so the output
>     // buffer is sized at 4 * length and trimmed to the returned length.
>     char[] folded = new char[4 * length];
>     int foldedLength = ASCIIFoldingFilter.foldToASCII(text.toCharArray(), 0,
>             folded, 0, length);
>     return new String(folded, 0, foldedLength);
> }