Tim Allison created SOLR-11462:
----------------------------------

             Summary: TokenizerChain's normalize() doesn't work
                 Key: SOLR-11462
                 URL: https://issues.apache.org/jira/browse/SOLR-11462
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Tim Allison
            Priority: Trivial

TokenizerChain's {{normalize()}} is not currently used, so this has no negative effect on search at present. However, there is a bug, and we should fix it.

If applied to a TokenizerChain with {{filters.length > 1}}, only the last {{MultiTermAwareComponent}} filter would take effect, because each filter is created over the original {{in}} rather than over the accumulated {{result}}.

{noformat}
  @Override
  protected TokenStream normalize(String fieldName, TokenStream in) {
    TokenStream result = in;
    for (TokenFilterFactory filter : filters) {
      if (filter instanceof MultiTermAwareComponent) {
        filter = (TokenFilterFactory) ((MultiTermAwareComponent) filter).getMultiTermComponent();
        result = filter.create(in);
      }
    }
    return result;
  }
{noformat}

The fix is trivial:

{noformat}
-        result = filter.create(in);
+        result = filter.create(result);
{noformat}

If you'd like to swap out {{TextField#analyzeMultiTerm()}} with, say:

{noformat}
  public static BytesRef analyzeMultiTerm(String field, String part, Analyzer analyzerIn) {
    if (part == null || analyzerIn == null) return null;
    return analyzerIn.normalize(field, part);
  }
{noformat}

I'm happy to submit a PR with unit tests. Let me know.
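
To illustrate the chaining problem described in this issue without a Lucene dependency, here is a minimal sketch. Everything in it is a hypothetical stand-in: {{NormalizeChainSketch}}, {{normalizeBuggy()}} and {{normalizeFixed()}} are names invented for this example, and plain {{UnaryOperator<String>}} functions play the role of the {{TokenFilterFactory}} instances. It is not the actual Solr code, only a model of the loop's behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the loop in TokenizerChain#normalize():
// each "filter" is a plain String -> String function, not a Lucene factory.
class NormalizeChainSketch {

    // Models the reported bug: every filter is applied to the original
    // input, so only the last filter's output survives.
    static String normalizeBuggy(String in, List<UnaryOperator<String>> filters) {
        String result = in;
        for (UnaryOperator<String> filter : filters) {
            result = filter.apply(in);
        }
        return result;
    }

    // Models the proposed fix: each filter is applied to the previous result.
    static String normalizeFixed(String in, List<UnaryOperator<String>> filters) {
        String result = in;
        for (UnaryOperator<String> filter : filters) {
            result = filter.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> filters = Arrays.asList(
            s -> s.toLowerCase(Locale.ROOT), // stands in for a lowercasing filter
            s -> s.replace("-", "")          // stands in for a second normalizing filter
        );
        System.out.println(normalizeBuggy("Foo-Bar", filters)); // prints "FooBar"
        System.out.println(normalizeFixed("Foo-Bar", filters)); // prints "foobar"
    }
}
```

In the buggy variant the lowercasing step is silently dropped, which is the same effect the one-line change from {{filter.create(in)}} to {{filter.create(result)}} is meant to prevent.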