Tim Allison created SOLR-11462:
----------------------------------

             Summary: TokenizerChain's normalize() doesn't work
                 Key: SOLR-11462
                 URL: https://issues.apache.org/jira/browse/SOLR-11462
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Tim Allison
            Priority: Trivial


TokenizerChain's {{normalize()}} is not currently used, so this bug has no 
negative effect on search yet.  It is still a bug, though, and we should 
fix it.

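If applied to a TokenizerChain with {{filters.length > 1}}, only the last 
multi-term-aware filter would apply, because each filter is created around the 
original {{in}} stream rather than the accumulated {{result}}.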
 
{noformat}
  @Override
  protected TokenStream normalize(String fieldName, TokenStream in) {
    TokenStream result = in;
    for (TokenFilterFactory filter : filters) {
      if (filter instanceof MultiTermAwareComponent) {
        filter = (TokenFilterFactory) ((MultiTermAwareComponent) filter).getMultiTermComponent();
        result = filter.create(in);
      }
    }
    return result;
  }
{noformat}

The fix is trivial:
{noformat}
-        result = filter.create(in);
+        result = filter.create(result);
{noformat}
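
To illustrate why the one-word change matters, here is a minimal standalone sketch of the same loop pattern. It does not use any Lucene or Solr classes: {{ChainDemo}}, {{applyBuggy}}, {{applyFixed}}, and {{demoStages}} are hypothetical names, and plain string functions stand in for the token filters.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the filter chain; not Solr code.
public class ChainDemo {

    // Mirrors the buggy loop: every stage is applied to the original input,
    // so only the last stage's output survives.
    static String applyBuggy(String in, List<UnaryOperator<String>> stages) {
        String result = in;
        for (UnaryOperator<String> stage : stages) {
            result = stage.apply(in);
        }
        return result;
    }

    // Mirrors the fix: each stage is applied to the previous stage's output.
    static String applyFixed(String in, List<UnaryOperator<String>> stages) {
        String result = in;
        for (UnaryOperator<String> stage : stages) {
            result = stage.apply(result);
        }
        return result;
    }

    // Two illustrative "filters": lowercase, then strip hyphens.
    static List<UnaryOperator<String>> demoStages() {
        UnaryOperator<String> lowercase = s -> s.toLowerCase(Locale.ROOT);
        UnaryOperator<String> stripHyphens = s -> s.replace("-", "");
        return List.of(lowercase, stripHyphens);
    }

    public static void main(String[] args) {
        System.out.println(applyBuggy("Wi-Fi", demoStages())); // WiFi (lowercasing lost)
        System.out.println(applyFixed("Wi-Fi", demoStages())); // wifi
    }
}
```

With a single stage the two versions behave identically, which is presumably why the problem only shows up when more than one filter is multi-term aware.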

If you'd like, we could also swap out the body of {{TextField#analyzeMultiTerm()}} for a call to {{normalize()}}, say:

{noformat}
  public static BytesRef analyzeMultiTerm(String field, String part, Analyzer analyzerIn) {
    if (part == null || analyzerIn == null) return null;
    return analyzerIn.normalize(field, part);
  }
{noformat}

I'm happy to submit a PR with unit tests.  Let me know.



