Tim Allison created SOLR-11462:
----------------------------------
Summary: TokenizerChain's normalize() doesn't work
Key: SOLR-11462
URL: https://issues.apache.org/jira/browse/SOLR-11462
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Tim Allison
Priority: Trivial
TokenizerChain's {{normalize()}} is not currently used, so this bug has no
negative effect on search today. It is still a bug, and we should fix it.
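If applied to a TokenizerChain with {{filters.length > 1}}, only the last
{{MultiTermAwareComponent}} filter would apply, because each filter is created
from the original {{in}} rather than from the accumulated {{result}}.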
{noformat}
@Override
protected TokenStream normalize(String fieldName, TokenStream in) {
  TokenStream result = in;
  for (TokenFilterFactory filter : filters) {
    if (filter instanceof MultiTermAwareComponent) {
      filter = (TokenFilterFactory) ((MultiTermAwareComponent) filter).getMultiTermComponent();
      result = filter.create(in);
    }
  }
  return result;
}
{noformat}
The fix is trivial:
{noformat}
- result = filter.create(in);
+ result = filter.create(result);
{noformat}
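To illustrate why the one-word change matters, here is a minimal, dependency-free sketch of the same loop. It does not use the Lucene or Solr API: plain string functions stand in for the token filters, and every name in it ({{NormalizeChainSketch}}, {{normalizeBuggy}}, {{normalizeFixed}}, {{demoFilters}}) is hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the filter chain; not the real TokenizerChain.
public class NormalizeChainSketch {

    // Mirrors the reported bug: each filter is applied to the original input,
    // so the output of every filter except the last is discarded.
    static String normalizeBuggy(String in, List<UnaryOperator<String>> filters) {
        String result = in;
        for (UnaryOperator<String> filter : filters) {
            result = filter.apply(in);
        }
        return result;
    }

    // Mirrors the proposed fix: each filter is applied to the accumulated result.
    static String normalizeFixed(String in, List<UnaryOperator<String>> filters) {
        String result = in;
        for (UnaryOperator<String> filter : filters) {
            result = filter.apply(result);
        }
        return result;
    }

    // Two illustrative "filters": lowercase, then strip hyphens.
    static List<UnaryOperator<String>> demoFilters() {
        return List.of(
            s -> s.toLowerCase(Locale.ROOT),
            s -> s.replace("-", ""));
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> filters = demoFilters();
        System.out.println(normalizeBuggy("Wi-Fi", filters)); // prints "WiFi"
        System.out.println(normalizeFixed("Wi-Fi", filters)); // prints "wifi"
    }
}
```

In the buggy version the lowercasing step is lost because the second filter starts again from the raw input; a unit test for the real fix would presumably assert the same kind of difference on a chain with two multi-term-aware filters.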
If you'd like, {{TextField#analyzeMultiTerm()}} could then be swapped out for something like:
{noformat}
public static BytesRef analyzeMultiTerm(String field, String part, Analyzer analyzerIn) {
  if (part == null || analyzerIn == null) return null;
  return analyzerIn.normalize(field, part);
}
{noformat}
I'm happy to submit a PR with unit tests. Let me know.