David Smiley created SOLR-7326:
----------------------------------
Summary: Reduce hl.maxAnalyzedChars budget for multi-valued fields
in the default highlighter
Key: SOLR-7326
URL: https://issues.apache.org/jira/browse/SOLR-7326
Project: Solr
Issue Type: Improvement
Components: highlighter
Reporter: David Smiley
Assignee: David Smiley
In DefaultSolrHighlighter, the hl.maxAnalyzedChars figure is used to constrain
how much text is analyzed before the highlighter stops, in the interests of
performance. For a multi-valued field, it effectively treats each value anew,
no matter how much text was previously analyzed for other values of the same
field in the current document. The PostingsHighlighter doesn't work this
way -- hl.maxAnalyzedChars is effectively the total budget for a field for a
document, no matter how many values there might be. It's not reset for each
value. I think this makes more sense. When we loop over the values, we should
subtract from hl.maxAnalyzedChars the length of the value just checked. The
motivation here is consistency with PostingsHighlighter, and to allow for
hl.maxAnalyzedChars to be pushed down to term vector uninversion, which
wouldn't be possible for multi-valued fields based on the current way this
parameter is used.
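
A minimal sketch of the proposed per-field budget follows. It is an
illustration only, not the actual DefaultSolrHighlighter code: the class
AnalysisBudgetSketch and the method analyzedCharsPerValue are hypothetical
names, and the values are plain strings rather than stored fields. It shows
the remaining budget shrinking by the length of each value as the loop
proceeds, instead of being reset per value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; names and structure are assumptions, not Solr's API.
final class AnalysisBudgetSketch {

    /**
     * Returns how many characters of each value would be analyzed when
     * maxAnalyzedChars is one shared budget for all values of a field
     * in a single document.
     */
    static List<Integer> analyzedCharsPerValue(List<String> values, int maxAnalyzedChars) {
        List<Integer> analyzed = new ArrayList<>();
        int remaining = maxAnalyzedChars;
        for (String value : values) {
            if (remaining <= 0) {
                break; // budget exhausted: later values are not analyzed at all
            }
            // Analyze at most what is left of the shared budget.
            analyzed.add(Math.min(value.length(), remaining));
            // Subtract the length of the value just checked.
            remaining -= value.length();
        }
        return analyzed;
    }
}
```

With the values "aaaa", "bbbb", "cc" and a budget of 6, this sketch analyzes
4 characters of the first value, 2 of the second, and skips the third. Under
the current behavior described above, each value would get the full budget
of 6.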
Interestingly, I noticed Solr's use of FastVectorHighlighter doesn't honor
hl.maxAnalyzedChars, as the FVH doesn't have a knob for that. It does have
hl.phraseLimit, which is a limit that could be used for a similar purpose,
albeit applied differently.