[ 
https://issues.apache.org/jira/browse/SOLR-6692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-6692:
-------------------------------
    Description: 
In DefaultSolrHighlighter, hl.maxAnalyzedChars constrains how much text is 
analyzed before the highlighter stops, in the interests of performance.  For a 
multi-valued field, it effectively treats each value anew, no matter how much 
text was previously analyzed for other values of the same field in the current 
document.  The PostingsHighlighter doesn't work this way -- hl.maxAnalyzedChars 
is effectively the total budget for a field for a document, no matter how many 
values there might be.  It's not reset for each value.  I think this makes more 
sense.  When we loop over the values, we should subtract from 
hl.maxAnalyzedChars the length of the value just checked.  The motivation here 
is consistency with PostingsHighlighter, and to allow for hl.maxAnalyzedChars 
to be pushed down to term vector uninversion, which wouldn't be possible for 
multi-valued fields given the way this parameter is currently used.
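
A minimal sketch of the cumulative budget described above, not the actual 
DefaultSolrHighlighter code.  The class name, method name, and return shape 
are hypothetical and exist only for illustration; the one piece taken from the 
description is the subtraction of each value's length from the remaining 
budget as the loop advances.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Solr.
public class CumulativeBudgetSketch {

    /**
     * For each value of a multi-valued field, returns how many characters
     * would be analyzed when maxAnalyzedChars is one budget shared by all
     * values of the field, rather than being reset for every value.
     */
    static List<Integer> analyzedCharsPerValue(List<String> values, int maxAnalyzedChars) {
        List<Integer> analyzed = new ArrayList<>();
        int remaining = maxAnalyzedChars;
        for (String value : values) {
            if (remaining <= 0) {
                break; // budget exhausted: later values are not analyzed at all
            }
            // Analyze at most what is left of the budget for this value.
            analyzed.add(Math.min(value.length(), remaining));
            // Subtract the length of the value just checked.
            remaining -= value.length();
        }
        return analyzed;
    }
}
```

With values of length 5, 8, and 3 and a budget of 10, this sketch analyzes 5 
characters of the first value, 5 of the second, and skips the third.  Under the 
current per-value behavior, all three values would be analyzed in full.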

Interestingly, I noticed Solr's use of FastVectorHighlighter doesn't honor 
hl.maxAnalyzedChars as the FVH doesn't have a knob for that.  It does have 
hl.phraseLimit which is a limit that could be used for a similar purpose, 
albeit applied differently.

Furthermore, DefaultSolrHighlighter.doHighlightingByHighlighter should exit 
early from its field value loop if it reaches hl.snippets, and if 
hl.preserveMulti=true.

  was:
I think hl.maxAnalyzedChars should apply cumulatively across the values of a 
multi-valued field.  DefaultSolrHighlighter doesn't; I'm not sure yet about the 
other two.

Furthermore, DefaultSolrHighlighter.doHighlightingByHighlighter should exit 
early from its field value loop if it reaches hl.snippets.


> hl.maxAnalyzedChars should apply cumulatively on a multi-valued field
> ---------------------------------------------------------------------
>
>                 Key: SOLR-6692
>                 URL: https://issues.apache.org/jira/browse/SOLR-6692
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>            Reporter: David Smiley
>             Fix For: 5.0
>
>
> In DefaultSolrHighlighter, hl.maxAnalyzedChars constrains how much text is 
> analyzed before the highlighter stops, in the interests of performance.  For 
> a multi-valued field, it effectively treats each value anew, no matter how 
> much text was previously analyzed for other values of the same field in the 
> current document.  The PostingsHighlighter doesn't work this way -- 
> hl.maxAnalyzedChars is effectively the total budget for a field for a 
> document, no matter how many values there might be.  It's not reset for each 
> value.  I think this makes more sense.  When we loop over the values, we 
> should subtract from hl.maxAnalyzedChars the length of the value just 
> checked.  The motivation here is consistency with PostingsHighlighter, and to 
> allow for hl.maxAnalyzedChars to be pushed down to term vector uninversion, 
> which wouldn't be possible for multi-valued fields given the way this 
> parameter is currently used.
> Interestingly, I noticed Solr's use of FastVectorHighlighter doesn't honor 
> hl.maxAnalyzedChars as the FVH doesn't have a knob for that.  It does have 
> hl.phraseLimit which is a limit that could be used for a similar purpose, 
> albeit applied differently.
> Furthermore, DefaultSolrHighlighter.doHighlightingByHighlighter should exit 
> early from its field value loop if it reaches hl.snippets, and if 
> hl.preserveMulti=true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

