[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

ASF GitHub Bot (JIRA) Sat, 29 Oct 2016 08:47:08 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15618320#comment-15618320 ]


ASF GitHub Bot commented on LUCENE-7526:
----------------------------------------

Github user Timothy055 commented on the issue:

    https://github.com/apache/lucene-solr/pull/105
  
    I don't think there's a way to avoid keeping the position state, 
unfortunately. The reason is that we can advance one of the postings enums to its 
next position and only then discover that this position comes after the current 
position of another postings enum (for a different term) that also matches the 
wildcard. We then update the top and switch to that other postings enum (the 
ordering is by offset now), but once it is exhausted, or we switch back to the 
first enum because the terms interleave, the first enum's position is lost unless 
we saved it. :/ An alternative that would avoid this is to change PostingsEnum to 
expose the current position; then nearly all of the housekeeping would go away.
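A minimal sketch of why the position must be cached, written without Lucene so it runs standalone. TermCursor, Entry, and merge are hypothetical names: TermCursor stands in for one term's PostingsEnum within a single document and, like PostingsEnum, can only move forward and has no current-position getter. Sub-cursors sit in a priority queue ordered by start offset; each queue entry remembers the position it read when it was last advanced, because by the time that entry reaches the top again, the position can no longer be asked for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class CompositeOffsetsSketch {

    // Hypothetical stand-in for one term's postings in a single document.
    // It only moves forward and exposes no "current position" getter.
    static final class TermCursor {
        final String term;
        private final int[] positions;
        private final int[] startOffsets;
        private int idx = -1;

        TermCursor(String term, int[] positions, int[] startOffsets) {
            this.term = term;
            this.positions = positions;
            this.startOffsets = startOffsets;
        }

        int freq() { return positions.length; }

        // Advances and returns the new position; callers must not exceed freq().
        int nextPosition() { return positions[++idx]; }

        int startOffset() { return startOffsets[idx]; }
    }

    // Queue entry holding the "housekeeping" state: the position read when the
    // cursor was last advanced, which would otherwise be lost.
    static final class Entry {
        final TermCursor cursor;
        int remaining; // positions not yet consumed
        int position;  // cached current position

        Entry(TermCursor cursor) {
            this.cursor = cursor;
            this.remaining = cursor.freq();
        }

        boolean advance() {
            if (remaining == 0) return false;
            remaining--;
            position = cursor.nextPosition();
            return true;
        }
    }

    // Merges all cursors in startOffset order as "term@position:offset".
    static List<String> merge(List<TermCursor> cursors) {
        PriorityQueue<Entry> queue = new PriorityQueue<>(
            (a, b) -> Integer.compare(a.cursor.startOffset(), b.cursor.startOffset()));
        for (TermCursor c : cursors) {
            Entry e = new Entry(c);
            if (e.advance()) queue.add(e);
        }
        List<String> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            Entry top = queue.poll(); // removed before advancing, so its key never changes in the queue
            out.add(top.cursor.term + "@" + top.position + ":" + top.cursor.startOffset());
            if (top.advance()) queue.add(top); // re-insert under its new offset
        }
        return out;
    }

    public static void main(String[] args) {
        // Two terms matching a wildcard such as "foo*", interleaved in the document.
        List<TermCursor> cursors = List.of(
            new TermCursor("food", new int[] {0, 5}, new int[] {0, 30}),
            new TermCursor("foot", new int[] {2, 3}, new int[] {12, 18}));
        System.out.println(merge(cursors)); // [food@0:0, foot@2:12, foot@3:18, food@5:30]
    }
}
```

Here "foot" is advanced to position 2 during setup, but it is only emitted after "food" has been consumed once; without the cached field in Entry, that 2 would be unrecoverable. If the underlying enum could report its current position, Entry could drop the field, which is the simplification suggested above.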


> Improvements to UnifiedHighlighter OffsetStrategies
> ---------------------------------------------------
>
>                 Key: LUCENE-7526
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7526
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Timothy M. Rodriguez
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy
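One reading of the "unified metrics" point above, sketched under an assumption: that the composite reports a single frequency for the wildcard equal to the sum of the per-term frequencies in the current document, instead of several smaller per-term frequencies. TermFreq and compositeFreq are hypothetical names, not Lucene API.

```java
import java.util.List;

public class CompositeFreqSketch {

    // Hypothetical per-term frequency in the current document, for one term
    // that a wildcard such as "foo*" expanded to.
    static final class TermFreq {
        final String term;
        final int freq;

        TermFreq(String term, int freq) {
            this.term = term;
            this.freq = freq;
        }
    }

    // Assumed composite metric: one freq for the whole wildcard, summed over
    // every matching term.
    static int compositeFreq(List<TermFreq> subs) {
        int total = 0;
        for (TermFreq tf : subs) {
            total += tf.freq;
        }
        return total;
    }

    public static void main(String[] args) {
        List<TermFreq> subs = List.of(new TermFreq("food", 2), new TermFreq("foot", 3));
        System.out.println(compositeFreq(subs)); // 5
    }
}
```

Under this assumption a passage scorer sees the wildcard as one term occurring five times, rather than as two unrelated terms occurring two and three times.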



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

