[ https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668358#comment-15668358 ]
ASF subversion and git services commented on LUCENE-7526: --------------------------------------------------------- Commit 0790c34cc555ec9aaaab09cae04da6e61f465cfc in lucene-solr's branch refs/heads/branch_6x from [~dsmiley] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0790c34 ] LUCENE-7526: UnifiedHighlighter: enhance MTQ passage relevancy. TokenStreamFromTermVector isn't used by the UH anymore. Refactor AnalysisOffsetStrategy into TokenStream and MemoryIndex strategies, and related refactorings from that. (cherry picked from commit 7af454a) > Improvements to UnifiedHighlighter OffsetStrategies > --------------------------------------------------- > > Key: LUCENE-7526 > URL: https://issues.apache.org/jira/browse/LUCENE-7526 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/highlighter > Reporter: Timothy M. Rodriguez > Assignee: David Smiley > Priority: Minor > Fix For: 6.4 > > Attachments: LUCENE-7526.patch > > > This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies > by reducing reliance on creating or re-creating TokenStreams. > The primary changes are as follows: > * AnalysisOffsetStrategy - split into two offset strategies > ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a > MemoryIndex for producing Offsets > ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a > MemoryIndex. Can only be used if the query distills down to terms and > automata. > * TokenStream removal > ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill > the memory index and then once consumed a new one was generated by > uninverting the MemoryIndex back into a TokenStream if there were automata > (wildcard/mtq queries) involved. Now this is avoided, which should save > memory and avoid a second pass over the data. > ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid > generating a TokenStream if automata are involved. > ** PostingsWithTermVectorsOffsetStrategy - similar refactoring > * CompositePostingsEnum - aggregates several underlying PostingsEnums for > wildcard/mtq queries. This should improve relevancy by providing unified > metrics for a wildcard across all it's term matches > * Added a HighlightFlag for enabling the newly separated > TokenStreamOffsetStrategy since it can adversely affect passage relevancy -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org