[
https://issues.apache.org/jira/browse/LUCENE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476044#comment-13476044
]
Mark Harwood commented on LUCENE-3772:
--------------------------------------
For bigger-than-memory docs, is it not possible to use nested documents to
represent subsections (e.g. a child doc for each chapter of a book) and
then use BlockJoinQuery to select the best child docs?
Highlighting can then be run on a more manageable subset of the original
content, and Lucene's ranking algorithms are used to select the best "fragment"
rather than the highlighter's own attempts to reproduce this logic.
Obviously this depends on the shape of your content/queries, but
books-and-chapters is probably a good fit for this approach.
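To make the idea concrete, here is a rough, library-free sketch of "pick the
best section first, then highlight only that section". It deliberately does not
use Lucene: the class name ChapterFirstHighlight, the methods score,
bestChapter and highlight, and the simple term-count scoring are all
assumptions made up for illustration. In a real index, the child docs would be
ranked by Lucene (e.g. via BlockJoinQuery) and the real Highlighter would be
run on the winning child's text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of Lucene or its highlighter module.
class ChapterFirstHighlight {

    // Toy relevance score: number of case-insensitive occurrences of term.
    // Stands in for Lucene's ranking; assumes lowercasing keeps string length
    // unchanged (true for ASCII text).
    static int score(String text, String term) {
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = term.toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // Index of the highest-scoring chapter, or -1 if no chapter matches.
    // Ties go to the earliest chapter.
    static int bestChapter(List<String> chapters, String term) {
        int best = -1;
        int bestScore = 0;
        for (int i = 0; i < chapters.size(); i++) {
            int s = score(chapters.get(i), term);
            if (s > bestScore) {
                bestScore = s;
                best = i;
            }
        }
        return best;
    }

    // Wraps every case-insensitive match of term in <B>...</B>.
    // Only the selected chapter is passed in, never the whole book.
    static String highlight(String text, String term) {
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = term.toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return text;
        }
        StringBuilder out = new StringBuilder();
        int pos = 0;
        int hit;
        while ((hit = haystack.indexOf(needle, pos)) >= 0) {
            out.append(text, pos, hit)
               .append("<B>")
               .append(text, hit, hit + needle.length())
               .append("</B>");
            pos = hit + needle.length();
        }
        out.append(text.substring(pos));
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> chapters = Arrays.asList(
                "Call me Ishmael.",
                "The whale surfaced; the Whale dived.",
                "Epilogue.");
        int best = bestChapter(chapters, "whale");
        if (best >= 0) {
            System.out.println(highlight(chapters.get(best), "whale"));
        }
    }
}
```

The point of the split is that only one chapter-sized String is ever handed to
the highlighting step, so peak memory tracks the largest section rather than
the whole book.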
> Highlighter needs the whole text in memory to work
> --------------------------------------------------
>
> Key: LUCENE-3772
> URL: https://issues.apache.org/jira/browse/LUCENE-3772
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Affects Versions: 3.5
> Environment: Windows 7 Enterprise x64, JRE 1.6.0_25
> Reporter: Luis Filipe Nassif
> Labels: highlighter, improvement, memory
>
> Highlighter methods getBestFragment(s) and getBestTextFragments only accept a
> String object representing the whole text to highlight. When dealing with
> very large docs simultaneously, this can lead to heap consumption problems.
> It would be better if the API could additionally accept a Reader object, as
> Lucene Document Fields do.