[
https://issues.apache.org/jira/browse/LUCENE-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066628#comment-13066628
]
Paul Elschot commented on LUCENE-3325:
--------------------------------------
This was more or less suggested in:
"Compressing Term Positions in Web Indexes", Hao Yan, Shuai Ding, Torsten Suel,
SIGIR '09.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.152.4748&rep=rep1&type=pdf
in sections 7 and 8, and especially the last sentence: "... one could even
consider storing the parsed documents themselves in highly compressed form and
accessing these during a position data lookup, instead of keeping the positions
in inverted lists."
> Transpose positions in index
> ----------------------------
>
> Key: LUCENE-3325
> URL: https://issues.apache.org/jira/browse/LUCENE-3325
> Project: Lucene - Java
> Issue Type: Wish
> Components: core/index
> Reporter: Paul Elschot
> Priority: Minor
>
> When positions are used in queries with many terms, each term in each
> document causes a seek in the positions, and in large indexes these seeks can
> be far apart even when the terms are in the same document.
> The number of (disk) cache misses of such position seeks might be reduced by
> putting the positions for all terms in the same document directly behind each
> other. This should have a noticeable effect when terms are alphabetically
> close, for example for truncations, and it should also help when the
> documents have few enough positions to fill a cache entry (disk page, cache
> line).
> This might also help the performance of highlighting based on indexed
> positions.
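
To make the proposed layout concrete, here is a minimal sketch in Python that
compares a term-major ordering of positions with the transposed,
document-major ordering described in the issue. It is a toy model, not
Lucene's actual file format: the names build_layouts, pages_touched and
PAGE_SIZE are hypothetical, positions are stored uncompressed, and a "page"
stands in for whatever cache entry (disk page, cache line) applies.

```python
from collections import defaultdict

PAGE_SIZE = 4  # hypothetical: number of positions that fit in one cache entry


def build_layouts(docs):
    """docs maps doc_id -> list of tokens.

    Returns two dicts mapping (term, doc_id) -> (start_offset, length),
    one for a term-major layout and one for a document-major layout.
    """
    positions = defaultdict(list)
    for doc_id, tokens in docs.items():
        for pos, term in enumerate(tokens):
            positions[(term, doc_id)].append(pos)

    def lay_out(keys):
        # Place the position lists one after another in the given order.
        offsets, cursor = {}, 0
        for key in keys:
            offsets[key] = (cursor, len(positions[key]))
            cursor += len(positions[key])
        return offsets

    # Term-major: all documents of one term are adjacent.
    term_major = lay_out(sorted(positions))
    # Document-major (transposed): all terms of one document are adjacent.
    doc_major = lay_out(sorted(positions, key=lambda k: (k[1], k[0])))
    return term_major, doc_major


def pages_touched(offsets, terms, doc_id):
    """Count distinct pages read to fetch the positions of `terms` in `doc_id`."""
    pages = set()
    for term in terms:
        if (term, doc_id) in offsets:
            start, length = offsets[(term, doc_id)]
            first = start // PAGE_SIZE
            last = (start + length - 1) // PAGE_SIZE
            pages.update(range(first, last + 1))
    return len(pages)


if __name__ == "__main__":
    docs = {
        0: ["index", "search", "index", "term"],
        1: ["index", "term", "search", "search"],
        2: ["term", "index", "search", "term"],
    }
    term_major, doc_major = build_layouts(docs)
    query = ["index", "search", "term"]
    print(pages_touched(term_major, query, 1))  # 3
    print(pages_touched(doc_major, query, 1))   # 1
```

In this toy data the three query terms of document 1 land on three different
pages in the term-major layout but on a single page in the document-major
layout, which is the effect the issue hopes for when a document has few
enough positions to fill one cache entry.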