[ https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494187#comment-13494187 ]
Matthew Willson commented on LUCENE-2482:
-----------------------------------------
Hi all -- a few quick questions, if anyone is still watching this.
* Could this be used to achieve an impact-ordered index, as in e.g. [1], where
documents in a given term's postings list are ordered by score contribution or
term frequency?
* Are there any caveats to be aware of when combining index sorting with
different index merge strategies, or with some of the more advanced stuff in
Solr for managing distributed indexes?
* Is anyone aware of other work on early stopping / dynamic pruning
optimisations in Lucene, e.g. MaxScore from [1] (I think Xapian [2] calls it
'operator decay') or the accumulator-pruning algorithms from [1] (perhaps in
combination with impact ordering)? In particular, is there anything in
Lucene 4's approach to scoring and indexing that would make these hard in
principle?
Any pointers gratefully received.
[1] Buettcher, Clarke & Cormack, "Information Retrieval: Implementing and
Evaluating Search Engines", ch. 5, pp. 143-153
[2] http://xapian.org/docs/matcherdesign.html
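To make the first question concrete, here is a minimal sketch of what "impact
ordered" means for a single term's postings list. It is plain Java and does not
use any Lucene API; the class ImpactOrderSketch, the Posting type and the
impactOrdered method are hypothetical names made up for illustration, and term
frequency stands in for score contribution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: not Lucene code, all names are hypothetical.
final class ImpactOrderSketch {

    /** One entry of a term's postings list. */
    static final class Posting {
        public final int docId;
        public final int termFreq;

        public Posting(int docId, int termFreq) {
            this.docId = docId;
            this.termFreq = termFreq;
        }
    }

    /**
     * Returns a copy of the postings ordered by descending term frequency,
     * with ties broken by ascending document id.
     */
    public static List<Posting> impactOrdered(List<Posting> docOrdered) {
        List<Posting> copy = new ArrayList<>(docOrdered);
        copy.sort(Comparator.comparingInt((Posting p) -> p.termFreq)
                .reversed()
                .thenComparingInt(p -> p.docId));
        return copy;
    }

    public static void main(String[] args) {
        // A conventional postings list, ordered by document id.
        List<Posting> postings = Arrays.asList(
                new Posting(1, 2),
                new Posting(4, 7),
                new Posting(9, 2),
                new Posting(12, 5));

        StringBuilder sb = new StringBuilder();
        for (Posting p : impactOrdered(postings)) {
            sb.append(p.docId).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints "4 12 1 9"
    }
}
```

Note that this order is per term: a different term would generally put the
same documents in a different order, whereas the tool in this issue assigns one
global document order from a single per-document weight.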
> Index sorter
> ------------
>
> Key: LUCENE-2482
> URL: https://issues.apache.org/jira/browse/LUCENE-2482
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/other
> Affects Versions: 3.1, 4.0-ALPHA
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Fix For: 3.6
>
> Attachments: indexSorter.patch, LUCENE-2482-4.0.patch
>
>
> A tool to sort an index according to a float document weight. Documents with
> high weight are given low document numbers, which means that they will be
> evaluated first. When using a strategy of "early termination" of queries (see
> TimeLimitedCollector) such sorting significantly improves the quality of
> partial results.
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as
> document weights - thus the ordering was limited by the limited resolution of
> norms. This is a pure Lucene version of the tool, and it uses arbitrary
> floats from a specified stored field).
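The idea in the description above can be sketched without any Lucene API, as a
rough illustration only. The class IndexSortSketch and its methods are
hypothetical names, and a fixed document budget stands in for the time limit
that an early-terminating collector would apply.

```java
import java.util.Arrays;

// Illustration only: not the patch's code, all names are hypothetical.
final class IndexSortSketch {

    /**
     * Returns a mapping from new document number to old document number,
     * so that documents with higher weight get lower new numbers.
     * Ties keep their original relative order (object sort is stable).
     */
    public static int[] newToOld(float[] weights) {
        Integer[] ids = new Integer[weights.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, (a, b) -> Float.compare(weights[b], weights[a]));
        int[] order = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            order[i] = ids[i];
        }
        return order;
    }

    /**
     * Simulates early termination: visits documents in new-number order
     * and stops after `budget` documents, returning the old ids visited.
     */
    public static int[] visitedBeforeCutoff(int[] newToOld, int budget) {
        int limit = Math.max(0, Math.min(budget, newToOld.length));
        return Arrays.copyOf(newToOld, limit);
    }

    public static void main(String[] args) {
        float[] weights = {0.2f, 0.9f, 0.5f, 0.1f};
        int[] order = newToOld(weights);
        System.out.println(Arrays.toString(order)); // prints "[1, 2, 0, 3]"
        // With a budget of two documents, only the two heaviest are seen.
        System.out.println(Arrays.toString(visitedBeforeCutoff(order, 2))); // prints "[1, 2]"
    }
}
```

With the original numbering, a budget of two would have visited documents 0 and
1 (weights 0.2 and 0.9); after sorting, the same budget visits the two
highest-weight documents, which is why partial results improve.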