[ https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274734#comment-15274734 ]

Michael McCandless commented on LUCENE-6766:
--------------------------------------------

I've been slowly iterating here and pushing changes to 
https://github.com/mikemccand/lucene-solr/tree/index_sort

There are tons of nocommits, but tests do pass, including index sorting tests 
(though they still need improving).

Some details:

  - I added a new {{DocIDMerger}} helper class, and the default merge impls use 
this to abstract away how the documents from the N sub-readers are iterated, 
whether they are simply concatenated or merge-sorted.  I think this should be 
quite a bit more efficient than what {{SortingMergePolicy}} does today, but it 
does add some code complexity, which I think is OK/contained.

  - {{SlowCompositeReader}} is no longer used for index sorting

  - Points now work fine w/ index sorting

  - CheckIndex verifies that the claimed per-segment index sort actually holds

  - IW gets angry if you open an existing index with a different index sort

  - Only simple sort types are allowed; no CUSTOM, SCORE or REWRITEABLE

  - I made a new {{Lucene62Codec}}, with a new {{Lucene62SegmentInfoFormat}} 
that supports index sorting.

  - I added {{LeafReader.getIndexSort}} so apps can check if a given segment 
was sorted

  - I disable bulk merge optimizations when index sorting is present
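To make the two iteration modes concrete, here is a minimal plain-Java sketch; it is not the {{DocIDMerger}} code on the branch. It assumes each sub-reader exposes one {{long}} sort key per document and is already sorted by that key, and {{DocIdMergeSketch}}, {{MergedDoc}}, {{concatenate}}, {{mergeSorted}} and {{isSorted}} are hypothetical names. Concatenation visits reader 0's docs, then reader 1's, and so on; merge-sorting keeps a priority queue over the current head doc of each reader, and a document's new doc ID in the merged segment is simply its position in the output. {{isSorted}} illustrates the kind of per-segment check CheckIndex could run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch (not Lucene's DocIDMerger): iterate docs from N sub-readers. */
public class DocIdMergeSketch {

    /** One merged document: which sub-reader it came from and its doc ID there. */
    record MergedDoc(int readerIndex, int docId) {}

    /** No index sort: all docs of reader 0, then reader 1, and so on. */
    static List<MergedDoc> concatenate(long[][] sortKeys) {
        List<MergedDoc> out = new ArrayList<>();
        for (int r = 0; r < sortKeys.length; r++) {
            for (int d = 0; d < sortKeys[r].length; d++) {
                out.add(new MergedDoc(r, d));
            }
        }
        return out;
    }

    /**
     * Index sort present: sortKeys[r][d] is the key of doc d in reader r, and each
     * reader is assumed already sorted, so a priority queue over the current head
     * of every reader yields the global sorted order.
     */
    static List<MergedDoc> mergeSorted(long[][] sortKeys) {
        // Queue entries are {readerIndex, docId}; ties are broken by reader index.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> {
            int c = Long.compare(sortKeys[a[0]][a[1]], sortKeys[b[0]][b[1]]);
            return c != 0 ? c : Integer.compare(a[0], b[0]);
        });
        for (int r = 0; r < sortKeys.length; r++) {
            if (sortKeys[r].length > 0) {
                pq.add(new int[] {r, 0});
            }
        }
        List<MergedDoc> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            out.add(new MergedDoc(top[0], top[1]));
            // Advance the reader we just consumed from, if it has more docs.
            if (top[1] + 1 < sortKeys[top[0]].length) {
                pq.add(new int[] {top[0], top[1] + 1});
            }
        }
        return out;
    }

    /** CheckIndex-style verification: keys must be non-decreasing in doc ID order. */
    static boolean isSorted(long[] keys) {
        for (int i = 1; i < keys.length; i++) {
            if (keys[i - 1] > keys[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[][] keys = { {1, 4, 9}, {2, 3, 10} };
        List<MergedDoc> merged = mergeSorted(keys);
        // The merged segment's new doc ID is the position in this list.
        for (int newDoc = 0; newDoc < merged.size(); newDoc++) {
            MergedDoc m = merged.get(newDoc);
            System.out.println("newDoc=" + newDoc + " <- reader " + m.readerIndex() + " doc " + m.docId());
        }
    }
}
```

With a priority queue, each merged document costs O(log N) for N sub-readers, and the concatenation path stays a plain sequential walk.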

IW flush still does not sort, so at merge time we wrap such segments with 
{{SortingLeafReader}}.  It is quite ugly that an index can have some 
segments sorted and some not sorted; e.g. it means IW's check for whether the 
new index sort matches the existing one is just best effort.  But this is 
already an enormous change, so I think we really have to look into "sort on 
flush" (which is hairy by itself) later, separately.
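Wrapping an unsorted flushed segment in a sorting view comes down to computing a permutation of its doc IDs. The minimal sketch below assumes a single {{long}} sort key per document and uses hypothetical names ({{SortDocMapSketch}}, {{newToOld}}, {{oldToNew}}); it is not {{SortingLeafReader}}'s API, which must additionally apply the permutation to postings, doc values, stored fields and so on.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: the doc ID remapping a sorting view of an unsorted segment needs. */
public class SortDocMapSketch {

    /** newToOld[newDoc] = oldDoc, ordering docs by ascending key. */
    static int[] newToOld(long[] keys) {
        Integer[] docs = new Integer[keys.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        // Arrays.sort on objects is stable, so docs with equal keys keep doc ID order.
        Arrays.sort(docs, Comparator.comparingLong(d -> keys[d]));
        int[] out = new int[docs.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = docs[i];
        }
        return out;
    }

    /** Inverse permutation: oldToNew[oldDoc] = newDoc. */
    static int[] oldToNew(int[] newToOld) {
        int[] out = new int[newToOld.length];
        for (int newDoc = 0; newDoc < newToOld.length; newDoc++) {
            out[newToOld[newDoc]] = newDoc;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] keys = {30, 10, 20, 10}; // flushed in insertion order, so not sorted
        int[] n2o = newToOld(keys);
        int[] o2n = oldToNew(n2o);
        System.out.println(Arrays.toString(n2o)); // [1, 3, 2, 0]
        System.out.println(Arrays.toString(o2n)); // [3, 0, 2, 1]
    }
}
```

Both directions are useful: {{newToOld}} drives iteration in sorted order, while {{oldToNew}} rewrites doc IDs that appear inside the segment's data.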


> Make index sorting a first-class citizen
> ----------------------------------------
>
>                 Key: LUCENE-6766
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6766
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).
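The early-termination idea in the last bullet can be sketched in plain Java as follows. This is an illustration only: {{SortClause}}, {{SegmentHits}}, {{isPrefix}} and {{topK}} are hypothetical names, and {{topK}} assumes one ascending {{long}} key per doc. When the query sort is a prefix of the segment's index sort, matches arrive in sort order, so collection can stop after the first k hits; the price is that the number of visited matches no longer equals totalHits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: early termination on an index-sorted segment. */
public class EarlyTerminationSketch {

    /** Simplified stand-in for one sort clause: a field name plus direction. */
    record SortClause(String field, boolean reverse) {}

    /** Top doc IDs of one segment plus how many matching docs were visited. */
    record SegmentHits(List<Integer> docs, int visited) {}

    /** True if querySort equals the leading clauses of indexSort. */
    static boolean isPrefix(List<SortClause> querySort, List<SortClause> indexSort) {
        return querySort.size() <= indexSort.size()
            && indexSort.subList(0, querySort.size()).equals(querySort);
    }

    /**
     * Collects the top k matches of one segment by ascending key.
     * matchingDocs must be in increasing doc ID order, as a query would produce them.
     */
    static SegmentHits topK(int[] matchingDocs, long[] keys, int k, boolean canEarlyTerminate) {
        List<Integer> candidates = new ArrayList<>();
        int visited = 0;
        for (int doc : matchingDocs) {
            visited++;
            candidates.add(doc);
            // Doc IDs follow the index sort, so the first k matches are the top k.
            if (canEarlyTerminate && candidates.size() == k) {
                break;
            }
        }
        // Required for an unsorted segment; a no-op when the segment is already sorted.
        candidates.sort(Comparator.comparingLong(d -> keys[d]));
        return new SegmentHits(candidates.subList(0, Math.min(k, candidates.size())), visited);
    }

    public static void main(String[] args) {
        List<SortClause> indexSort =
            List.of(new SortClause("timestamp", false), new SortClause("id", false));
        List<SortClause> querySort = List.of(new SortClause("timestamp", false));
        boolean canStop = isPrefix(querySort, indexSort);

        long[] timestamps = {1, 2, 3, 4, 5, 6}; // segment already sorted by timestamp
        int[] matches = {0, 2, 3, 5};
        SegmentHits hits = topK(matches, timestamps, 2, canStop);
        System.out.println(hits.docs() + " after visiting " + hits.visited() + " matches");
    }
}
```

Requiring a prefix (rather than any overlap) matters: sorting by "id" alone on a segment sorted by "timestamp, id" gives no ordering guarantee, so {{isPrefix}} returns false there and every match must be visited.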



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)