[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280234#comment-15280234
 ] 

Michael McCandless commented on LUCENE-6766:
--------------------------------------------

I tried sorting with the 10M-document Wikipedia index.

Sort by last-modified-date:

{noformat}
  Indexer: indexing done (900389 msec); total 10000000 docs
  Indexer: force merge done (took 134020 msec)
{noformat}
 
Sort by title:

{noformat}
  Indexer: indexing done (907923 msec); total 10000000 docs
  Indexer: force merge done (took 135041 msec)
{noformat}
 
vs. no sorting:

{noformat}
  Indexer: indexing done (702761 msec); total 10000000 docs
  Indexer: force merge done (took 65726 msec)
{noformat}
 
Index size was about the same in all cases, ~3.1 GB.

I also confirmed CheckIndex verifies the sorted indices are OK (it checks the 
sort order).

So indexing is ~28% slower with sorting (and the force merge takes about 
twice as long)... but this uses a single thread, SerialMergeScheduler, and a 
small IW buffer, so it's very merge-heavy.
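
For reference, the slowdown figures can be recomputed from the timings quoted 
above. The following is only a small arithmetic sketch; the class and method 
names are made up for illustration and are not part of the benchmark tooling.

```java
// Recomputes relative slowdowns from the timings reported in this comment.
// SortSlowdown and pctSlower are hypothetical names used only for this sketch.
class SortSlowdown {

    /** Percentage by which sortedMs exceeds baselineMs. */
    static double pctSlower(long sortedMs, long baselineMs) {
        return 100.0 * (sortedMs - baselineMs) / baselineMs;
    }

    public static void main(String[] args) {
        // Timings in msec, copied from the {noformat} blocks above.
        long baseIndex = 702761, baseMerge = 65726;    // no sorting
        long dateIndex = 900389, dateMerge = 134020;   // sort by last-modified-date
        long titleIndex = 907923, titleMerge = 135041; // sort by title

        System.out.printf("date:  indexing +%.1f%%, force merge +%.1f%%%n",
                pctSlower(dateIndex, baseIndex), pctSlower(dateMerge, baseMerge));
        System.out.printf("title: indexing +%.1f%%, force merge +%.1f%%%n",
                pctSlower(titleIndex, baseIndex), pctSlower(titleMerge, baseMerge));
        // Indexing and force merge added together.
        System.out.printf("date:  combined +%.1f%%%n",
                pctSlower(dateIndex + dateMerge, baseIndex + baseMerge));
        System.out.printf("title: combined +%.1f%%%n",
                pctSlower(titleIndex + titleMerge, baseIndex + baseMerge));
    }
}
```

On these numbers the indexing phase comes out roughly 28-29% slower, while the 
force merge roughly doubles in time.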


> Make index sorting a first-class citizen
> ----------------------------------------
>
>                 Key: LUCENE-6766
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6766
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6766.patch, LUCENE-6766.patch, LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).
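
The early-termination idea in the last bullet can be illustrated with a small 
standalone sketch. It does not use any Lucene API: a segment is modeled as an 
array of per-document match flags stored in the segment's sort order, sorts 
are modeled as plain lists of field names, and every name below is 
hypothetical. It only shows why collection can stop after the first N hits 
when the requested sort is a prefix of the segment's sort, and why totalHits 
is then unknown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of early termination on a sorted segment (not Lucene code).
class EarlyTerminationSketch {

    /** True if querySort is a leading prefix of segmentSort (field names only). */
    static boolean isSortPrefix(List<String> querySort, List<String> segmentSort) {
        return querySort.size() <= segmentSort.size()
                && segmentSort.subList(0, querySort.size()).equals(querySort);
    }

    /** Collected doc IDs plus the number of documents that were visited. */
    static final class Result {
        final List<Integer> docIds = new ArrayList<>();
        int visited;
    }

    /**
     * Collects the first topN matching docs. Because docs are assumed to be
     * stored in sort order, later docs cannot compete, so scanning stops early.
     * The total number of matches is therefore never counted.
     */
    static Result collect(boolean[] matches, int topN) {
        Result r = new Result();
        if (topN <= 0) {
            return r;
        }
        for (int doc = 0; doc < matches.length; doc++) {
            r.visited++;
            if (matches[doc]) {
                r.docIds.add(doc);
                if (r.docIds.size() == topN) {
                    break; // early termination
                }
            }
        }
        return r;
    }

    public static void main(String[] args) {
        List<String> segmentSort = Arrays.asList("date", "title");
        boolean[] matches = {false, true, true, false, true, true};
        if (isSortPrefix(Arrays.asList("date"), segmentSort)) {
            Result r = collect(matches, 2);
            System.out.println("docs=" + r.docIds + " visited=" + r.visited);
        }
    }
}
```

In this toy run only three of the six documents are visited before the two 
requested hits are found, which is the saving the proposal is after; a caller 
that needs totalHits would still have to scan the whole segment.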



