[ 
https://issues.apache.org/jira/browse/LUCENE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904651#action_12904651
 ] 

Robert Muir commented on LUCENE-2369:
-------------------------------------

bq. No tests with 100M documents yet, but 1½ hours for the build and 1.5GB of RAM 
would be the expected requirement.

Toke, have you tried doing this 'build' at index time instead? I would 
recommend applying LUCENE-2551 and indexing with ICU Collation, strength=primary.
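
To illustrate the idea of doing the collation work at index time, here is a 
minimal sketch. It is not the LUCENE-2551 patch and uses no Lucene or ICU 
classes: it assumes the JDK's java.text.Collator as a stand-in for ICU 
Collation, and the class and method names are hypothetical. Each term is 
turned into a binary collation key once, and later ordering only compares 
the stored bytes.

```java
import java.text.Collator;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: JDK Collator stands in for ICU Collation.
class IndexTimeCollationSketch {

    /** Computes the binary sort key that would be stored at index time. */
    static byte[] sortKey(Collator collator, String term) {
        return collator.getCollationKey(term).toByteArray();
    }

    /** Orders terms purely by their stored keys, compared as unsigned bytes. */
    static List<String> sortByStoredKeys(List<String> terms, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY); // ignore case differences

        // "Index time": compute one key per term, exactly once.
        List<Map.Entry<String, byte[]>> indexed = new ArrayList<>();
        for (String term : terms) {
            indexed.add(new AbstractMap.SimpleEntry<>(term, sortKey(collator, term)));
        }

        // "Search time": no Collator involved, only byte comparison.
        indexed.sort((a, b) -> Arrays.compareUnsigned(a.getValue(), b.getValue()));

        List<String> sorted = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : indexed) {
            sorted.add(entry.getKey());
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(
            sortByStoredKeys(Arrays.asList("cherry", "Banana", "apple"), Locale.ENGLISH));
    }
}
```

Because the keys are plain bytes, range comparisons can use the same byte 
order, which is the point of indexing the content this way.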

Now that we can mostly do everything as bytes, I think the slow functionality 
for doing collation and range queries at 'runtime' might soon be on its way out 
of Lucene (see patches on LUCENE-2514).

Instead, I think it's better to encourage users to index their content 
according to the use cases they need.


> Locale-based sort by field with low memory overhead
> ---------------------------------------------------
>
>                 Key: LUCENE-2369
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2369
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Toke Eskildsen
>            Priority: Minor
>
> The current implementation of locale-based sort in Lucene uses the FieldCache 
> which keeps all sort terms in memory. Besides the huge memory overhead, 
> searching requires comparison of terms with collator.compare every time, 
> making searches with millions of hits fairly expensive.
> This proposed alternative implementation is to create a packed list of 
> pre-sorted ordinals for the sort terms and a map from document-IDs to entries 
> in the sorted ordinals list. This results in very low memory overhead and 
> faster sorted searches, at the cost of increased startup-time. As the 
> ordinals can be resolved to terms after the sorting has been performed, this 
> approach supports fillFields=true.
> This issue is related to https://issues.apache.org/jira/browse/LUCENE-2335 
> which contains previous discussions on the subject.
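
The approach in the description above can be sketched roughly as follows. 
This is an illustration under stated assumptions, not the code attached to 
the issue: the class and method names are hypothetical, a plain int[] stands 
in for the packed structure, and the terms are held in a simple array rather 
than read from an index. The Collator is used only once, at startup, to 
order the distinct terms; sorting hits afterwards compares integers.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of pre-sorted ordinals plus a docID-to-ordinal map.
class OrdinalSortSketch {
    final String[] sortedTerms; // ordinal -> term, in collator order
    final int[] docToOrd;       // docID -> ordinal (plain int[], not packed)

    OrdinalSortSketch(String[] docTerms, Locale locale) {
        Collator collator = Collator.getInstance(locale);

        // Startup cost: sort the distinct terms once with the collator.
        sortedTerms = Arrays.stream(docTerms).distinct().toArray(String[]::new);
        Arrays.sort(sortedTerms, collator);

        Map<String, Integer> ordOf = new HashMap<>();
        for (int ord = 0; ord < sortedTerms.length; ord++) {
            ordOf.put(sortedTerms[ord], ord);
        }

        docToOrd = new int[docTerms.length];
        for (int doc = 0; doc < docTerms.length; doc++) {
            docToOrd[doc] = ordOf.get(docTerms[doc]);
        }
    }

    /** Sorts hit docIDs by ordinal; no collator.compare calls are made here. */
    int[] sortHits(int[] hits) {
        return Arrays.stream(hits)
                .boxed()
                .sorted((a, b) -> Integer.compare(docToOrd[a], docToOrd[b]))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    /** Resolves a document's ordinal back to its term after sorting. */
    String termFor(int docId) {
        return sortedTerms[docToOrd[docId]];
    }

    public static void main(String[] args) {
        OrdinalSortSketch sketch = new OrdinalSortSketch(
                new String[] {"cherry", "apple", "Banana", "apple"}, Locale.ENGLISH);
        for (int doc : sketch.sortHits(new int[] {0, 1, 2, 3})) {
            System.out.println(doc + " -> " + sketch.termFor(doc));
        }
    }
}
```

Resolving ordinals to terms only after the sort is what lets this kind of 
structure return the field values for the sorted hits.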

