[ https://issues.apache.org/jira/browse/LUCENE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905026#action_12905026 ]
Robert Muir commented on LUCENE-2369:
-------------------------------------

bq. Do they or do they not need to be loaded into heap in order to be used for sorted search?

They are just regular terms! You can do a TermQuery on them, sort them as byte[], etc. It's just that the bytes use 'collation encoding' instead of 'utf-8 encoding'.

This is why I want to factor the whole 'locale' thing out of the issue: sorting is agnostic to what's in the byte[], so it's unrelated, and it would simplify the issue to discuss just that.

bq. Easy now. The whole runtime-vs-index-time issue is something that I don't care much about at this point. Pre-sorting can be done both at index and search time. Let's just say that we do it at index-time and go from there.

Well, the thing is, it's something I care a lot about. The problems are:
* Users who develop localized applications tend to use methods with Locale/Collator parameters if they are available: it's best practice.
* In the case of Lucene, it is not best practice but a silly trap (as you get horrible performance).
* However, users are used to the concept of collation keys wrt indexing (e.g. when building a database index).
* The APIs here are wrong anyway: they shouldn't take a Locale but a Collator. There is no way to set strength or any other options, and there's no way to supply a Collator I made myself (e.g. from RuleBasedCollator).

> Locale-based sort by field with low memory overhead
> ---------------------------------------------------
>
>                 Key: LUCENE-2369
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2369
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Toke Eskildsen
>            Priority: Minor
>
> The current implementation of locale-based sort in Lucene uses the FieldCache
> which keeps all sort terms in memory. Beside the huge memory overhead,
> searching requires comparison of terms with collator.compare every time,
> making searches with millions of hits fairly expensive.
> This proposed alternative implementation is to create a packed list of
> pre-sorted ordinals for the sort terms and a map from document-IDs to entries
> in the sorted ordinals list. This results in very low memory overhead and
> faster sorted searches, at the cost of increased startup-time. As the
> ordinals can be resolved to terms after the sorting has been performed, this
> approach supports fillFields=true.
> This issue is related to https://issues.apache.org/jira/browse/LUCENE-2335
> which contain previous discussions on the subject.
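
As a rough illustration of the 'collation encoding' point in the comment above, here is a minimal sketch that uses only the JDK's java.text classes, not any Lucene API. The class and method names (CollationKeySketch, sortKey, compareBytes, agrees, customCollator) are made up for this example. It assumes terms are converted to collation key bytes once, at index time, so that sorting afterwards needs only a plain unsigned byte[] comparison and no Collator. It also shows why taking a Collator rather than a Locale matters: strength can be set, and a RuleBasedCollator with hand-written rules works the same way.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch class; none of these names come from Lucene.
class CollationKeySketch {

    // "Collation encoding": the term's collation key as raw bytes.
    static byte[] sortKey(Collator collator, String term) {
        return collator.getCollationKey(term).toByteArray();
    }

    // Unsigned, most-significant-byte-first comparison. No Collator involved.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True if comparing key bytes orders x and y the same way the Collator does.
    static boolean agrees(Collator collator, String x, String y) {
        int byBytes = compareBytes(sortKey(collator, x), sortKey(collator, y));
        return Integer.signum(byBytes) == Integer.signum(collator.compare(x, y));
    }

    // A Collator built from custom rules (here: c sorts before b, b before a),
    // which a Locale-only API could not express.
    static Collator customCollator() {
        try {
            return new RuleBasedCollator("< c < b < a");
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.TERTIARY); // an option a Locale alone cannot carry

        String[] terms = {"banana", "Apple", "apple", "cherry"};

        // "Index time": compute the key bytes once per term.
        byte[][] keys = new byte[terms.length][];
        Integer[] order = new Integer[terms.length];
        for (int i = 0; i < terms.length; i++) {
            keys[i] = sortKey(collator, terms[i]);
            order[i] = i;
        }

        // "Search time": sort by bytes only.
        Arrays.sort(order, (i, j) -> compareBytes(keys[i], keys[j]));
        for (int i : order) {
            System.out.println(terms[i]);
        }
    }
}
```

The trade-off in this sketch is that the collation rules are baked into the stored bytes, so changing the Collator (locale, strength, or rules) would mean recomputing the keys.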