[ https://issues.apache.org/jira/browse/LUCENE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905020#action_12905020 ]
Toke Eskildsen commented on LUCENE-2369:
----------------------------------------

{quote}
ICU keys are just byte[] just like regular terms. They are "regular terms"
{quote}

Do they or do they not need to be loaded into heap in order to be used for sorted search?

{quote}
Can we forget about the stupid runtime Locale sort, if you have a way to improve memory usage for byte[] terms, let's look just at that? Then this could be more general and more useful.
{quote}

Easy now. The whole runtime-vs-index-time issue is something I don't care much about at this point. Pre-sorting can be done at either index time or search time; let's just say that we do it at index time and go from there.

What I'm looking at is not holding the sort terms in memory (whether they are Strings, BytesRefs, regular terms or ICU keys) and doing all possible sorting up front (in the case of a hybrid ICU approach: a merge-sort of the already sorted segments).

Could you please re-read my comment with that in mind and see if my breakdown and trade-off lists make sense? It seems to me that you're quite certain there is something I've missed, but I haven't yet understood what it is. I do know that ICU keys are just regular terms in the technical sense. When I say "ICU keys", I do so to make it clear that we're getting locale-specific ordering.

Deep breaths, ok? I'm going to fetch the kids from school, so you don't need to rush your answer.

> Locale-based sort by field with low memory overhead
> ---------------------------------------------------
>
> Key: LUCENE-2369
> URL: https://issues.apache.org/jira/browse/LUCENE-2369
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Reporter: Toke Eskildsen
> Priority: Minor
>
> The current implementation of locale-based sort in Lucene uses the FieldCache,
> which keeps all sort terms in memory.
> Besides the huge memory overhead, searching requires comparing terms with
> collator.compare every time, making searches with millions of hits fairly
> expensive.
> The proposed alternative implementation creates a packed list of
> pre-sorted ordinals for the sort terms and a map from document IDs to entries
> in the sorted ordinals list. This results in very low memory overhead and
> faster sorted searches, at the cost of increased startup time. As the
> ordinals can be resolved to terms after the sorting has been performed, this
> approach supports fillFields=true.
> This issue is related to https://issues.apache.org/jira/browse/LUCENE-2335,
> which contains previous discussions on the subject.
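To make the description above concrete, here is a minimal, hypothetical Java sketch of the pre-sorted-ordinals idea. It is not Lucene API and not the LUCENE-2369 patch: the names `LocaleOrdinalSort`, `Packed`, `build`, `sort` and `term` are invented for illustration, the input `docTermOrd` (exactly one term ordinal per document) is an assumption, and `java.text.Collator` stands in for whatever collation the real code would use (ICU keys or otherwise). The collator is only consulted while building the structures; sorting hits afterwards compares small integers, and a document's term is recovered afterwards via position, then term ordinal, then term.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical sketch: locale sort via pre-sorted ordinals (not Lucene API). */
public class LocaleOrdinalSort {

    /** Minimal fixed-width packed int array backed by long[]; each slot is set once. */
    static final class Packed {
        private final long[] blocks;
        private final int bits;
        private final long mask;

        Packed(int size, int maxValue) {
            this.bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxValue));
            this.mask = (1L << bits) - 1;
            this.blocks = new long[(int) (((long) size * bits + 63) / 64)];
        }

        void set(int index, int value) {
            long bitPos = (long) index * bits;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = value & mask;
            blocks[block] |= v << shift;
            if (shift + bits > 64) {              // value spills into the next long
                blocks[block + 1] |= v >>> (64 - shift);
            }
        }

        int get(int index) {
            long bitPos = (long) index * bits;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = blocks[block] >>> shift;
            if (shift + bits > 64) {              // fetch the spilled high bits
                v |= blocks[block + 1] << (64 - shift);
            }
            return (int) (v & mask);
        }
    }

    private final String[] indexTerms;   // terms in index order; array position = term ordinal
    private final Packed sortedOrdinals; // term ordinals in collator order
    private final Packed docToPosition;  // docID -> position in sortedOrdinals

    private LocaleOrdinalSort(String[] indexTerms, Packed sortedOrdinals, Packed docToPosition) {
        this.indexTerms = indexTerms;
        this.sortedOrdinals = sortedOrdinals;
        this.docToPosition = docToPosition;
    }

    /** Startup cost: a single collator sort over the unique terms. */
    public static LocaleOrdinalSort build(String[] indexTerms, int[] docTermOrd, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        Integer[] ords = new Integer[indexTerms.length];
        for (int i = 0; i < ords.length; i++) ords[i] = i;
        Arrays.sort(ords, (a, b) -> collator.compare(indexTerms[a], indexTerms[b]));

        int maxValue = Math.max(0, indexTerms.length - 1);
        Packed sorted = new Packed(ords.length, maxValue);
        int[] ordToPos = new int[ords.length];   // temporary, dropped after build
        for (int pos = 0; pos < ords.length; pos++) {
            sorted.set(pos, ords[pos]);
            ordToPos[ords[pos]] = pos;
        }
        Packed docToPos = new Packed(docTermOrd.length, maxValue);
        for (int doc = 0; doc < docTermOrd.length; doc++) {
            docToPos.set(doc, ordToPos[docTermOrd[doc]]);
        }
        return new LocaleOrdinalSort(indexTerms, sorted, docToPos);
    }

    /** Search time: order hits by integer position only, no collator calls. */
    public int[] sort(int[] hits) {
        Integer[] boxed = new Integer[hits.length];
        for (int i = 0; i < hits.length; i++) boxed[i] = hits[i];
        Arrays.sort(boxed, (a, b) -> Integer.compare(docToPosition.get(a), docToPosition.get(b)));
        int[] out = new int[hits.length];
        for (int i = 0; i < out.length; i++) out[i] = boxed[i];
        return out;
    }

    /** fillFields-style lookup after sorting: position -> term ordinal -> term. */
    public String term(int doc) {
        return indexTerms[sortedOrdinals.get(docToPosition.get(doc))];
    }
}
```

For example, with `indexTerms` = {"Banana", "apple", "cherry", "zucchini", "éclair"} (binary order) and `Locale.ENGLISH`, sorting all five documents of `docTermOrd` = {3, 0, 4, 1, 2} yields the documents whose terms read apple, Banana, cherry, éclair, zucchini, an order that plain binary comparison would not produce. The hand-rolled `Packed` class only keeps the sketch self-contained; each entry takes just enough bits to hold the largest ordinal, which is where the low memory overhead comes from.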