[ https://issues.apache.org/jira/browse/LUCENE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853884#action_12853884 ]
Toke Eskildsen commented on LUCENE-2369:
----------------------------------------

The current implementation accepts a Comparator<Object> (which must accept Strings) as well as a Locale (which is converted to Collator.getInstance(locale) under the hood) as arguments. Plugging in the ICU collator directly should therefore be trivial. If/when it becomes possible to use byte[] for sorters in general, I'll add support for that.

Indexing ICU collator keys and using them in combination with LUCENE-2369 is an interesting idea, as it would speed up the building process considerably while keeping memory usage down. As long as fillFields=false, the two methods are independent and should work well together. It should be fairly easy to try.

For fillFields=true it gets a bit trickier: it requires a special FieldComparatorSource that keeps two maps from docID, one to the ICU collator key and one to the original term. Still, it should not be that hard to implement, and I'll be happy to do it if the fillFields=false case turns out to work well.

> Locale-based sort by field with low memory overhead
> ---------------------------------------------------
>
> Key: LUCENE-2369
> URL: https://issues.apache.org/jira/browse/LUCENE-2369
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Reporter: Toke Eskildsen
> Priority: Minor
>
> The current implementation of locale-based sort in Lucene uses the FieldCache,
> which keeps all sort terms in memory. Besides the huge memory overhead,
> searching requires comparing terms with collator.compare every time,
> making searches with millions of hits fairly expensive.
> The proposed alternative is to create a packed list of pre-sorted ordinals
> for the sort terms and a map from document IDs to entries in the sorted
> ordinals list. This results in very low memory overhead and faster sorted
> searches, at the cost of increased startup time.
> As the ordinals can be resolved to terms after the sorting has been performed,
> this approach supports fillFields=true.
> This issue is related to https://issues.apache.org/jira/browse/LUCENE-2335,
> which contains previous discussions on the subject.
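The ordinal approach described in the issue can be sketched in plain Java. This is a minimal, hypothetical illustration, not the LUCENE-2369 patch: the class and method names are invented, each document is assumed to have exactly one sort term, and a plain int[] stands in for the packed structure the issue describes. It also shows why the Locale option fits next to the Comparator<Object> option: java.text.Collator already implements Comparator<Object>.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: pre-sorts the unique sort terms once with a Comparator,
 * then sorts documents by comparing plain int ordinals.
 */
public class OrdinalSortSketch {
    private final String[] sortedTerms; // unique terms in comparator order
    private final int[] docToOrdinal;   // docID -> position in sortedTerms (a real version would bit-pack this)

    /** termsPerDoc[docID] holds the single sort term of each document. */
    public OrdinalSortSketch(String[] termsPerDoc, Comparator<Object> comparator) {
        // Startup cost: collect the unique terms and sort them once.
        sortedTerms = Arrays.stream(termsPerDoc).distinct().toArray(String[]::new);
        Arrays.sort(sortedTerms, comparator);

        Map<String, Integer> ordinalOf = new HashMap<>();
        for (int i = 0; i < sortedTerms.length; i++) {
            ordinalOf.put(sortedTerms[i], i);
        }
        docToOrdinal = new int[termsPerDoc.length];
        for (int doc = 0; doc < termsPerDoc.length; doc++) {
            docToOrdinal[doc] = ordinalOf.get(termsPerDoc[doc]);
        }
    }

    /** Mirrors the Locale option: the Locale becomes Collator.getInstance(locale). */
    public OrdinalSortSketch(String[] termsPerDoc, Locale locale) {
        this(termsPerDoc, Collator.getInstance(locale)); // Collator implements Comparator<Object>
    }

    /** Search-time comparison: an int compare instead of collator.compare. */
    public int compare(int docA, int docB) {
        return Integer.compare(docToOrdinal[docA], docToOrdinal[docB]);
    }

    /** Resolves a document's ordinal back to its term (the fillFields=true case). */
    public String termFor(int doc) {
        return sortedTerms[docToOrdinal[doc]];
    }

    /** Returns the given docIDs ordered by their sort term. */
    public int[] sortedDocs(int[] docs) {
        return Arrays.stream(docs).boxed()
                .sorted(this::compare)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        String[] terms = {"c", "\u00e4", "a", "b"}; // terms for docs 0..3
        OrdinalSortSketch sorter = new OrdinalSortSketch(terms, Locale.GERMAN);
        for (int doc : sorter.sortedDocs(new int[] {0, 1, 2, 3})) {
            System.out.println(doc + " -> " + sorter.termFor(doc));
        }
    }
}
```

All collator work happens once while the ordinals are built; at search time `compare` is only an integer comparison, and `termFor` turns an ordinal back into its term after sorting, which is what makes fillFields=true possible.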
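The collation-key idea from the comment can be illustrated the same way. In this hypothetical sketch, java.text.Collator stands in for the ICU collator, the keys are computed in the constructor instead of being read from an index, and two arrays play the role of the two docID maps mentioned for fillFields=true (the Lucene FieldComparatorSource wiring is omitted). It relies on Arrays.compareUnsigned (Java 9 or later) and on the JDK's collation keys ordering the same way as collator.compare when their byte arrays are compared as unsigned bytes.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

/**
 * Hypothetical sketch: sorting on pre-computed collation keys.
 * java.text.Collator stands in for the ICU collator discussed in the comment.
 */
public class CollationKeySortSketch {
    private final byte[][] docToKey;  // first map: docID -> collation key bytes (what would be indexed)
    private final String[] docToTerm; // second map: docID -> original term, only needed for fillFields=true

    /** termsPerDoc[docID] holds the single sort term of each document. */
    public CollationKeySortSketch(String[] termsPerDoc, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        docToKey = new byte[termsPerDoc.length][];
        docToTerm = termsPerDoc.clone();
        for (int doc = 0; doc < termsPerDoc.length; doc++) {
            // Computed here for the demo; in the proposed setup the key would come from the index.
            docToKey[doc] = collator.getCollationKey(termsPerDoc[doc]).toByteArray();
        }
    }

    /** Unsigned byte-wise comparison of the keys follows the collator's order. */
    public int compare(int docA, int docB) {
        return Arrays.compareUnsigned(docToKey[docA], docToKey[docB]);
    }

    /** Resolves a document back to its original term. */
    public String termFor(int doc) {
        return docToTerm[doc];
    }

    /** Returns the given docIDs ordered by their collation keys. */
    public int[] sortedDocs(int[] docs) {
        return Arrays.stream(docs).boxed()
                .sorted(this::compare)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        String[] terms = {"c", "\u00e4", "a", "b"}; // terms for docs 0..3
        CollationKeySortSketch sorter = new CollationKeySortSketch(terms, Locale.GERMAN);
        for (int doc : sorter.sortedDocs(new int[] {0, 1, 2, 3})) {
            System.out.println(doc + " -> " + sorter.termFor(doc));
        }
    }
}
```

With fillFields=false only `docToKey` is needed, which is why that case is independent of how the original terms are stored; the `docToTerm` array is the extra cost that fillFields=true adds.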