[jira] Commented: (LUCENE-2369) Locale-based sort by field with low memory overhead

Toke Eskildsen (JIRA) Wed, 01 Sep 2010 06:34:23 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905020#action_12905020 ]


Toke Eskildsen commented on LUCENE-2369:
----------------------------------------

{quote}
ICU keys are just byte[] just like regular terms. they are "regular terms"
{quote}

Do they or do they not need to be loaded into heap in order to be used for 
sorted search?

{quote}
Can we forget about the stupid runtime Locale sort, if you have a way to 
improve memory usage for byte[] terms, lets look just at that? Then this could 
be more general and more useful.
{quote}

Easy now. The whole runtime-vs-index-time issue is something that I don't care 
much about at this point. Pre-sorting can be done both at index and search 
time. Let's just say that we do it at index-time and go from there.

Not holding the sort terms in memory (whether they be Strings, BytesRefs, 
regular terms or ICU keys) and doing all possible sorting up front (in the case 
of a hybrid ICU approach: a merge-sort of the already sorted segments) is what 
I'm looking at. Could you please re-read my comment with that in mind and see 
if my breakdown and trade-off lists make sense? It seems to me that you're 
quite certain that there is something I've missed, but I haven't yet understood 
what it is. I do know that ICU keys are just regular terms in the technical 
sense. When I use the designation ICU keys, I do it to make it clear that we're 
getting locale-specific ordering.
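
To make the "merge-sort of the already sorted segments" concrete, here is a 
minimal sketch. It assumes each segment exposes its sort keys as an already 
sorted list of byte[] and that keys compare as unsigned bytes. The names 
SegmentMergeSketch, Cursor, compareKeys and merge are hypothetical and not 
part of Lucene or ICU.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class SegmentMergeSketch {
    /** Compares two keys byte by byte, treating each byte as unsigned. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shorter key that is a prefix sorts first
    }

    /** Hypothetical cursor over one segment's already sorted keys. */
    static final class Cursor {
        final List<byte[]> keys;
        int pos;

        Cursor(List<byte[]> keys) {
            this.keys = keys;
        }

        byte[] current() {
            return keys.get(pos);
        }
    }

    /** K-way merge: only one cursor per segment is kept in the queue. */
    static List<byte[]> merge(List<List<byte[]>> segments) {
        PriorityQueue<Cursor> queue = new PriorityQueue<Cursor>(
                (x, y) -> compareKeys(x.current(), y.current()));
        for (List<byte[]> segment : segments) {
            if (!segment.isEmpty()) {
                queue.add(new Cursor(segment));
            }
        }
        List<byte[]> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();
            merged.add(c.current());
            if (++c.pos < c.keys.size()) {
                queue.add(c); // re-insert with its next key
            }
        }
        return merged;
    }
}
```

The sketch collects the merged keys in a list only to keep it short. The 
point of the approach is that the merge itself needs one key per segment at a 
time, so the output could just as well be streamed into an ordinal structure. 
Keys that occur in several segments come out as adjacent duplicates here.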

Deep breaths, ok? I'm going to fetch the kids from school, so you don't need to 
rush your answer.

> Locale-based sort by field with low memory overhead
> ---------------------------------------------------
>
>                 Key: LUCENE-2369
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2369
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Toke Eskildsen
>            Priority: Minor
>
> The current implementation of locale-based sort in Lucene uses the FieldCache 
> which keeps all sort terms in memory. Besides the huge memory overhead, 
> searching requires comparison of terms with collator.compare every time, 
> making searches with millions of hits fairly expensive.
> The proposed alternative implementation creates a packed list of 
> pre-sorted ordinals for the sort terms and a map from document-IDs to entries 
> in the sorted ordinals list. This results in very low memory overhead and 
> faster sorted searches, at the cost of increased startup-time. As the 
> ordinals can be resolved to terms after the sorting has been performed, this 
> approach supports fillFields=true.
> This issue is related to https://issues.apache.org/jira/browse/LUCENE-2335 
> which contains previous discussions on the subject.
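
The ordinal idea in the description above can be sketched roughly as follows. 
This is an illustration under stated assumptions, not the patch attached to 
the issue: it holds the terms in a plain String[] and uses an unpacked int[] 
where the proposal uses a packed structure, and the class OrdinalSortSketch 
with its methods sortHits and termFor is hypothetical. Only java.text.Collator 
and the standard collections are real APIs here.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class OrdinalSortSketch {
    final String[] sortedTerms; // unique terms in collator order
    final int[] docToOrd;       // document ID -> index into sortedTerms

    OrdinalSortSketch(String[] termPerDoc, Collator collator) {
        // The collator is used only here, once, at startup.
        sortedTerms = Arrays.stream(termPerDoc).distinct().toArray(String[]::new);
        Arrays.sort(sortedTerms, collator);

        Map<String, Integer> ordOf = new HashMap<>();
        for (int ord = 0; ord < sortedTerms.length; ord++) {
            ordOf.put(sortedTerms[ord], ord);
        }
        docToOrd = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            docToOrd[doc] = ordOf.get(termPerDoc[doc]);
        }
    }

    /** Sorts hits by integer ordinal; no collator call at search time. */
    Integer[] sortHits(int[] hits) {
        Integer[] result = Arrays.stream(hits).boxed().toArray(Integer[]::new);
        Arrays.sort(result, (a, b) -> Integer.compare(docToOrd[a], docToOrd[b]));
        return result;
    }

    /** Resolves a document's ordinal back to its term after sorting. */
    String termFor(int doc) {
        return sortedTerms[docToOrd[doc]];
    }

    public static void main(String[] args) {
        String[] terms = {"banana", "apple", "Cherry", "apple"};
        OrdinalSortSketch sketch =
                new OrdinalSortSketch(terms, Collator.getInstance(Locale.ENGLISH));
        for (int doc : sketch.sortHits(new int[] {0, 1, 2, 3})) {
            System.out.println(doc + " " + sketch.termFor(doc));
        }
    }
}
```

The trade-off mirrors the description: building sortedTerms and docToOrd is 
the startup cost, after which each comparison during a search is a single int 
comparison, and termFor shows how terms can still be filled in afterwards.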
