Actually, the purpose of StringIndex is to reduce "sort by string" to "sort by int", for exactly the reason you said (compareTo is costly for String).

Ie StringIndex computes the ordinal for every doc in the index, so sorting by string value reduces to sorting int ordinals.
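
To make that concrete, here is a minimal sketch of the idea, not Lucene's actual FieldCache code. The class name OrdinalSortSketch and its fields are made up for illustration, and it assumes every doc has exactly one non-null value for the field. String compareTo is paid once while building the ordinals; sorting afterwards only compares ints.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical illustration only; this is not FieldCache.StringIndex.
class OrdinalSortSketch {
    final String[] lookup; // ord -> term text, in sorted order
    final int[] order;     // docID -> ord

    // Assumes valueByDoc[doc] is the single, non-null term for that doc.
    OrdinalSortSketch(String[] valueByDoc) {
        // Sorted, de-duplicated terms; a term's position is its ordinal.
        lookup = new TreeSet<>(Arrays.asList(valueByDoc)).toArray(new String[0]);
        order = new int[valueByDoc.length];
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            order[doc] = Arrays.binarySearch(lookup, valueByDoc[doc]);
        }
    }

    // Sorts docIDs by field value using int comparisons only.
    Integer[] sortedDocs() {
        Integer[] docs = new Integer[order.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, (a, b) -> Integer.compare(order[a], order[b]));
        return docs;
    }

    public static void main(String[] args) {
        OrdinalSortSketch s =
            new OrdinalSortSketch(new String[] {"pear", "apple", "pear", "fig"});
        System.out.println(Arrays.toString(s.sortedDocs()));
    }
}
```

Since the ordinals follow the sorted term order, comparing two ords gives the same result as comparing the two strings.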

I think it's the same thing that DisjointMultiFilter is doing? Both StringIndex and DisjointMultiFilter map Term Text (String) -> ord (int) as well as docID -> ord.

I do like the idea of using N-bit packing for the docID -> ord map.
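
A rough sketch of what N-bit packing could look like, purely as an assumption about the approach (PackedOrds and its methods are hypothetical names, not an existing Lucene class): each ord takes only as many bits as the largest ordinal needs, and the values are stored back to back in a long[].

```java
// Hypothetical sketch of an N-bit packed docID -> ord map.
class PackedOrds {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedOrds(int numDocs, int numTerms) {
        // Bits needed for the largest ordinal (numTerms - 1), at least 1.
        bitsPerValue = 32 - Integer.numberOfLeadingZeros(Math.max(1, numTerms - 1));
        mask = (1L << bitsPerValue) - 1;
        blocks = new long[(int) (((long) numDocs * bitsPerValue + 63) / 64)];
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    void set(int doc, int ord) {
        long bitPos = (long) doc * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = ord & mask;
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64; // bits that overflow into the next long
        if (spill > 0) {
            int used = bitsPerValue - spill;   // bits stored in the current long
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> used)) | (value >>> used);
        }
    }

    int get(int doc) {
        long bitPos = (long) doc * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // Pick up the high bits that were stored in the next long.
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return (int) (value & mask);
    }
}
```

With 5 unique terms this uses 3 bits per doc instead of the 32 bits of an int[] entry, at the cost of some shifting and masking on every lookup.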

Mike

Jason Rutherglen wrote:

The problem with StringIndex is that it uses strings, which are costly for compareTo relative to a numeric compare. It seems helpful to have generic primitive-based StringIndex classes.

On Fri, Nov 21, 2008 at 2:34 AM, Michael McCandless (JIRA) <[EMAIL PROTECTED] > wrote:

[ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649639#action_12649639 ]

Michael McCandless commented on LUCENE-1461:
--------------------------------------------

It seems like the core class here (DisjointMultiFilter) is doing the same thing as FieldCache's StringIndex? Ie, it builds a data structure that maps String <-> ord and docID -> ord. So maybe we can merge DisjointMultiFilter into the FieldCache API.

And then RangeMultiFilter is a great addition for quickly "spawning" numerous new RangeFilters, having pulled & stored the StringIndex from the FieldCache? So I think it should live in core org.apache.lucene.search.*? I'd prefer a different name (RangeMultiFilter implies it can filter over multiple ranges) but can't think of one. Or maybe we absorb it into RangeFilter, as a different "rewrite" method like "useFieldCache=true|false"?
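
As a sketch of why spawning range filters is cheap once the ord data has been pulled, assuming sorted unique terms in lookup and one ord per doc in order (OrdRangeFilterSketch is a hypothetical name, not the attached RangeMultiFilter): the string bounds are converted to ordinal bounds once, and each doc then needs only two int comparisons.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration; not the attached RangeMultiFilter code.
class OrdRangeFilterSketch {
    // lookup: sorted unique terms (ord -> term); order: docID -> ord.
    // Both bounds are inclusive.
    static BitSet range(String[] lookup, int[] order, String lower, String upper) {
        int lo = Arrays.binarySearch(lookup, lower);
        if (lo < 0) {
            lo = -lo - 1; // ord of the first term >= lower
        }
        int hi = Arrays.binarySearch(lookup, upper);
        if (hi < 0) {
            hi = -hi - 2; // ord of the last term <= upper
        }
        BitSet bits = new BitSet(order.length);
        for (int doc = 0; doc < order.length; doc++) {
            int ord = order[doc];
            if (ord >= lo && ord <= hi) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        String[] lookup = {"18", "25", "30", "42"};
        int[] order = {0, 3, 1, 2, 1};
        System.out.println(range(lookup, order, "20", "35"));
    }
}
```

Every additional range over the same field reuses the same lookup and order arrays, so only the two binary searches and the int scan are repeated.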

> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
> Attachments: DisjointMultiFilter.java, RangeMultiFilter.java, TermMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a single term. They do this by building an integer array of term numbers (storing the term->number mapping in a TreeMap) and then implementing a fast integer comparison based DocSetIdIterator.
> This code is currently being used to do age range filtering, but could also be used to do other date filtering or in any application where there need to be multiple filters based on the same single term field. I have an untested implementation of single term filtering and have considered but not yet implemented term set filtering (useful for location based searches) as well.
> The code here is fairly rough; it works but lacks javadocs and toString() and hashCode() methods etc. I'm posting it here to discover if there is other interest in this feature; I don't mind fixing it up but would hate to go to the effort if it's not going to make it into Lucene.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



