Actually, the purpose of StringIndex is to reduce "sort by string" to
"sort by int", for exactly the reason you said (compareTo is costly
for String).
I.e., StringIndex computes the ordinal for every doc in the index, so
sorting by string value reduces to sorting int ordinals.
I think it's the same thing that DisjointMultiFilter is doing? Both
StringIndex and DisjointMultiFilter map Term Text (String) -> ord
(int) as well as docID -> ord.
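
A minimal sketch of the two mappings described above, to make the "sort by int" point concrete. This is not the actual FieldCache.StringIndex code: the class name OrdIndexSketch and the method sortedDocs are made up for illustration, and the sketch assumes every doc has exactly one non-null value.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical illustration only; not Lucene's StringIndex implementation. */
final class OrdIndexSketch {
    final String[] lookup; // ord -> term text, in sorted order
    final int[] order;     // docID -> ord

    /** docValues[doc] is the single term for that doc (assumed non-null). */
    OrdIndexSketch(String[] docValues) {
        // Distinct terms in sorted order; a term's position is its ordinal.
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(docValues));
        lookup = terms.toArray(new String[0]);
        order = new int[docValues.length];
        for (int doc = 0; doc < docValues.length; doc++) {
            // Always found, because lookup was built from docValues itself.
            order[doc] = Arrays.binarySearch(lookup, docValues[doc]);
        }
    }

    /** Returns docIDs ordered by string value, comparing only int ordinals. */
    Integer[] sortedDocs() {
        Integer[] docs = new Integer[order.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        // No String.compareTo here: ord order equals term order by construction.
        Arrays.sort(docs, (a, b) -> Integer.compare(order[a], order[b]));
        return docs;
    }
}
```

The String comparisons happen once, while building lookup; every later sort touches only the int array.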
I do like the idea of using N-bit packing for the docID -> ord map.
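
For what N-bit packing of the docID -> ord map might look like, here is a rough sketch. It assumes the largest ord is known up front, so each ord can be stored in just enough bits inside a long[]; the class PackedOrdsSketch and its methods are hypothetical names, not an existing Lucene API.

```java
/** Hypothetical illustration of N-bit packing; not an existing Lucene class. */
final class PackedOrdsSketch {
    private final long[] blocks;
    private final int bits;  // bits used per ord
    private final long mask; // low 'bits' bits set

    /** maxOrd must be non-negative; it determines the bits needed per ord. */
    PackedOrdsSketch(int docCount, int maxOrd) {
        bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxOrd));
        mask = (1L << bits) - 1;
        long totalBits = (long) docCount * bits;
        blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    int bitsPerOrd() {
        return bits;
    }

    void set(int doc, int ord) {
        long bitPos = (long) doc * bits;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = ord & mask;
        // Clear the old bits in this block, then write the low part of the value.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bits - 64; // bits that overflow into the next block
        if (spill > 0) {
            long highMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> (bits - spill));
        }
    }

    int get(int doc) {
        long bitPos = (long) doc * bits;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bits - 64;
        if (spill > 0) {
            // Pull the remaining high bits from the next block.
            value |= blocks[block + 1] << (bits - spill);
        }
        return (int) (value & mask);
    }
}
```

With at most 1000 distinct terms this uses 10 bits per doc instead of the 32 of an int[], at the cost of some shifting and masking on each lookup.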
Mike
Jason Rutherglen wrote:
The problem with StringIndex is that it uses strings, which are
costly for compareTo juxtaposed with numeric compare (I used
"juxtaposed" because I originally had "compared with", which was
redundant). It seems helpful to have generic primitive-based
StringIndex classes.
On Fri, Nov 21, 2008 at 2:34 AM, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote:
[ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649639#action_12649639 ]
Michael McCandless commented on LUCENE-1461:
--------------------------------------------
It seems like the core class here (DisjointMultiFilter) is doing the
same thing as FieldCache's StringIndex? I.e., it builds a data
structure that maps String <-> ord and docID -> ord. So maybe we
can merge DisjointMultiFilter into the FieldCache API.
And then RangeMultiFilter is a great addition for quickly "spawning"
numerous new RangeFilters, having pulled & stored the StringIndex
from the FieldCache? So I think it should live in core
org.apache.lucene.search.*? I'd prefer a different name
(RangeMultiFilter implies it can filter over multiple ranges) but
can't think of one. Or maybe we absorb it into RangeFilter, as a
different "rewrite" method like "useFieldCache=true|false"?
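
A minimal sketch of how a range filter could be evaluated once the ord arrays are in hand. It assumes a sorted ord -> term array and a docID -> ord array like the ones StringIndex exposes; the class OrdRangeFilterSketch and the method rangeBits are hypothetical, and java.util.BitSet stands in for whatever doc set the real filter would return.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical illustration; not the attached RangeMultiFilter code. */
final class OrdRangeFilterSketch {
    /**
     * lookup: distinct terms in sorted order (ord -> term).
     * order:  docID -> ord.
     * Returns the docs whose term lies in [lower, upper], both inclusive.
     */
    static BitSet rangeBits(String[] lookup, int[] order, String lower, String upper) {
        // Two binary searches turn the String bounds into ord bounds.
        int lo = Arrays.binarySearch(lookup, lower);
        if (lo < 0) {
            lo = -lo - 1; // first ord whose term is >= lower
        }
        int hi = Arrays.binarySearch(lookup, upper);
        if (hi < 0) {
            hi = -hi - 2; // last ord whose term is <= upper
        }
        BitSet bits = new BitSet(order.length);
        for (int doc = 0; doc < order.length; doc++) {
            int ord = order[doc];
            // Per-doc work is two int comparisons, with no String compare.
            if (ord >= lo && ord <= hi) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

Each new range costs only the two binary searches plus one pass over the int array, which is what makes it cheap to spawn many filters over the same cached field.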
> Cached filter for a single term field
> -------------------------------------
>
> Key: LUCENE-1461
> URL: https://issues.apache.org/jira/browse/LUCENE-1461
> Project: Lucene - Java
> Issue Type: New Feature
> Reporter: Tim Sturge
> Attachments: DisjointMultiFilter.java, RangeMultiFilter.java, TermMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field
> containing a single term. They do this by building an integer array
> of term numbers (storing the term->number mapping in a TreeMap) and
> then implementing a fast integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but
> could also be used to do other date filtering or in any application
> where there need to be multiple filters based on the same single
> term field. I have an untested implementation of single term
> filtering and have considered but not yet implemented term set
> filtering (useful for location based searches) as well.
> The code here is fairly rough; it works but lacks javadocs and
> toString() and hashCode() methods etc. I'm posting it here to
> discover if there is other interest in this feature; I don't mind
> fixing it up but would hate to go to the effort if it's not going to
> make it into Lucene.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]