[jira] Commented: (LUCENE-1470) Add TrieRangeQuery to contrib

Uwe Schindler (JIRA) Wed, 26 Nov 2008 07:29:16 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651035#action_12651035
 ]


Uwe Schindler commented on LUCENE-1470:
---------------------------------------

No problem. I chose the two-chars-per-byte approach for two reasons:

a) It is simpler to generate lower-precision terms (for 1-8 bytes of 
precision): you just have to remove 2 chars per byte. In base 2^15, you only 
have 4 precisions and some leftover bits. The number of terms per precision 
in my special TrieRangeFilter would then not be 256 values but 32768, making 
it slower. Another possibility would be to cast each byte to one char 
(+offset), but see the next point:
b) Java sometimes has strange string comparisons, and I did not want to run 
into incompatibilities with String.compareTo().
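
To illustrate point a), here is a minimal sketch of a two-chars-per-byte 
encoding. It is not the code from the patch: the class name, the alphabet 
starting at 'a', and the sign-bit flip are assumptions made for this example. 
It only shows why dropping 2 chars per byte yields one lower-precision term 
per byte, and why plain ASCII chars keep String.compareTo() predictable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real TrieUtils encoding may differ in every detail.
final class TwoCharsPerByteSketch {
    // Assumed alphabet start: each 4-bit nibble becomes one char in 'a'..'p'.
    private static final char BASE = 'a';

    /** Encodes a signed long as 16 chars, i.e. two chars per byte. */
    static String encode(long value) {
        // Flip the sign bit so negative values sort before positive ones
        // when the resulting strings are compared lexicographically.
        long bits = value ^ 0x8000000000000000L;
        char[] out = new char[16];
        for (int i = 15; i >= 0; i--) {
            out[i] = (char) (BASE + (int) (bits & 0x0F));
            bits >>>= 4;
        }
        return new String(out);
    }

    /** Returns the terms for 8 down to 1 bytes of precision. */
    static List<String> precisionTerms(long value) {
        String full = encode(value);
        List<String> terms = new ArrayList<>();
        for (int bytes = 8; bytes >= 1; bytes--) {
            // Removing 2 chars per byte gives the next lower precision.
            terms.add(full.substring(0, bytes * 2));
        }
        return terms;
    }
}
```

Each step down in precision drops one byte, so a term at one precision has at 
most 256 children at the next higher one. With this sketch, 
encode(-1L).compareTo(encode(0L)) is negative, matching the numeric order.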

But one other point you noted in the other JIRA issue is hurting me: I have 
not tried to sort results by my combined, prefixed field for a long time, and 
you are right, it is not possible if all the different precisions are in the 
same field. An earlier version of TrieRangeQuery used a suffix after the 
field name, automatically added by TrieUtils.addXXXXTrieDocumentField. I 
removed this about two years ago, but have never tried to sort a numerical 
field since then. I think I have to write a bug report for my own project :) 
The question is: if this is included in contrib and my project depends on 
this contrib package, the index format of panFMP may change, which is not 
good.

Just one question to the other developers: why is it not possible to sort by 
a field with more than one term per document, if you restricted this to using 
only the *first* term added to the document as the sort key in FieldCache?

> Add TrieRangeQuery to contrib
> -----------------------------
>
>                 Key: LUCENE-1470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>         Attachments: LUCENE-1470.patch
>
>
> According to the thread in java-dev 
> (http://www.gossamer-threads.com/lists/lucene/java-dev/67807 and 
> http://www.gossamer-threads.com/lists/lucene/java-dev/67839), I want to 
> include my fast numerical range query implementation into lucene 
> contrib-queries.
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in the index in a special format.
> The idea behind this is to store the longs in different precisions in the
> index and partition the query range in such a way that the outer boundaries
> are searched using terms from the highest precision, but the center of the
> search range with lower precision. The implementation stores the longs in 8
> different precisions (using a class called TrieUtils). It also has support
> for Doubles, using the IEEE 754 floating-point "double format" bit layout
> with some bit mappings to make them binary sortable. The approach is used in
> rather big indexes; even on low-performance desktop computers, query times
> are <<100 ms (!) for very big ranges on indexes with 500000 docs.
> I called this RangeQuery variant and format "TrieRangeRange" query because
> the idea looks like the well-known Trie structures (it is not identical
> to real tries, but the algorithms are related).
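
The quoted description mentions bit mappings that make IEEE 754 doubles 
binary sortable. One common way to do this is sketched below. Whether the 
patch uses exactly this mapping is an assumption, and the class and method 
names are made up for illustration.

```java
// Hypothetical sketch of a sortable-double mapping; not taken from the patch.
final class SortableDoubleSketch {

    /** Maps a double to a long whose signed order matches the double order. */
    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // Negative doubles have the sign bit set and compare "backwards" as
        // raw bits, so flip their lower 63 bits to reverse that order.
        return bits < 0 ? bits ^ 0x7fffffffffffffffL : bits;
    }

    /** Inverse of toSortableLong (the mapping is its own inverse). */
    static double fromSortableLong(long sortable) {
        long bits = sortable < 0 ? sortable ^ 0x7fffffffffffffffL : sortable;
        return Double.longBitsToDouble(bits);
    }
}
```

The resulting long can then be indexed like any other long value, so the same 
precision scheme applies to doubles without a separate code path.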

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

