[jira] Commented: (LUCENE-1470) Add TrieRangeQuery to contrib

Uwe Schindler (JIRA) Wed, 26 Nov 2008 09:07:46 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651061#action_12651061
 ]


Uwe Schindler commented on LUCENE-1470:
---------------------------------------

You are right, my intention was to make it universal by casting everything to 
unsigned long. As for the sorting problem:
With standard Lucene, you can only sort when you have one term per field per 
document. I am currently testing my code after modifying it to use a more 
compact version of the encoding:

a) I do not encode half bytes; I encode full bytes per char. I still have 8 
precisions after it, but the code gets simpler and the terms have length=8 
chars after encoding. After reading the docs of String.compareTo() again, I 
understood the "Java way" of binary sort order and verified it with my test 
cases.
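
To illustrate point a), here is a minimal sketch of a full-byte-per-char 
encoding whose String.compareTo() order matches the numeric order of the longs. 
The class and method names are hypothetical, and flipping the sign bit to get 
an unsigned ordering is an assumption; the exact char mapping used in the patch 
is not shown in this comment and may differ.

```java
public class FullByteEncoding {

    // Hypothetical helper: flip the sign bit so signed longs order like
    // unsigned ones, then emit one char per byte, most significant byte first.
    public static String encode(long value) {
        long unsigned = value ^ 0x8000000000000000L;
        char[] chars = new char[8];
        for (int i = 0; i < 8; i++) {
            chars[i] = (char) ((unsigned >>> (56 - 8 * i)) & 0xFF);
        }
        return new String(chars);
    }

    // Inverse of encode(): rebuild the 64 bits and flip the sign bit back.
    public static long decode(String term) {
        long unsigned = 0L;
        for (int i = 0; i < 8; i++) {
            unsigned = (unsigned << 8) | (term.charAt(i) & 0xFF);
        }
        return unsigned ^ 0x8000000000000000L;
    }

    public static void main(String[] args) {
        long[] ascending = {Long.MIN_VALUE, -1L, 0L, 1L, 255L, 256L, Long.MAX_VALUE};
        for (int i = 1; i < ascending.length; i++) {
            String lower = encode(ascending[i - 1]);
            String higher = encode(ascending[i]);
            // String.compareTo() compares char by char as unsigned 16-bit values.
            System.out.println(ascending[i - 1] + " < " + ascending[i]
                    + " : " + (lower.compareTo(higher) < 0));
        }
    }
}
```

Because String.compareTo() compares chars as unsigned values, every term is 
exactly 8 chars long and sorts in the same order as the original numbers.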

b) The full precision is stored in the user-given document field name; all 7 
lower precisions are prefixed (as before) and put into a "helper" field with 
suffix "#trie" after the name. This field is only indexed, not stored or 
anything else, and it is used only in TrieRangeFilter for the trie 
algorithm.
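
The field layout in point b) could look roughly like the following sketch. It 
only builds the term lists per field and does not touch the Lucene API. The 
"#trie" suffix comes from the description above; the class name, the method 
names and the prefix scheme (one marker char giving the number of bytes kept) 
are assumptions made for illustration, since the actual prefix format is not 
spelled out in this comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TrieFieldLayout {

    // Suffix of the helper field, as described in the comment.
    static final String HELPER_SUFFIX = "#trie";

    // Assumed prefix scheme: a single marker char naming how many bytes are kept.
    static char prefixFor(int keptBytes) {
        return (char) ('0' + keptBytes);
    }

    // Maps field name -> terms to index, given the 8-char full-precision term.
    public static Map<String, List<String>> termsFor(String field, String fullTerm) {
        if (fullTerm.length() != 8) {
            throw new IllegalArgumentException("expected an 8-char encoded term");
        }
        Map<String, List<String>> terms = new LinkedHashMap<>();

        // Full precision goes into the user-given field, one term per document.
        List<String> full = new ArrayList<>();
        full.add(fullTerm);
        terms.put(field, full);

        // The 7 lower precisions go into the helper field, each with a prefix.
        List<String> lower = new ArrayList<>();
        for (int kept = 7; kept >= 1; kept--) {
            lower.add(prefixFor(kept) + fullTerm.substring(0, kept));
        }
        terms.put(field + HELPER_SUFFIX, lower);
        return terms;
    }

    public static void main(String[] args) {
        Map<String, List<String>> terms = termsFor("date", "ABCDEFGH");
        for (Map.Entry<String, List<String>> e : terms.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Keeping exactly one term in the user-given field is what keeps that field 
sortable with standard Lucene, while the helper field carries the extra terms.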

For my project *panFMP* this is not such a big problem: I can inform the "few" 
users using it (I know all of them) and ask them to do an index rebuild (a 
tool for that is available). But the encoding format of this contrib package 
should be fixed and discussed before it is released!

> Add TrieRangeQuery to contrib
> -----------------------------
>
>                 Key: LUCENE-1470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>         Attachments: LUCENE-1470.patch
>
>
> According to the thread in java-dev 
> (http://www.gossamer-threads.com/lists/lucene/java-dev/67807 and 
> http://www.gossamer-threads.com/lists/lucene/java-dev/67839), I want to 
> include my fast numerical range query implementation into lucene 
> contrib-queries.
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in index in a special format.
> The idea behind this is to store the longs in different precision in index
> and partition the query range in such a way that the outer boundaries are
> searched using terms from the highest precision, but the center of the search
> range with lower precision. The implementation stores the longs in 8
> different precisions (using a class called TrieUtils). It also has support
> for Doubles, using the IEEE 754 floating-point "double format" bit layout
> with some bit mappings to make them binary sortable. The approach is used in
> rather big indexes, query times are even on low performance desktop
> computers <<100 ms (!) for very big ranges on indexes with 500000 docs.
> I called this RangeQuery variant and format "TrieRangeRange" query because
> the idea looks like the well-known Trie structures (but it is not identical
> to real tries, but algorithms are related to it).
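
As an illustration of the "bit mappings" for doubles mentioned in the issue 
description, the following sketch shows one common way to map IEEE 754 double 
bits to a long whose signed order matches the numeric order. This is an 
assumption about the general technique, not necessarily the mapping TrieUtils 
uses; the class and method names are hypothetical.

```java
public class SortableDouble {

    // Map a double to a long that sorts (as a signed long) like the double.
    public static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // For negative doubles a larger magnitude means a smaller value, so
        // flip the 63 non-sign bits to reverse their order.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    // Inverse mapping: the sign bit is unchanged, so the same flip undoes it.
    public static double fromSortableLong(long sortable) {
        if (sortable < 0) {
            sortable ^= 0x7fffffffffffffffL;
        }
        return Double.longBitsToDouble(sortable);
    }

    public static void main(String[] args) {
        double[] ascending = {-1000.5, -2.0, -1.0, 0.0, 1.5, 1.0e9};
        for (int i = 1; i < ascending.length; i++) {
            boolean ordered =
                    toSortableLong(ascending[i - 1]) < toSortableLong(ascending[i]);
            System.out.println(ascending[i - 1] + " < " + ascending[i] + " : " + ordered);
        }
    }
}
```

The resulting long can then be fed into the same long encoding as any other 
numeric value.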

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

