[jira] Commented: (LUCENE-1470) Add TrieRangeQuery to contrib

Uwe Schindler (JIRA) Thu, 27 Nov 2008 00:40:48 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651283#action_12651283 ]


Uwe Schindler commented on LUCENE-1470:
---------------------------------------

I think this would easily be possible. It would need some refactoring, plus 
special implementations of the encoder and of 
incrementTrieCoded/decrementTrieCoded.
One way to make this configurable would be to use an instance of TrieUtils 
that takes the trie factor as a constructor argument. The user would then 
encode/decode all values using that instance. TrieRangeFilter would also get 
an instance of the encoder and use it to calculate the prefix terms and so on.
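
To illustrate the instance-based approach, here is a minimal sketch. It is not 
code from the patch: the class name TrieEncoder, the constructor argument 
bitsPerChar and the '0'-based character offset are assumptions made for 
illustration only, and the sketch handles just trie factors of 1, 2, 4 or 8 
bits.

```java
/** Hypothetical encoder instance, configured once with the trie factor. */
final class TrieEncoder {
    private static final char BASE = '0'; // assumed offset for encoded chars

    private final int bitsPerChar; // the trie factor
    private final int length;      // chars in a full-precision term
    private final long mask;

    TrieEncoder(int bitsPerChar) {
        if (bitsPerChar < 1 || bitsPerChar > 8 || 64 % bitsPerChar != 0) {
            throw new IllegalArgumentException("bitsPerChar must be 1, 2, 4 or 8");
        }
        this.bitsPerChar = bitsPerChar;
        this.length = 64 / bitsPerChar;
        this.mask = (1L << bitsPerChar) - 1;
    }

    /** Encodes a long so that String order equals numeric order. */
    String encode(long value) {
        // Flip the sign bit so negative values sort before positive ones.
        long sortable = value ^ Long.MIN_VALUE;
        char[] out = new char[length];
        // Fill from the least significant group backwards.
        for (int i = length - 1; i >= 0; i--) {
            out[i] = (char) (BASE + (sortable & mask));
            sortable >>>= bitsPerChar;
        }
        return new String(out);
    }

    /** Reverses encode(); expects a full-precision term. */
    long decode(String term) {
        if (term.length() != length) {
            throw new IllegalArgumentException("expected " + length + " chars");
        }
        long sortable = 0;
        for (int i = 0; i < length; i++) {
            sortable = (sortable << bitsPerChar) | (term.charAt(i) - BASE);
        }
        return sortable ^ Long.MIN_VALUE;
    }

    int termLength() {
        return length;
    }
}
```

With such a design, new TrieEncoder(8) gives 8-char terms and 
new TrieEncoder(4) gives 16-char terms. The filter would have to be given an 
encoder with the same factor that was used at index time.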

bq. The number of characters in the lower precision terms is not really 
relevant in the term index, because terms are indexed with common prefixes. 
Therefore in these cases one could just use a character to encode the 4 bits or 
2 bits.
bq. So the question is would it be possible to specify the trie factor when 
building and using the index?

Yes, that's good for jumping to the correct start term in the range query. The 
problem with smaller trie factors is that each precision step (e.g. 4 bits, 
2 bits) needs one full char in the encoded variant. Term length itself is not 
a problem, but I think the common prefixes cannot be used as effectively (a 
lot of terms with two low-cardinality chars at the beginning).
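
To put rough numbers on this, here is a small back-of-the-envelope sketch. It 
assumes (this is my simplification, not necessarily the patch's layout) that a 
64-bit value uses one char per precision step and that every lower precision 
is stored as a prefix that is one char shorter. The class and method names are 
made up for illustration.

```java
/** Hypothetical helper: term sizes for a given trie factor on 64-bit values. */
final class TrieFactorCost {

    /** Chars in a full-precision term, which is also the number of precisions. */
    static int levels(int bitsPerChar) {
        return (64 + bitsPerChar - 1) / bitsPerChar; // ceil(64 / bitsPerChar)
    }

    /** Distinct chars that can occur at one position of a term. */
    static int distinctCharsPerPosition(int bitsPerChar) {
        return 1 << bitsPerChar;
    }

    /** Sum of term lengths if each precision is a prefix of length 1..levels. */
    static int charsAcrossAllLevels(int bitsPerChar) {
        int n = levels(bitsPerChar);
        return n * (n + 1) / 2;
    }
}
```

Under these assumptions, 8 bits per char means 8 precisions with up to 256 
distinct chars per position, 4 bits means 16 precisions with 16 distinct 
chars, and 2 bits means 32 precisions with only 4 distinct chars per position.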

> Add TrieRangeQuery to contrib
> -----------------------------
>
>                 Key: LUCENE-1470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1470.patch, LUCENE-1470.patch
>
>
> According to the thread in java-dev 
> (http://www.gossamer-threads.com/lists/lucene/java-dev/67807 and 
> http://www.gossamer-threads.com/lists/lucene/java-dev/67839), I want to 
> include my fast numerical range query implementation into lucene 
> contrib-queries.
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in the index in a special format.
> The idea behind this is to store the longs in different precisions in the
> index and to partition the query range in such a way that the outer
> boundaries are searched using terms from the highest precision, but the
> center of the search range with lower precision. The implementation stores
> the longs in 8 different precisions (using a class called TrieUtils). It
> also has support for doubles, using the IEEE 754 floating-point "double
> format" bit layout with some bit mappings to make them binary sortable. The
> approach is used in rather big indexes; query times are <<100 ms (!), even
> on low performance desktop computers, for very big ranges on indexes with
> 500000 docs.
> I called this RangeQuery variant and format "TrieRangeRange" query because
> the idea looks like the well-known Trie structures (it is not identical
> to real tries, but the algorithms are related).
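
The two ideas in the description above (sortable doubles and splitting a range 
across precisions) can be sketched roughly as follows. This is an illustrative 
sketch under my own assumptions, not the code from the attached patch: the 
names TrieRangeSketch, SubRange, doubleToSortableLong and split are invented, 
and the split works on plain long values rather than on encoded terms.

```java
import java.util.ArrayList;
import java.util.List;

final class TrieRangeSketch {

    /** Covers all values v with lower <= v <= upper, matched at precision 'shift'. */
    static final class SubRange {
        final long lower;
        final long upper;
        final int shift; // number of low bits dropped at this precision

        SubRange(long lower, long upper, int shift) {
            this.lower = lower;
            this.upper = upper | ((1L << shift) - 1); // fill the dropped bits
            this.shift = shift;
        }
    }

    /** Maps a double to a long whose signed order equals the double order. */
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // For negative doubles a larger magnitude must sort lower,
        // so invert every bit except the sign bit.
        return bits < 0 ? bits ^ 0x7fffffffffffffffL : bits;
    }

    /** Splits [min, max]: the edges use high precision, the center lower precision. */
    static List<SubRange> split(long min, long max, int bitsPerLevel) {
        if (bitsPerLevel < 1 || bitsPerLevel > 32) {
            throw new IllegalArgumentException("bitsPerLevel must be in 1..32");
        }
        List<SubRange> out = new ArrayList<>();
        if (min > max) {
            return out;
        }
        for (int shift = 0; ; shift += bitsPerLevel) {
            long diff = 1L << (shift + bitsPerLevel);
            long mask = ((1L << bitsPerLevel) - 1L) << shift;
            // Does an edge stick out of a block of the next lower precision?
            boolean hasLower = (min & mask) != 0L;
            boolean hasUpper = (max & mask) != mask;
            long nextMin = (hasLower ? (min + diff) : min) & ~mask;
            long nextMax = (hasUpper ? (max - diff) : max) & ~mask;
            boolean wrapped = nextMin < min || nextMax > max; // overflow guard
            if (shift + bitsPerLevel >= 64 || nextMin > nextMax || wrapped) {
                // Nothing left for a lower precision: emit the rest and stop.
                out.add(new SubRange(min, max, shift));
                break;
            }
            if (hasLower) {
                out.add(new SubRange(min, min | mask, shift));
            }
            if (hasUpper) {
                out.add(new SubRange(max & ~mask, max, shift));
            }
            min = nextMin;
            max = nextMax;
        }
        return out;
    }
}
```

For example, split(3, 300, 4) gives three sub-ranges: 3..15 and 288..300 at 
full precision (shift 0) and 16..287 at the next lower precision (shift 4), so 
the large center of the range is covered by far fewer terms than a plain term 
enumeration would need.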

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


