[jira] Updated: (LUCENE-1470) Add TrieRangeQuery to contrib

Uwe Schindler (JIRA) Wed, 26 Nov 2008 14:20:47 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Uwe Schindler updated LUCENE-1470:
----------------------------------

    Attachment: LUCENE-1470.patch

An updated version of the patch. The format of encoded terms is incompatible 
with the previous patch.

Changed:
- Each byte of the 8 byte longs is encoded as 1 char (see comments before) -> 
more compact
- each byte is shifted by 0x30 when converted to char (e.g. value 0x05 is saved 
as char 0x35)
- the lower precision terms are stored in a separate helper field (named 
original fieldname+"#trie"), so it is still possible to sort on the original 
field. The helper field is only indexed. The prefix for the lower precisions 
starts at char 0x20. Because the prefix and data ranges are kept separate, 
merging the helper field and the original field later is easily possible. In 
that case, lower precisions would be listed before higher precisions or the 
full precision in the ordered term list.
- fix a small issue with checking a term two times (between two ranges)
- added test for TrieRangeQuery with a full-bounded and half-open range.
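
To illustrate the encoding described in the list above, here is a minimal
sketch, not the code from the patch. The 0x30 data offset and the 0x20 prefix
start come from this message; the class and method names, the sign-bit flip
for negative values, and the exact choice of prefix char per precision are
assumptions made only for this example.

```java
final class TrieEncodingSketch {
    static final char DATA_OFFSET = 0x30;   // from the message: each byte is shifted by 0x30
    static final char PREFIX_START = 0x20;  // from the message: lower-precision prefixes start at 0x20

    /** Encodes a long as 8 chars, most significant byte first, one char per byte. */
    static String encodeFull(long value) {
        // Assumption: flip the sign bit so that negative values sort before positive ones.
        long sortable = value ^ 0x8000000000000000L;
        char[] out = new char[8];
        for (int i = 0; i < 8; i++) {
            int b = (int) (sortable >>> (56 - 8 * i)) & 0xFF;
            out[i] = (char) (b + DATA_OFFSET);
        }
        return new String(out);
    }

    /**
     * Hypothetical lower-precision term: one prefix char followed by the
     * leading keptBytes chars of the full term (keptBytes in 1..7).
     */
    static String encodeLower(long value, int keptBytes) {
        // Assumption: the prefix char identifies the precision; 7 kept bytes maps to 0x20.
        char prefix = (char) (PREFIX_START + (7 - keptBytes));
        return prefix + encodeFull(value).substring(0, keptBytes);
    }
}
```

With this layout every prefix char (0x20 to 0x26) is smaller than every data
char (0x30 and above), so a lower-precision term always sorts before any
full-precision term, and two values that differ only in their dropped low
bytes share the same lower-precision term.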

I will respond to Mike's suggestions tomorrow; it is too late now!

> Add TrieRangeQuery to contrib
> -----------------------------
>
>                 Key: LUCENE-1470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1470.patch, LUCENE-1470.patch
>
>
> According to the thread in java-dev 
> (http://www.gossamer-threads.com/lists/lucene/java-dev/67807 and 
> http://www.gossamer-threads.com/lists/lucene/java-dev/67839), I want to 
> include my fast numerical range query implementation into lucene 
> contrib-queries.
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in index in a special format.
> The idea behind this is to store the longs in different precisions in the
> index and partition the query range in such a way that the outer boundaries
> are searched using terms from the highest precision, but the center of the
> search range with lower precision. The implementation stores the longs in 8
> different precisions (using a class called TrieUtils). It also has support
> for Doubles, using the IEEE 754 floating-point "double format" bit layout
> with some bit mappings to make them binary sortable. The approach is used in
> rather big indexes; query times are well below 100 ms (!), even on low
> performance desktop computers, for very big ranges on indexes with 500000
> docs. I called this RangeQuery variant and format "TrieRangeQuery" because
> the idea looks like the well-known Trie structures (it is not identical
> to real tries, but the algorithms are related).
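
The range partitioning and the sortable double mapping described in the quoted
text can be sketched as follows. This is an illustration under stated
assumptions, not the code of TrieUtils or TrieRangeQuery: the class and method
names are hypothetical, each precision level drops one more byte, and
split() assumes min <= max.

```java
import java.util.ArrayList;
import java.util.List;

final class TrieRangeSketch {
    static final int STEP = 8; // assumption: one byte per precision level

    /** Maps a double to a long whose signed order matches the double's numeric order. */
    static long toSortableLong(double d) {
        long bits = Double.doubleToLongBits(d);
        // For negative doubles, flip every bit except the sign bit.
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    /**
     * Splits [min, max] into sub-ranges {lo, hi, shift}. A sub-range matches
     * every value v with (lo >>> shift) <= (v >>> shift) <= (hi >>> shift),
     * so shift 0 is full precision and larger shifts are lower precisions.
     */
    static List<long[]> split(long min, long max) {
        List<long[]> out = new ArrayList<>();
        for (int shift = 0; ; shift += STEP) {
            long diff = 1L << (shift + STEP);
            long mask = ((1L << STEP) - 1L) << shift;
            boolean hasLower = (min & mask) != 0L;   // ragged lower edge at this level
            boolean hasUpper = (max & mask) != mask; // ragged upper edge at this level
            long nextMin = (hasLower ? (min + diff) : min) & ~mask;
            long nextMax = (hasUpper ? (max - diff) : max) & ~mask;
            boolean wrapped = nextMin < min || nextMax > max;
            if (shift + STEP >= 64 || nextMin > nextMax || wrapped) {
                // Nothing left for a lower precision: emit the rest at this level.
                out.add(new long[] {min, max, shift});
                break;
            }
            if (hasLower) out.add(new long[] {min, min | mask, shift});
            if (hasUpper) out.add(new long[] {max & ~mask, max, shift});
            min = nextMin;
            max = nextMax;
        }
        return out;
    }
}
```

For example, split(0x0105, 0x04F0) yields two full-precision edge ranges
(0x0105..0x01FF and 0x0400..0x04F0) plus one lower-precision range at shift 8
that covers 0x0200..0x03FF with only two terms.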

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


