[jira] Commented: (LUCENE-1470) Add TrieRangeQuery to contrib

Yonik Seeley (JIRA) Sat, 07 Feb 2009 10:45:28 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671504#action_12671504
 ]


Yonik Seeley commented on LUCENE-1470:
--------------------------------------

bq. for the beginners API there is missing the possibility to store the 
full-precision value

They could simply store it in a different field, in whatever format they 
desire, right?  It seems like TrieRange should be about range matching, not the 
format of stored fields.

bq. NumberUtils tries to get the most out of each char vs. this tries to not 
affect UTF-8 encoding and use ASCII only?

NumberUtils in Solr was developed a *long* time ago, before Parser support in 
the FieldCache, etc. (Lucene 1.4).  I chose 14-bit numbers to minimize size in 
the FieldCache when using a StringIndex, and because I didn't understand Lucene 
prefix compression at the time :-)

If there are to be many in-memory representations, then using 14-bit chars 
might be better.  Otherwise 7-bit chars seem preferable (better prefix 
compression, more predictable branches in the UTF-8 encoder/decoder).  Of 
course it's a trivial switch, so perhaps we should just try both and benchmark 
them when everything else is done.
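
To make the 7-bit option concrete, here is a minimal sketch of packing a long 
into ASCII-only chars so that string order matches numeric order.  The class 
and method names (SevenBitLongCodec, encode, decode) are made up for 
illustration; this is not the NumberUtils or TrieUtils code, and flipping the 
sign bit is just one common way to get signed values to sort correctly.

```java
// Hypothetical helper, not part of Lucene or Solr.
final class SevenBitLongCodec {
    // 64 bits at 7 bits per char needs ceil(64 / 7) = 10 chars.
    private static final int CHARS = 10;

    // Encodes a long as 10 chars in the range 0x00..0x7F, most significant
    // group first, so String.compareTo agrees with signed long order.
    static String encode(long value) {
        // Flip the sign bit: negative values then sort before positive ones
        // when the bits are compared as an unsigned quantity.
        long sortable = value ^ Long.MIN_VALUE;
        char[] out = new char[CHARS];
        for (int i = CHARS - 1; i >= 0; i--) {
            out[i] = (char) (sortable & 0x7F);
            sortable >>>= 7;
        }
        return new String(out);
    }

    // Inverse of encode.
    static long decode(String s) {
        long sortable = 0;
        for (int i = 0; i < CHARS; i++) {
            sortable = (sortable << 7) | (s.charAt(i) & 0x7F);
        }
        return sortable ^ Long.MIN_VALUE;
    }
}
```

Every char here is below 0x80, so each one is a single byte in UTF-8.  A 14-bit 
variant would need only 5 chars per long, but any char above 0x7F takes more 
than one byte in UTF-8, which is the trade-off to benchmark.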

As for TrieRangeFilter, I guess the most generic constructor would look like:
{code}
TrieRangeFilter(int precisionStep, String[] fields,
                long lowerSortableBits, long upperSortableBits,
                boolean includeLower, boolean includeUpper)
{code}
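
With a constructor like that, callers would have to supply the bounds as 
sortable long bits.  For doubles, one widely used mapping looks roughly like 
the sketch below.  The class and method names are hypothetical, and I am not 
claiming this is exactly what TrieUtils does; it only illustrates the idea of 
remapping the IEEE 754 bit layout so that signed long order matches double 
order.

```java
// Hypothetical helper, not part of Lucene.
final class SortableDoubleBits {
    // Maps a double to a long whose signed ordering matches the numeric
    // ordering of the double.
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // Negative doubles have the sign bit set and grow in magnitude as the
        // lower 63 bits grow, so invert those 63 bits to reverse their order.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    // Inverse mapping: the sign bit is untouched by the transform, so the
    // same conditional XOR undoes it.
    static double sortableLongToDouble(long sortable) {
        if (sortable < 0) {
            sortable ^= 0x7fffffffffffffffL;
        }
        return Double.longBitsToDouble(sortable);
    }
}
```

A caller would then pass doubleToSortableLong(lower) and 
doubleToSortableLong(upper) as the two long bounds.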




> Add TrieRangeQuery to contrib
> -----------------------------
>
>                 Key: LUCENE-1470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: fixbuild-LUCENE-1470.patch, fixbuild-LUCENE-1470.patch, 
> LUCENE-1470-readme.patch, LUCENE-1470.patch, LUCENE-1470.patch, 
> LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch, 
> LUCENE-1470.patch, TrieUtils.java
>
>
> According to the thread in java-dev 
> (http://www.gossamer-threads.com/lists/lucene/java-dev/67807 and 
> http://www.gossamer-threads.com/lists/lucene/java-dev/67839), I want to 
> include my fast numerical range query implementation into lucene 
> contrib-queries.
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in index in a special format.
> The idea behind this is to store the longs at different precisions in the index
> and partition the query range in such a way that the outer boundaries are
> searched using terms from the highest precision, but the center of the search
> range with lower precision. The implementation stores the longs in 8
> different precisions (using a class called TrieUtils). It also has support
> for Doubles, using the IEEE 754 floating-point "double format" bit layout
> with some bit mappings to make them binary sortable. The approach is used in
> rather big indexes; even on low-performance desktop computers, query times
> are <<100 ms (!) for very big ranges on indexes with 500000 docs.
> I called this RangeQuery variant and format "TrieRangeQuery" because
> the idea looks like the well-known Trie structures (it is not identical
> to real tries, but the algorithms are related).

