The only thing that is a bit different is that we encode (in base 62) the number
of x's in the last character, mainly so the terms are shorter.
use Data::Dumper;

my @foo = encode_trie(100000);   # terms to index for the value 100000
print Dumper(\@foo);
The output would look like this:
$VAR1 = [
'1a', ## 1xxxxxxxxxx
'129', ## 12xxxxxxxxx
'1208',
'12007',
'120026',
'1200205',
'12002014',
'120020113',
'1200201122',
'12002011201',
'120020112010' ## the exact match for base 3 @ 100000
];
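For readers without the JIRA patch at hand, here is a minimal sketch of how an encode_trie() could produce the terms above. It is not Dan's code. The helper to_base3() and the digit order of the base-62 alphabet (0-9, then a-z, then A-Z, so that 'a' stands for ten x's) are assumptions chosen to reproduce the output shown.

```perl
use strict;
use warnings;
use Data::Dumper;

# Assumed base-62 alphabet: 0-9, a-z, A-Z (index 10 is 'a').
my @B62 = ('0'..'9', 'a'..'z', 'A'..'Z');

# Hypothetical helper: non-negative integer -> string of base-3 digits.
sub to_base3 {
    my ($n) = @_;
    return '0' if $n == 0;
    my $s = '';
    while ($n > 0) {
        $s = ($n % 3) . $s;
        $n = int($n / 3);
    }
    return $s;
}

# Sketch of encode_trie: for every prefix of the base-3 digits, emit the
# prefix followed by one base-62 character counting the dropped digits.
sub encode_trie {
    my ($n) = @_;
    my $digits = to_base3($n);
    my $len    = length $digits;
    die "trie too long for a single base-62 count character\n" if $len > 62;
    my @terms;
    for my $plen (1 .. $len) {
        push @terms, substr($digits, 0, $plen) . $B62[$len - $plen];
    }
    return @terms;
}

my @foo = encode_trie(100000);
print Dumper(\@foo);
```

Because the count character also pins down the total number of digits, values of different lengths never share a term ('1a' is a leading 1 on an 11-digit number, '1b' would be a leading 1 on a 12-digit one).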
So you really only need encode_trie(int) to build the terms to index, and
query_trie(minint, maxint) to build the search terms at query time.
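At query time the range has to be split into blocks that each match a single indexed term. The following is a hypothetical query_trie() that does this with a simple greedy decomposition: for each base-3 length, it walks up from the low end and takes the largest aligned block of 3**k values that still fits under the upper bound. It assumes non-negative integers, base 3, the same term layout as encode_trie(), and a base-62 alphabet ordered 0-9, a-z, A-Z; to_base3() is a hypothetical helper. It is not necessarily how the LUCY-159 patch or Lucene's NumericRangeQuery does it.

```perl
use strict;
use warnings;

# Assumed base-62 alphabet: 0-9, a-z, A-Z.
my @B62 = ('0'..'9', 'a'..'z', 'A'..'Z');

# Hypothetical helper: non-negative integer -> string of base-3 digits.
sub to_base3 {
    my ($n) = @_;
    return '0' if $n == 0;
    my $s = '';
    while ($n > 0) {
        $s = ($n % 3) . $s;
        $n = int($n / 3);
    }
    return $s;
}

# Sketch: terms that together match every integer in [$min, $max].
sub query_trie {
    my ($min, $max) = @_;
    my @terms;
    my $len_min = length(to_base3($min));
    my $len_max = length(to_base3($max));
    # Terms encode the digit count, so each length is covered separately.
    for my $L ($len_min .. $len_max) {
        my $lo = $L == 1 ? 0 : 3 ** ($L - 1);   # smallest $L-digit value
        my $hi = 3 ** $L - 1;                   # largest $L-digit value
        $lo = $min if $min > $lo;
        $hi = $max if $max < $hi;
        while ($lo <= $hi) {
            # Grow the block while it stays aligned and inside [$lo, $hi];
            # k < $L keeps at least one digit in the prefix.
            my $k = 0;
            while ($k + 1 < $L
                   && $lo % 3 ** ($k + 1) == 0
                   && $lo + 3 ** ($k + 1) - 1 <= $hi) {
                $k++;
            }
            push @terms, to_base3(int($lo / 3 ** $k)) . $B62[$k];
            $lo += 3 ** $k;
        }
    }
    return @terms;
}

print join(',', query_trie(5, 13)), "\n";   # prints "120,21,101,1100,1110"
```

Reading the result: 5 exactly ('120'), every two-digit value starting with 2, i.e. 6 to 8 ('21'), every three-digit value starting with 10, i.e. 9 to 11 ('101'), then 12 and 13 exactly.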
A few things I'm pretty sure need some love:
1. encode() and query_trie() are hard-coded for base 3.
2. If the length of your trie gets longer than 62 chars, the cute disk-saving
trick above will surely not work.
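The 62-char limit follows from the count being a single character: it can only take 62 values (0 through 61). The shortest prefix of an L-digit number drops L - 1 digits, so L = 62 still fits and L = 63 does not. A small sketch, assuming the same 0-9, a-z, A-Z alphabet; count_char() is a hypothetical name.

```perl
use strict;
use warnings;

# Assumed alphabet: 0-9, a-z, A-Z (62 symbols, so index 10 is 'a').
my @B62 = ('0'..'9', 'a'..'z', 'A'..'Z');

# Encode a dropped-digit count as one base-62 character.
# Only counts 0..61 ($#B62) fit in a single character.
sub count_char {
    my ($count) = @_;
    die "count $count does not fit in one base-62 character\n"
        if $count < 0 || $count > $#B62;
    return $B62[$count];
}

print count_char(10), "\n";                      # prints "a"
print count_char(61), "\n";                      # prints "Z"
print eval { count_char(62) } // "error: $@";    # 63-digit tries fail here
```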
enjoy,
-Dan
On Jun 22, 2011, at 7:19 PM, Peter Karman wrote:
> Marvin Humphrey wrote on 6/22/11 8:51 PM:
>>> On Tue, Jun 21, 2011 at 12:42:43AM -0500, Peter Karman wrote:
>>>> I want to override the behavior of the RangeQuery class to support my
>>>> pseudo multi-value fields, which I achieve by concatenating values with
>>>> the \x03 byte.
>>
>> OK, there's another option which has suddenly become more attractive. :) My
>> Eventful colleague Dan Markham has submitted a trie implementation that
>> can be used for generating numeric ranges:
>>
>> https://issues.apache.org/jira/browse/LUCY-159
>>
>> It is to some degree based on the algorithm used by Lucene's
>> NumericRangeQuery:
>>
>> http://s.apache.org/QOx
>>
>
> Thanks to both you and Dan for this contribution!
>
> I'll have a look at the code and the docs and see if it feels workable for my
> particular need. In any case, I think it's great to see contributions like
> these, expanding the Lucy ecosystem.
>
>
> --
> Peter Karman . http://peknet.com/