Hi Uwe,
Thank you for your help; it is greatly appreciated. Unfortunately, all of
my tests fail except for RangeInclusive. I've changed the precision step
to 6 as per your recommendation. I had it at Integer.MAX_VALUE to rule
out the precision step as the cause of the test failures. Essentially,
all keys in Cassandra are UTF-8 keys. In Lucandra, the keys are
constructed in the following way:
1. Get the token stream for the field. In this case it's a
NumericTokenStream with (numeric,valSize=64,precisionStep=6)
2. For all tokens in the stream, create a UTF8 String in the following
format <fieldname>\uffff<token value>
3. Set the term frequency to 1
This gives us a list of tokens, each prefixed with the field name and the
delimiter. Then, for each term from above, we create a key of the format
<indexname>\uffff<fieldname>\uffff<token value> and write it to the
TermInfo column family.
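
To make the key layout concrete, here is a minimal sketch of the steps
above. It is not the actual Lucandra code: the class name, method names,
and the "myindex" index name are made up for illustration, and the token
values are placeholder characters in the 0..127 range rather than real
NumericTokenStream output. It only shows how the two prefixes are joined
with \uffff and checks that such a key survives a UTF-8 round trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TermKeySketch {
    // Delimiter between key parts, as described in the steps above.
    static final String DELIMITER = "\uffff";

    // Step 2: <fieldname>\uffff<token value>
    public static String fieldTerm(String fieldName, String tokenValue) {
        return fieldName + DELIMITER + tokenValue;
    }

    // Final key: <indexname>\uffff<fieldname>\uffff<token value>
    public static String rowKey(String indexName, String fieldTerm) {
        return indexName + DELIMITER + fieldTerm;
    }

    // True if the key is unchanged after encoding to UTF-8 bytes and back.
    public static boolean survivesUtf8(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        return key.equals(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder tokens (assumption): arbitrary chars in 0..127,
        // standing in for the terms a numeric token stream would emit.
        List<String> tokens = new ArrayList<>();
        tokens.add(new String(new char[] {0x20, 0x01, 0x7f}));
        tokens.add(new String(new char[] {0x26, 0x00, 0x42}));

        for (String token : tokens) {
            String key = rowKey("myindex", fieldTerm("long", token));
            System.out.println(key.length() + " chars, UTF-8 round trip ok: "
                    + survivesUtf8(key));
        }
    }
}
```

If a check like this passes for the real token values, the string-to-bytes
step is probably not where the terms get damaged, and the problem is more
likely in how the terms are compared or ordered when they are read back.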
After debugging the LucandraTermEnum implementation, I can see that it
correctly returns values that should match my numeric range query.
However, I never get any results in the TopDocs result set after those
values are handed back to the numeric range query object. Any ideas why
this is happening?
Thanks,
Todd
On Wed, 2010-06-23 at 08:53 +0200, Uwe Schindler wrote:
> Hi Todd,
>
> I am not sure if I understand your problem correctly. I am not familiar with
> Lucandra/Cassandra at all, but if Lucandra implements the IndexWriter and
> IndexReader according to the documentation, numeric queries should work. A
> NumericField internally creates a TokenStream and "analyzes" the number to
> several Tokens, which are somehow "half binary" (they are terms consisting of
> characters in the full 0..127 range for optimal UTF8 compression with 3.x
> versions of Lucene). The exact encoding can be looked at in the NumericUtils
> class + javadocs.
>
> About your testcase: The test looks good, so does it fail? If yes, where is
> the problem? You can also look into Lucene's test TestNumericRangeQuery64 for
> more examples. Or modify its @BeforeClass to instead build a Lucandra index.
>
> The test does one thing that is not intended to be done that way:
> numeric = new NumericField("long", Integer.MAX_VALUE, Store.YES, true);
>
> You are using MAX_VALUE as precision step, this would slow down all queries to
> the speed of old-style TermRangeQueries. It is always better to stick with
> the default of 4, which creates 64 bits / 4 precStep = 16 terms per value.
> Alternatively for longs, 6 is a good precision step (see NumericRangeQuery
> documentation). MAX_VALUE is only intended for fields that do not do numeric
> ranges but e.g. sort only. precisionStep is a performance tuning parameter,
> it has nothing to do with better/worse precision on terms or different query
> results. If you are using NumericRangeQuery with this large precStep, you are
> not using the numeric features at all, so your test should not behave
> differently from a conventional TermRangeQuery with padded terms.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
>
> > -----Original Message-----
> > From: Todd Nine [mailto:[email protected]]
> > Sent: Wednesday, June 23, 2010 7:53 AM
> > To: [email protected]
> > Subject: Help with Numeric Range
> >
> > Hi all,
> > I'm new to Lucene, as well as Cassandra. I'm working on the Lucandra
> > project to modify it to add some extra functionality. It hasn't been fully
> > tested with range queries, so I've created some tests and contributed them.
> > You can view my source here.
> >
> > http://github.com/tnine/Lucandra/blob/master/test/lucandra/NumericRangeTests.java
> >
> > First, is this a sensible test? I'm specifically testing the case of longs
> > where I need millisecond precision on my searches.
> >
> >
> > Second, I see that Numeric Fields are built via terms. I think the issue
> > lies in the encoding of these terms into bytes for the Cassandra keys.
> > Can anyone point me to some documentation on numeric queries and terms,
> > and how they are encoded at the byte level based on the precision?
> >
> > Thanks,
> > Todd
>