RE: Help with Numeric Range

Todd Nine Wed, 23 Jun 2010 17:27:49 -0700

Hi Uwe,

  Thank you for your help, it is greatly appreciated.  Unfortunately, my
tests all fail except for RangeInclusive.  I've changed the precision step
to 6 as you recommended.  I had set it to the maximum only to rule out the
precision step as the cause of the test failures.  Essentially, all keys in
Cassandra are UTF-8 keys.  In Lucandra, the keys are constructed in the
following way:


1. Get the token stream for the field.  In this case it's a
NumericTokenStream with (numeric,valSize=64,precisionStep=6).
2. For each token in the stream, create a UTF-8 string in the
format <fieldname>\uffff<token value>.
3. Set the term frequency to 1.
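As a rough illustration of steps 1-3, here is a self-contained sketch. Instead of calling NumericTokenStream, it re-implements the prefix coding (based on our reading of Lucene 3.x's NumericUtils.longToPrefixCoded, not on code taken from Lucandra or Lucene), so the resulting strings can be inspected directly. LucandraTermSketch and fieldTerms are hypothetical names, and the real NumericUtils encoder should be treated as authoritative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LucandraTermSketch {
    // First char of a long term is 0x20 + shift (assumed from the NumericUtils javadocs).
    static final char SHIFT_START_LONG = (char) 0x20;
    // Delimiter between field name and token value (U+FFFF), as in the key format above.
    static final char DELIMITER = (char) 0xFFFF;

    /** Encodes 'val' shifted right by 'shift' bits, storing 7 bits per char (0..127). */
    static String longToPrefixCoded(long val, int shift) {
        if (shift < 0 || shift > 63) {
            throw new IllegalArgumentException("shift must be 0..63");
        }
        int nChars = (63 - shift) / 7 + 1;
        char[] buf = new char[nChars + 1];
        buf[0] = (char) (SHIFT_START_LONG + shift);
        // Flip the sign bit so that negative values sort before positive ones.
        long sortableBits = (val ^ 0x8000000000000000L) >>> shift;
        // Right-justify the bits: the lowest 7 bits go into the last char.
        for (int i = nChars; i >= 1; i--) {
            buf[i] = (char) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return new String(buf);
    }

    /**
     * Steps 1-3 (hypothetical helper): one term per shift 0, step, 2*step, ... below 64,
     * each stored as fieldName + U+FFFF + token, with a term frequency of 1.
     */
    static Map<String, Integer> fieldTerms(String fieldName, long value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        Map<String, Integer> termFreqs = new LinkedHashMap<>();
        // long arithmetic so a huge precisionStep cannot overflow the loop variable
        for (long shift = 0; shift < 64; shift += precisionStep) {
            termFreqs.put(fieldName + DELIMITER + longToPrefixCoded(value, (int) shift), 1);
        }
        return termFreqs;
    }

    public static void main(String[] args) {
        // precisionStep 6 gives shifts 0, 6, ..., 60, i.e. 11 terms for one long.
        System.out.println(fieldTerms("long", 0L, 6).size()); // 11
    }
}
```

Because every token char is at most 0x7F, terms at the same shift compare in numeric order as plain strings, which is what a range query relies on when it walks the term enumeration.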

This gives us a list of tokens, each prefixed with the field name and the
delimiter.  Then we do this:

For each term from above, create a key of the format
<indexname>\uffff<fieldname>\uffff<token value> and write it to the TermInfo
column family.
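The key layout can be sketched the same way. This hypothetical helper (TermInfoKeySketch, termKey and keyBytes are illustrative names, not Lucandra's API) builds the key and converts it to UTF-8. Every token char falls in 0..127 and so becomes a single byte, while each U+FFFF delimiter becomes the three bytes EF BF BF. Dumping these bytes for a failing term might help show whether the key written to Cassandra matches what the term enumeration reads back.

```java
import java.nio.charset.StandardCharsets;

public class TermInfoKeySketch {
    // U+FFFF delimiter, as in the key format described above.
    static final char DELIMITER = (char) 0xFFFF;

    /** Hypothetical: <indexname> U+FFFF <fieldname> U+FFFF <token value>. */
    static String termKey(String indexName, String fieldName, String token) {
        return indexName + DELIMITER + fieldName + DELIMITER + token;
    }

    /** Cassandra keys are UTF-8, so this is the byte form the key would take. */
    static byte[] keyBytes(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A token with chars 0x5C, 0x08 (the shift-60 term for the value 0).
        String token = new String(new char[] {(char) 0x5C, (char) 0x08});
        byte[] bytes = keyBytes(termKey("idx", "long", token));
        // 3 ("idx") + 3 (U+FFFF) + 4 ("long") + 3 (U+FFFF) + 2 (token) = 15
        System.out.println(bytes.length); // 15
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```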

After debugging LucandraTermEnum, I can see that it correctly returns terms
that should match my numeric range query.  However, the matching documents
never show up in the TopDocs result set after those terms are handed back to
the NumericRangeQuery object.  Any ideas why this is happening?

Thanks,
Todd




On Wed, 2010-06-23 at 08:53 +0200, Uwe Schindler wrote:

> Hi Todd,
> 
> I am not sure if I understand your problem correctly. I am not familiar with 
> Lucandra/Cassandra at all, but if Lucandra implements the IndexWriter and 
> IndexReader according to the documentation, numeric queries should work. A 
> NumericField internally creates a TokenStream and "analyzes" the number into 
> several tokens, which are somewhat "half binary" (they are terms consisting of 
> characters in the full 0..127 range, for optimal UTF-8 compression with 3.x 
> versions of Lucene). The exact encoding is described in the NumericUtils 
> class and its javadocs.
> 
> About your test case: the test looks good. Does it fail? If so, where is 
> the problem? You can also look at Lucene's test TestNumericRangeQuery64 for 
> more examples, or modify its @BeforeClass method to build a Lucandra index instead. 
> 
> The test does one thing that is not intended to be done that way:
> numeric = new NumericField("long", Integer.MAX_VALUE, Store.YES, true);
> 
> You are using MAX_VALUE as the precision step; this slows all queries down to 
> the speed of old-style TermRangeQueries. It is always better to stick with 
> the default of 4, which creates 64 bits / 4 precStep = 16 terms per value. 
> Alternatively, for longs 6 is a good precision step (see the NumericRangeQuery 
> documentation). MAX_VALUE is only intended for fields that are sorted on but 
> never used in numeric ranges. precisionStep is a performance tuning parameter; 
> it has nothing to do with better or worse precision of the terms, and it does 
> not change query results. If you use NumericRangeQuery with such a large 
> precStep, you are not using the numeric features at all, so your test should 
> not behave differently from a conventional TermRangeQuery with padded terms.
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
> 
> 
> > -----Original Message-----
> > From: Todd Nine [mailto:[email protected]]
> > Sent: Wednesday, June 23, 2010 7:53 AM
> > To: [email protected]
> > Subject: Help with Numeric Range
> > 
> > Hi all,
> >   I'm new to Lucene, as well as Cassandra.  I'm working on the Lucandra
> > project to add some extra functionality.  It hasn't been fully
> > tested with range queries, so I've created some tests and contributed them.
> > You can view my source here.
> > 
> > http://github.com/tnine/Lucandra/blob/master/test/lucandra/NumericRangeTests.java
> > 
> > First, is this a sensible test?  I'm specifically testing the case of longs
> > where I need millisecond precision on my searches.
> > 
> > 
> > Second, I see that numeric fields are built from terms.  I think the issue
> > lies in the encoding of these terms into bytes for the Cassandra keys.  Can
> > anyone point me to some documentation on numeric queries and terms, and how
> > they are encoded at the byte level based on the precision?
> > 
> > Thanks,
> > Todd
>
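To make two points from this thread concrete, here is a small sketch, not Lucene's own code. It shows how many terms per 64-bit value each precision step produces (Uwe's 64 / 4 = 16 arithmetic) and how a single term can be decoded back into its shift and value. The decoder mirrors what we understand NumericUtils.prefixCodedToLong to do in Lucene 3.x; NumericTermInspector and its method names are hypothetical. The shift-0 term always carries the full-precision value, so precisionStep only controls how many extra lower-precision terms are indexed.

```java
import java.util.ArrayList;
import java.util.List;

public class NumericTermInspector {
    // First char of a long term is 0x20 + shift (assumed from the NumericUtils javadocs).
    static final int SHIFT_START_LONG = 0x20;

    /** Shifts at which a 64-bit value is indexed: 0, step, 2*step, ... below 64. */
    static List<Integer> shiftsFor(int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<Integer> shifts = new ArrayList<>();
        // long arithmetic so that precisionStep = Integer.MAX_VALUE cannot overflow
        for (long shift = 0; shift < 64; shift += precisionStep) {
            shifts.add((int) shift);
        }
        return shifts;
    }

    /** Reads the shift stored in the first char of a prefix-coded long term. */
    static int getShift(String term) {
        int shift = term.charAt(0) - SHIFT_START_LONG;
        if (shift < 0 || shift > 63) {
            throw new NumberFormatException("not a prefix-coded long, bad shift: " + shift);
        }
        return shift;
    }

    /** Decodes a term; the lowest 'shift' bits of the result come back as zero. */
    static long prefixCodedToLong(String term) {
        int shift = getShift(term);
        long sortableBits = 0L;
        for (int i = 1; i < term.length(); i++) {
            char ch = term.charAt(i);
            if (ch > 0x7f) {
                throw new NumberFormatException("char outside 0..127 at index " + i);
            }
            sortableBits = (sortableBits << 7) | ch; // 7 payload bits per char
        }
        // Undo the right shift and the sign-bit flip applied during encoding.
        return (sortableBits << shift) ^ 0x8000000000000000L;
    }

    public static void main(String[] args) {
        for (int step : new int[] {4, 6, Integer.MAX_VALUE}) {
            System.out.println("precisionStep " + step + ": "
                    + shiftsFor(step).size() + " terms per value"); // 16, 11, 1
        }
        // Shift-0 term for the value 42: 0x20, 0x01, eight 0x00 chars, then 42.
        char[] term = new char[11];
        term[0] = 0x20; // shift 0
        term[1] = 0x01; // the flipped sign bit lands in the top 7-bit group
        term[10] = 42;  // lowest 7 bits
        System.out.println(prefixCodedToLong(new String(term))); // 42
    }
}
```

Running terms returned by LucandraTermEnum through a decoder like this is one way to check that what comes back from Cassandra still has the expected shift and value.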
