John Song <[EMAIL PROTECTED]> wrote on 17/01/2007 11:09:40:

> Ultimately, everything is text search.  For a decimal number, what you
> do is write a customized analyzer which multiplies the number by
> some factor, rounds it to a long and then uses NumberTools to convert
> that into a text string.  Here is what I did for latitude/longitude
> search: multiply it by 10e6.
>

For most cases this would be sufficient.
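
To make the quoted approach concrete, here is a rough sketch of the
scale-and-pad idea. It does not call Lucene's NumberTools; plain zero
padding stands in for it so the snippet runs on its own. The class name,
FACTOR and WIDTH are my own assumptions, and negative values are left out.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class ScaledLongEncoder {
    // Assumed scale factor for this sketch: keeps six decimal places.
    static final long FACTOR = 1_000_000L;
    // Long.MAX_VALUE has 19 decimal digits, so pad every value to 19.
    static final int WIDTH = 19;

    /** Encodes a non-negative decimal as a fixed-width, zero-padded string. */
    static String encode(String decimal) {
        BigDecimal scaled = new BigDecimal(decimal)
                .multiply(BigDecimal.valueOf(FACTOR))
                .setScale(0, RoundingMode.HALF_UP);
        if (scaled.signum() < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        // longValueExact throws ArithmeticException if the scaled value
        // does not fit in a long, instead of silently wrapping around.
        long value = scaled.longValueExact();
        return String.format("%0" + WIDTH + "d", value);
    }
}
```

With this, encode("2.5") sorts before encode("11.3") as plain strings,
while encode("10000000000000") fails because the scaled value no longer
fits in a long.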

It won't work, though, if multiplying by the precision factor takes
the value beyond Long.MAX_VALUE.
I think an alternative that would work in that case too could
be to prefix X with P, where:
X == original number.
K == number of digits to the left of the decimal point.
P == lexicographically valid representation of K.

P is defined as "n0"+D+"n", where D is a one-char
representation of K, that is: '0'..'9', 'a'..'z'
(so K can be at most 35).

Examples for values of K, prefix P, and result string S:
X=.17   K=0 P=n00n S=n00n_.17
X=0.170 K=0 P=n00n S=n00n_.17  (no leading/trailing redundant zeros)
X=5.8   K=1 P=n01n S=n01n_5.8
X=17.66 K=2 P=n02n S=n02n_17.66
...
X=123456789.22    K=9  P=n09n  S=n09n_123456789.22
X=1234567890.22   K=10 P=n0an  S=n0an_1234567890.22
X=12345678901.22  K=11 P=n0bn  S=n0bn_12345678901.22
X=123456789012.22 K=12 P=n0cn  S=n0cn_123456789012.22

You get the idea - the prefix prevents the sorting error.
No special care required for the fraction part.

Negative values can work using 'm' instead of 'n' in the
prefix, e.g.:
X=-123456789.22    K=9  P=m09m  S=m09m_123456789.22

This should work for long values as well.
I like the fixed length prefix.
But code for this would need to be written...
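
A minimal sketch of what that code might look like, assuming the input
arrives as a decimal string. The class and method names are made up for
illustration. Zero is encoded with an empty body so that it sorts before
".17". One caveat: the 'm' prefix puts all negatives before all
positives, but within the negatives plain string order is the reverse of
numeric order, and this sketch does not try to fix that.

```java
import java.math.BigDecimal;

final class PrefixedDecimalEncoder {
    /** Builds S = P + "_" + X, with P = "n0"+D+"n" (or 'm' for negatives). */
    static String encode(String decimal) {
        BigDecimal x = new BigDecimal(decimal);
        char mark = x.signum() < 0 ? 'm' : 'n';

        // Plain digits without exponent or redundant trailing zeros.
        String plain = x.abs().stripTrailingZeros().toPlainString();
        int dot = plain.indexOf('.');
        String intPart = dot < 0 ? plain : plain.substring(0, dot);
        String fracPart = dot < 0 ? "" : plain.substring(dot); // keeps the '.'

        // Drop the redundant leading zero, so 0.17 becomes .17 and K is 0.
        if (intPart.equals("0")) {
            intPart = "";
        }

        int k = intPart.length();
        if (k > 35) {
            throw new IllegalArgumentException("more than 35 integer digits");
        }
        // One-char representation of K: '0'..'9', then 'a'..'z'.
        char d = Character.forDigit(k, 36);

        return "" + mark + '0' + d + mark + '_' + intPart + fracPart;
    }
}
```

For example, encode("2.5") gives n01n_2.5 and encode("11.3") gives
n02n_11.3, so the range [1.0 TO 3.0] from the original question would no
longer pick up 11.3.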

> john
>
> ----- Original Message ----
> From: Jiho Han <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Wednesday, January 17, 2007 10:13:47 AM
> Subject: Searching/indexing date/time values or numeric values?
>
> Is there a way to index/search so that a query could be written to
> search on a field using arithmetic comparison operators?
>
> What I mean is if I had a date/time field called CREATEDATE, I would
> search for all documents where:
>
> CREATEDATE > "1/1/2007"
>
> The above is obviously a pseudo-query expression.  I did find something
> called Range searches on the query syntax documentation page and it says
> the sorting is done lexicographically.  I guess that means it's sorted
> by letter.  I would then need to store all my date/time values in a
> format like yyyymmdd hh:mm:ss.
> And search, CREATEDATE:[20070101 00:00:00 TO 20070118 00:00:00], where
> the second date/time value is something like midnight tonight.
>
> But what about a decimal value?  If I have a VERSION field where values
> are like 1.0, 2.5, 11.3, etc.  That wouldn't work.  Because the values
> would be sorted:
> 1.0
> 11.3
> 2.5
> In that order.  And if I do VERSION:[1.0 TO 3.0], search would return
> all 3 of them.  The only workaround seems to be prepending 0's and that
> would also only work as long as the maximum digits for the integer part
> is known ahead of time.
>
> Can someone verify/suggest ways to make this work?
> Thanks
>
> Jiho Han
> Senior Software Engineer
> Infinity Info Systems
> The Sales Technology Experts
> Tel: 212.563.4400 x6375
> Fax: 212.760.0540
> [EMAIL PROTECTED]
> www.infinityinfo.com <http://www.infinityinfo.com/>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

