[ 
https://issues.apache.org/jira/browse/LUCENE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659319#action_12659319
 ] 

Uwe Schindler commented on LUCENE-1496:
---------------------------------------

I looked into the code of NumberUtils:

The encoding is very similar to that of TrieUtils (used in TrieRangeQuery, 
see LUCENE-1470, 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/trie/TrieUtils.html).
The only difference between TrieUtils and NumberUtils is the more compact 
encoding in NumberUtils (TrieUtils.VARIANT_8BIT uses one character per byte, 
whereas NumberUtils uses 14 bits per character). Values encoded with TrieUtils 
also sort correctly with String.compareTo() (that was the intention behind 
TrieUtils).
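
To illustrate the "one character per byte" idea, here is a minimal sketch of a 
long encoding whose strings sort in numeric order under String.compareTo(). 
The class and method names are hypothetical and this is not the actual 
TrieUtils code (which may, for example, use a different character range); it 
only assumes that the sign bit is flipped and the bytes are written most 
significant first.

```java
// Hypothetical sketch, not the TrieUtils implementation.
final class SortableLongSketch {

    // Encodes a long as 8 chars, one char per byte, most significant byte first.
    static String encode(long value) {
        // Flip the sign bit so negative values order before positive ones
        // when the bytes are compared as unsigned quantities.
        long bits = value ^ Long.MIN_VALUE;
        char[] out = new char[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (char) ((bits >>> (56 - 8 * i)) & 0xFF);
        }
        return new String(out);
    }

    // Reverses encode(): reassemble the bytes, then flip the sign bit back.
    static long decode(String encoded) {
        long bits = 0L;
        for (int i = 0; i < 8; i++) {
            bits = (bits << 8) | (encoded.charAt(i) & 0xFF);
        }
        return bits ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        String a = encode(-5L);
        String b = encode(3L);
        // A negative comparison result means -5 sorts before 3.
        System.out.println(a.compareTo(b) < 0);
        System.out.println(decode(a) + " " + decode(b));
    }
}
```

Because all strings have the same length and every char is in the range 0 to 
255, the lexicographic order of the strings matches the numeric order of the 
original longs.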

In my opinion, TrieUtils has some additional advantages:
- Doubles are encoded in a correctly sortable way (even 
Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY), using the IEEE binary 
representation of doubles with some bit adjustments.
- Direct support for Dates and longs
- Built-in comparator for the new SortField constructor (LUCENE-1478) and a 
nice SortField factory. This maps all encoded values to a FieldCache of long 
values, even for dates or doubles: for sorting, only the order matters and not 
the real values, and longs have the fastest encoding/decoding speed.
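
The sortable double encoding can be sketched as follows. This is a commonly 
used technique based on Double.doubleToLongBits(); whether TrieUtils performs 
exactly these bit operations is an assumption, and the class and method names 
are hypothetical.

```java
// Hypothetical sketch of mapping doubles to longs that sort in the same order.
final class SortableDoubleSketch {

    // Maps a double to a long such that comparing the longs (signed)
    // gives the same order as comparing the doubles.
    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // Negative doubles have the sign bit set; a larger magnitude means a
        // smaller value, so invert the 63 non-sign bits to reverse their order.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    // Reverses toSortableLong(); the sign bit is untouched by the XOR,
    // so the same test identifies the values that were inverted.
    static double fromSortableLong(long sortable) {
        if (sortable < 0) {
            sortable ^= 0x7fffffffffffffffL;
        }
        return Double.longBitsToDouble(sortable);
    }

    public static void main(String[] args) {
        double[] values = { Double.NEGATIVE_INFINITY, -1.5, 0.0, 2.25, Double.POSITIVE_INFINITY };
        for (double v : values) {
            System.out.println(v + " -> " + toSortableLong(v));
        }
    }
}
```

Once doubles (or dates, via their millisecond timestamps) are mapped to longs 
like this, a single long-based comparator is enough for sorting, which matches 
the point about the FieldCache above.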

The only problem is that indexes encoded with the old NumberUtils are not 
readable by TrieUtils. But if we include such things in Lucene, we should not 
duplicate code and create yet another encoding.

For the more compact encoding, TrieUtils could be extended to also support a 
"14bit" variant (which would not work for real trie encoding, but could be 
used to simply store longs very compactly). On the other hand, anybody who 
uses NumberUtils may also be interested in TrieRangeQuery and should then use 
TrieUtils.VARIANT_8BIT.
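
As a rough illustration of what such a compact variant might look like, the 
sketch below packs a long into 5 characters of at most 14 bits each. It is 
neither the NumberUtils code nor an existing TrieUtils variant; the names and 
the exact bit layout are assumptions made for this example.

```java
// Hypothetical sketch of a compact, sortable 14-bits-per-char long encoding.
final class CompactLongSketch {

    private static final int BITS_PER_CHAR = 14;
    private static final int CHARS = 5;          // 5 * 14 = 70 bits, enough for 64
    private static final int MASK = 0x3FFF;      // lowest 14 bits

    static String encode(long value) {
        // Flip the sign bit so the unsigned bit pattern sorts numerically.
        long bits = value ^ Long.MIN_VALUE;
        char[] out = new char[CHARS];
        for (int i = 0; i < CHARS; i++) {
            int shift = BITS_PER_CHAR * (CHARS - 1 - i); // 56, 42, 28, 14, 0
            out[i] = (char) ((bits >>> shift) & MASK);
        }
        return new String(out);
    }

    static long decode(String encoded) {
        long bits = 0L;
        for (int i = 0; i < CHARS; i++) {
            bits = (bits << BITS_PER_CHAR) | (encoded.charAt(i) & MASK);
        }
        return bits ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        String a = encode(-42L);
        String b = encode(1234567890123L);
        System.out.println(a.length() + " chars, sorted: " + (a.compareTo(b) < 0));
        System.out.println(decode(a) + " " + decode(b));
    }
}
```

All chars stay below 0x4000, well outside the UTF-16 surrogate range, so 
String.compareTo() still orders the encoded values numerically. The strings 
are 5 chars long instead of 8, but the prefixes no longer fall on byte 
boundaries.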

So I think we should perhaps leave NumberUtils in Solr and use TrieUtils in 
Lucene. LocalLucene should then also use TrieUtils, and Solr may switch to 
Trie encoding with its next major version, too.

> Move solr NumberUtils to lucene
> -------------------------------
>
>                 Key: LUCENE-1496
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1496
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Ryan McKinley
>            Priority: Trivial
>             Fix For: 2.9
>
>
> Solr includes a NumberUtils class with some general utilities for dealing 
> with tokens and numbers.
> This should be in Lucene rather than Solr.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
