Hi Sergey, On 11/20/2008 at 9:30 AM, Sergey Kabashnyuk wrote: > How can I convert java.math.BigDecimal numbers in to string > for its storing in lexicographical order
Here's a thoroughly untested idea, cribbing some from o.a.l.document.NumberTools[1]: convert BigDecimals into strings of the following form: <significand-sign> <exponent-sign> <exponent> <significand> As in NumberTools, the signs consist of either the '-' or the '0' character; '-' < '0'. The exponent must be fixed length, and serialized as in NumberTools, with left-zero-padding and using the negative inversion trick. The exponent could be expressed in any base that will fit into Java's 16-bit char - Lucene's NumberTools uses base 36; see Solr's NumberUtils[2] or LUCENE-1434[3] for base 0x8000 implementations. The exponent can be calculated from the number of digits in the serialized base 10 form of the significand (BigDecimal's "unscaled value") and the "scale" (the number of digits after the decimal): exponent = (number of significand digits) - scale - 1. The significand field can be variable length, though it can't contain any left-zero-padding, and could be expressed in any base; again, see [2] or [3]. Some examples (base 10 and 4-char-width exponent used for purposes of exposition), in sorted order: +5.E-3 => 0 - 9996 5 +1.E-2 => 0 - 9997 1 +1.0E-2 => 0 - 9997 10 +1.0000E-2 => 0 - 9997 10000 +1.1E-2 => 0 - 9997 11 +1.11E-2 => 0 - 9997 111 +1.2E-2 => 0 - 9997 12 +5.E-2 => 0 - 9997 5 +7.3E+2 => 0 0 0002 73 +7.4E+2 => 0 0 0002 74 +7.45E+2 => 0 0 0002 745 +8.7654E+3 => 0 0 0003 87654 Negative numbers are a problem for the significand, though, since NumberTools' negative inversion trick assumes a fixed-precision minimum value - you'd need to use a different technique here in order to enable variable length significands. Another entirely untested idea, to handle variable length negative significands: substitute (base - digit - 1) for each digit of the serialized representation (e.g. in base 10, 4 => 5 & 0 => 9), then append a sentinal digit that is greater than all other digits used to represent the significand. The format for negative BigDecimals, then, would be: '-' <reversed-exponent-sign> <negated-exponent> <significand> <sentinel> where the exponent and its sign are negated before serialization, so that their sense is reversed. Some negative examples (base 10 and 4-char-width exponent used for purposes of exposition), in sorted order: -8.7654E+3 => - - 9996 12345 A -7.45E+2 => - - 9997 254 A -7.4E+2 => - - 9997 25 A -7.3E+2 => - - 9997 26 A -5.E-2 => - 0 0002 4 A -1.2E-2 => - 0 0002 87 A -1.11E-2 => - 0 0002 888 A -1.1E-2 => - 0 0002 88 A -1.0000E-2 => - 0 0002 89999 A -1.0E-2 => - 0 0002 89 A -1.E-2 => - 0 0002 8 A -5.E-3 => - 0 0003 4 A The use of the sentinel digit 'A', which is greater than all of the other digits [0-9], ensures that negative values with greater precision are ordered before those that share significand prefixes but have lesser precision. Although BigDecimal claims to support arbitrary precision, its scale representation is an int (32 bits), and its significand ("unscaled value") can have at most Integer.MAX_VALUE bits (c.f. java.math.BigInteger.bitLength():int[4]). If the width of the exponent field is made so that it can express all long values (64 bits), I *think* this scheme can handle all BigDecimal values. Steve [1] o.a.l.document.NumberTools: <http://svn.apache.org/viewvc/lucene/java/tags/lucene_2_4_0/src/java/org/apache/lucene/document/NumberTools.java?view=markup> [2] o.a.s.util.NumberUtils: <http://svn.apache.org/viewvc/lucene/solr/tags/release-1.3.0/src/java/org/apache/solr/util/NumberUtils.java?view=markup> [3] IndexableBinaryStringTools JIRA issue: https://issues.apache.org/jira/browse/LUCENE-1434 [4] BigInteger.bitLength(): <http://java.sun.com/j2se/1.4.2/docs/api/java/math/BigInteger.html#bitLength()>