Hi Jon and James!

On Wed, Mar 23, 2011 at 12:45 PM, JonY <[email protected]> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 3/23/2011 22:06, James K Beard wrote:
>> Jon: The simplest and quite possibly the most efficient way to implement a
>> standard function library in BCD decimal arithmetic is to convert to IEEE
>> standard double precision (or, if necessary, quad precision), use the
>> existing libraries, and convert back to BCD decimal floating point format.
>> The binary floating point will have more accuracy, thus providing a few
>> guard bits for the process, and hardware arithmetic (even quad precision is
>> supported by hardware because the preserved carry fields make quad precision
>> simple to support and allow good efficiency) is hard to match with software
>> floating point, which is what any BCD decimal arithmetic would be.
>>
>> James K Beard
>
> Hi,
>
> Thanks for the reply.
>
> To my understanding, converting DFP to BCD then IEEE float and back
> again seems to defeat the purpose using decimal floating points where
> exact representation is needed, I'm not too clear about this part. Will
> calculations suffer from inexact representation?
I believe that this is a fully legitimate concern.

To be explicit: because decimal exponents scale numbers by powers of five as well as powers of two (10 = 2 * 5), while binary exponents scale only by powers of two, decimal floating-point numbers can represent exactly some real numbers that binary floating-point numbers cannot. By way of example, 1/2 can be represented exactly in both decimal and binary floating-point, 1/5 can be represented exactly only in decimal floating-point (and 1/3 can be represented exactly in neither). Because of this, blindly converting from decimal to binary, carrying out the computation, and converting back to decimal can fail to produce the same result as carrying out the "correct" decimal computation.

Having said that, if you wish to perform fixed-precision (as distinct from fixed-point) decimal arithmetic, and your binary floating-point hardware has enough extra precision (I'm not sure exactly how much is needed, but I would think that one extra decimal digit of precision would be more than enough), then I believe that (neglecting underflow, overflow, denormalization, and so on) James's scheme can be made to work, although I don't think I would call it simple. (I use the phrase "fixed-precision" in contrast to arbitrary-precision. By a fixed-precision decimal floating-point number, I mean a mantissa with a fixed number of decimal digits -- say ten -- and a decimal exponent.)

In its simplest form, the basic idea is, for each decimal floating-point arithmetic operation, to convert the operands to binary floating-point, perform the operation in binary, and convert back to decimal floating-point by rounding to the nearest decimal floating-point value. This, however, isn't cheap: all of this converting and rounding is costly, and it forfeits much of the benefit of a hardware floating-point pipeline and register set.
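To make the scheme concrete, here is a minimal sketch in Python, using the standard `decimal` module with a 10-digit context as a stand-in for a fixed-precision DFP format (the 10-digit choice and the `dfp_add` helper are illustrative assumptions of mine, not anything mingw-w64 would actually ship):

```python
from decimal import Decimal, getcontext

# A 10-digit decimal context stands in for a fixed-precision DFP format
# (an assumption for illustration, not a mingw-w64 design decision).
getcontext().prec = 10

# Representability: 1/2 is exact in both bases, 1/5 only in decimal.
# Decimal(float) converts the binary double's exact value.
assert Decimal(0.5) == Decimal("0.5")   # the double 0.5 is exactly 1/2
assert Decimal(0.2) != Decimal("0.2")   # the double 0.2 only approximates 1/5

def dfp_add(a, b):
    """One step of the per-operation scheme: convert both operands to
    binary double, add in hardware, then round the binary result back
    to the nearest 10-digit decimal value."""
    binary = float(a) + float(b)
    # repr() yields the shortest decimal string that round-trips the
    # double; the unary + rounds it to the context's 10-digit precision.
    return +Decimal(repr(binary))

# Double precision carries roughly 16 decimal digits, leaving plenty of
# headroom above 10, so rounding back recovers the exact decimal sum.
assert dfp_add(Decimal("0.1"), Decimal("0.2")) == Decimal("0.3")
```

In binary, 0.1 + 0.2 yields 0.30000000000000004; it is the final rounding back to 10 decimal digits that recovers the "correct" decimal answer, which is exactly the step the per-operation scheme cannot skip.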
Note that if you don't convert back to decimal floating-point after every operation (or implement some other additional logic), you won't be guaranteed to get "correct" decimal floating-point results. For example, (1/5 + 2/5) - 3/5 is exactly zero in real (non-computer) arithmetic. It should also be exactly zero in correctly implemented decimal floating-point arithmetic, because all of the input values, intermediate results, and the final result are exactly representable as decimal floating-point numbers. However, if you calculate this with double-precision binary floating-point operations (without rounding the intermediate results back to decimal floating-point and reconverting them to binary), you will get a non-zero result on the order of 10^-16 (the approximate precision of double precision). Note that rounding this result back to decimal floating-point still leaves you with a non-zero result: the result of the binary computation is a perfectly good value that is well approximated by a decimal floating-point number with, say, ten decimal digits of precision.

Of course, it all depends on what you actually need. If you don't need the specific results that correct decimal floating-point arithmetic gives you, then converting to binary, computing, and converting back will generally give you a very good result. But then, if you don't need the specific decimal results, why not just use binary from the beginning?

Good luck.

K. Frank
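The (1/5 + 2/5) - 3/5 example is easy to reproduce. A quick illustration, again using Python's `decimal` module as a stand-in for a correct decimal floating-point implementation (an assumption of this sketch):

```python
from decimal import Decimal

# (1/5 + 2/5) - 3/5 in binary double precision: none of the operands is
# exactly representable in base 2, so the result is not exactly zero.
binary_result = (0.2 + 0.4) - 0.6
assert binary_result != 0.0
assert abs(binary_result) < 1e-15   # on the order of 10^-16

# The same computation in decimal arithmetic: every operand and every
# intermediate result is exactly representable, so the result is exactly 0.
decimal_result = (Decimal("0.2") + Decimal("0.4")) - Decimal("0.6")
assert decimal_result == 0
```

And, as noted above, rounding `binary_result` to ten decimal digits at the end does not rescue it: 1.110223e-16 is itself a perfectly representable decimal value, so only per-operation round-back (or equivalent logic) yields the exact zero.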
_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
