Re: [Mingw-w64-public] mingw-w64 Decimal Floating Point math

James K Beard Wed, 23 Mar 2011 13:33:55 -0700

You need to go with the guard bits, which is the excess in the number of
bits in the 80-bit extended significand over the IEEE double precision
significand (53 bits counting the implicit leading bit, or 15.95 decimal
places, for 64-bit, and 64 bits, or 19.27 decimal places, for 80-bit).  The
x87-style arithmetic co-processors operate with 80-bit floating point with a
15-bit exponent, and thus have 11 guard bits for double precision results.
If the numerical error exceeds the depth of the guard bits, numerical error
is creeping into the result.  I would expect that numerical error from
libraries designed for 64-bit or 80-bit or 128-bit results won't be a
problem for DFP results with 15, 19, or 34 digits, respectively.  Certainly
they will be less of a problem than numerical libraries that compute them
using BCD arithmetic unless guard digits are used.  Note that the most
commonly used transcendental functions are computed in hardware in 80-bit
floating point.
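
The digit counts and the guard-bit margin can be checked with a few lines
of arithmetic.  This is only a sketch: it assumes the x87 extended layout
of a 64-bit significand, counts the implicit leading bit of the double
significand, and the names decimal_digits, DOUBLE_BITS and X87_BITS are
made up for the illustration.

```python
import math
import sys


def decimal_digits(significand_bits: int) -> float:
    """Decimal digits carried by a binary significand of the given width."""
    return significand_bits * math.log10(2)


# 53 for IEEE binary64: 52 stored bits plus the implicit leading bit.
DOUBLE_BITS = sys.float_info.mant_dig
# Assumption: x87 80-bit extended format with a 64-bit significand.
X87_BITS = 64

# Bits the extended format carries beyond a double precision result.
guard_bits = X87_BITS - DOUBLE_BITS

print(f"double:   {DOUBLE_BITS} bits, {decimal_digits(DOUBLE_BITS):.2f} digits")
print(f"extended: {X87_BITS} bits, {decimal_digits(X87_BITS):.2f} digits")
print(f"guard bits for a double result: {guard_bits}")
```

If the accumulated error of a computation done in extended precision stays
below guard_bits bits, rounding the result to double should hide it.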


The 11-bit exponent of double precision overflows at about 10^(308).  Single
precision uses an 8-bit exponent and overflows at about 10^(38).
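
The overflow thresholds follow from the width of the exponent field.  A
rough sketch, assuming IEEE-style biased exponents where the largest finite
value is just under 2^(2^(e-1)) for an e-bit field; overflow_decade is a
hypothetical helper name, and the field widths used are the standard ones
(8 for single, 11 for double, 15 for x87 extended):

```python
import math


def overflow_decade(exponent_bits: int) -> float:
    """Approximate log10 of the overflow threshold for an e-bit exponent."""
    # Largest finite value is just under 2**max_binary_exponent.
    max_binary_exponent = 2 ** (exponent_bits - 1)  # e.g. 1024 for 11 bits
    return max_binary_exponent * math.log10(2)


for name, bits in [("single", 8), ("double", 11), ("x87 extended", 15)]:
    print(f"{name}: {bits}-bit exponent, overflows near 10^{overflow_decade(bits):.1f}")
```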

James K Beard


-----Original Message-----
From: Kai Tietz [mailto:[email protected]] 
Sent: Wednesday, March 23, 2011 2:29 PM
To: [email protected]; [email protected]
Cc: James K Beard; JonY
Subject: Re: [Mingw-w64-public] mingw-w64 Decimal Floating Point math

2011/3/23 James K Beard <[email protected]>:
> You don't need to go to BCD to convert DFP to IEEE (regular) floating
> point.
> A single arithmetic operation directly in DFP will exceed what you do to
> convert to IEEE floating point.  I would use double precision for anything
> up to 12 decimals of accuracy, 80-bit for another three, and simply
> incorporate the quad precision libraries with credit (or by reference, if
> differences in licensing are a problem) for distribution.
>
> Anything other than binary representation will be less efficient in terms
> of
> accuracy provided by a given number of bits.  By illustration, base 10
> requires four bits, but provides only 3.32 bits (log2(10)) per digit of
> accuracy.  The only relief from this fundamental fact is use of fewer bits
> for the exponent, and in IEEE floating point the size of the exponent
> field
> is minimized just about to the point of diminishing returns (problems
> requiring workaround in areas such as determinants, series and large
> polynomials) to begin with.
>
> James K Beard
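
The efficiency figure in the quoted text can be worked out directly.  A
small sketch, assuming plain packed BCD at four bits per digit as described
above; efficiency and digits_in are names invented for the illustration:

```python
import math

BITS_PER_BCD_DIGIT = 4
INFO_BITS_PER_DIGIT = math.log2(10)  # about 3.32 bits of information

# Fraction of the stored bits that carry information in packed BCD.
efficiency = INFO_BITS_PER_DIGIT / BITS_PER_BCD_DIGIT


def digits_in(bits: int, packed_bcd: bool) -> float:
    """Decimal digits that fit in `bits` bits, as packed BCD or pure binary."""
    if packed_bcd:
        return bits / BITS_PER_BCD_DIGIT
    return bits / INFO_BITS_PER_DIGIT


print(f"BCD efficiency: {efficiency:.1%}")
print(f"52 bits as BCD:    {digits_in(52, True):.2f} digits")
print(f"52 bits as binary: {digits_in(52, False):.2f} digits")
```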

Well, DFP <-> IEEE conversion is already present in libgcc, so you
shouldn't need any special implementation here. I would suggest using
the double type for 32-bit and 64-bit DFP, and AFAICS the 80-bit IEEE
type should be wide enough for the 128-bit DFP. How large is its
exponent specified to be? The rounding might be the interesting part.
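
Whether a double is wide enough can be probed with a round-trip test.  The
sketch below is not the libgcc conversion; it uses Python's decimal module
as a stand-in for DFP, and survives_double is a hypothetical helper.  It
assumes the usual precisions of 7 digits for 32-bit DFP and 16 digits for
64-bit DFP, against the 15 digits a double is guaranteed to preserve:

```python
from decimal import Decimal


def survives_double(text: str, digits: int) -> bool:
    """True if a decimal with `digits` significant digits round-trips via double."""
    original = Decimal(text)
    as_double = float(original)  # decimal -> binary64, correctly rounded
    # binary64 -> decimal, rounded back to `digits` significant digits
    back = Decimal(format(as_double, f".{digits}g"))
    return back == original


print(survives_double("1234.567", 7))            # 7 digits, 32-bit DFP range
print(survives_double("0.123456789012345", 15))  # 15 digits
print(survives_double("9007199254740993", 16))   # 16 digits: 2**53 + 1
```

The last case is 2^53 + 1, which has no exact double representation, so a
full 16-digit value can be lost on the way through a double.  That is where
the rounding question becomes relevant.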

Regards,
Kai


