You need to go by the guard bits, which are the excess of the working
precision over the bits in the IEEE double precision mantissa (53 bits
counting the implicit leading bit, or about 15.95 decimal digits, for
64-bit, versus 64 bits or about 19.27 decimal digits for 80-bit).  The
common arithmetic co-processors, such as the x87, operate in 80-bit floating
point with a 15-bit exponent and a 64-bit mantissa, and thus have 11 guard
bits for double precision results.  If the numerical error exceeds the
depth of the guard bits, it creeps into the result.  I would expect that
numerical error from libraries designed for 64-bit, 80-bit, or 128-bit
results won't be a problem for DFP results with 15, 19, or 34 digits,
respectively.  Certainly it will be less of a problem than with numerical
libraries that compute in BCD arithmetic without guard digits.  Note that
on the x87 the most commonly used transcendental functions are computed in
hardware in 80-bit floating point.

The 11-bit exponent of double precision overflows at about 10^(308).  Single
precision uses an 8-bit exponent and overflows at about 10^(38).

James K Beard


-----Original Message-----
From: Kai Tietz [mailto:[email protected]] 
Sent: Wednesday, March 23, 2011 2:29 PM
To: [email protected]; [email protected]
Cc: James K Beard; JonY
Subject: Re: [Mingw-w64-public] mingw-w64 Decimal Floating Point math

2011/3/23 James K Beard <[email protected]>:
> You don't need to go to BCD to convert DFP to IEEE (regular) floating
> point.
> A single arithmetic operation directly in DFP will exceed what you do to
> convert to IEEE floating point.  I would use double precision for anything
> up to 12 decimals of accuracy, 80-bit for another three, and simply
> incorporate the quad precision libraries with credit (or by reference, if
> differences in licensing are a problem) for distribution.
>
> Anything other than binary representation will be less efficient in terms
> of accuracy provided by a given number of bits.  By illustration, each
> base-10 digit requires four bits, but provides only 3.32 bits (log2(10))
> of accuracy.  The only relief from this fundamental fact is use of fewer
> bits for the exponent, and in IEEE floating point the size of the exponent
> field is minimized just about to the point of diminishing returns (problems
> requiring workaround in areas such as determinants, series and large
> polynomials) to begin with.
>
> James K Beard

Well, DFP <-> IEEE conversion is already present in libgcc, so you
shouldn't need any special implementation here. I would suggest using
the double type for 32-bit and 64-bit DFP, and AFAICS the 80-bit IEEE
format should be wide enough for 128-bit DFP. How big is its exponent
specified to be? The rounding might be interesting.

Regards,
Kai



_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
