What matters here are the guard bits, which are the excess of the 80-bit extended-precision significand over the IEEE double-precision significand. Double precision (64-bit) stores 52 fraction bits plus an implicit leading bit, for 53 bits or about 15.95 decimal digits. The 80-bit x87 extended format has a 64-bit significand (about 19.27 decimal digits) and a 15-bit exponent. The x87 coprocessor in common x86 hardware computes in this 80-bit format, so it carries 11 guard bits for double-precision results. If the numerical error exceeds the depth of the guard bits, it creeps into the double-precision result. I would expect that numerical error from libraries designed for 64-bit, 80-bit, or 128-bit results won't be a problem for DFP results with 15, 19, or 34 digits, respectively. Certainly it will be less of a problem than with libraries that compute these functions in BCD arithmetic without guard digits. Note that on the x87 the most commonly used transcendental functions are computed in hardware in 80-bit floating point.
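As a rough check of these figures, here is a minimal Python sketch that computes decimal digits as bits × log10(2) and the guard-bit count as the difference in significand widths. The table of significand widths is hand-entered from the IEEE 754 binary formats and the x87 extended format; the helper names are illustrative only.

```python
import math
import sys

# Significand precision in bits, counting the implicit leading bit of the
# IEEE binary formats and the explicit integer bit of the x87 80-bit format.
PRECISION_BITS = {
    "binary32 (single)": 24,
    "binary64 (double)": 53,
    "x87 80-bit extended": 64,
    "binary128 (quad)": 113,
}

def decimal_digits(bits: int) -> float:
    """Decimal digits of precision carried by a `bits`-bit significand."""
    return bits * math.log10(2)

def guard_bits(wide_bits: int, narrow_bits: int) -> int:
    """Extra significand bits the wider format carries beyond the narrower one."""
    return wide_bits - narrow_bits

# Sanity check against the running interpreter: CPython floats are binary64.
assert sys.float_info.mant_dig == PRECISION_BITS["binary64 (double)"]

for name, bits in PRECISION_BITS.items():
    print(f"{name:22s} {bits:4d} bits  {decimal_digits(bits):6.2f} digits")

print("guard bits, 80-bit over double:",
      guard_bits(PRECISION_BITS["x87 80-bit extended"],
                 PRECISION_BITS["binary64 (double)"]))
```

The printed digit counts (about 7.22, 15.95, 19.27 and 34.02) are the same bits × log10(2) arithmetic used in the message, applied to the full significand widths.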
The 11-bit exponent of double precision overflows at about 10^308. Single precision uses an 8-bit exponent and overflows at about 10^38.

James K Beard

-----Original Message-----
From: Kai Tietz [mailto:[email protected]]
Sent: Wednesday, March 23, 2011 2:29 PM
To: [email protected]; [email protected]
Cc: James K Beard; JonY
Subject: Re: [Mingw-w64-public] mingw-w64 Decimal Floating Point math

2011/3/23 James K Beard <[email protected]>:
> You don't need to go to BCD to convert DFP to IEEE (regular) floating point.
> A single arithmetic operation done directly in DFP costs more than the
> conversion to IEEE floating point. I would use double precision for anything
> up to 12 decimals of accuracy, 80-bit for another three, and simply
> incorporate the quad precision libraries with credit (or by reference, if
> differences in licensing are a problem) for distribution.
>
> Anything other than binary representation will be less efficient in terms of
> accuracy provided by a given number of bits. For example, a base-10 digit
> requires four bits, but provides only 3.32 bits (log2(10)) of accuracy.
> The only relief from this fundamental fact is to use fewer bits for the
> exponent, and in IEEE floating point the size of the exponent field is
> already minimized just about to the point of diminishing returns (beyond
> which problems requiring workarounds appear in areas such as determinants,
> series and large polynomials).
>
> James K Beard

Well, DFP <-> IEEE conversion is already present in libgcc, so you shouldn't need any special implementation here. I would suggest using the double type for 32-bit and 64-bit DFP, and AFAICS the 80-bit IEEE format should be wide enough for 128-bit DFP. How big is its exponent? The rounding might be the interesting part.

Regards,
Kai
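The overflow figures follow from the exponent width. A minimal Python sketch, assuming the usual largest-finite-value formula (2 - 2^(1-p)) * 2^emax with emax = 2^(e-1) - 1 (which also holds for the x87 extended format), reproduces them in log10 form so that wide exponents do not overflow a Python float. The function name is illustrative.

```python
import math
import sys

def log10_max_finite(exp_bits: int, precision_bits: int) -> float:
    """log10 of the largest finite value of a binary format with the given
    exponent-field width and significand precision (incl. leading bit).

    Largest finite value = (2 - 2**(1 - p)) * 2**emax, emax = 2**(e - 1) - 1.
    Working in log10 avoids overflowing a Python float for wide exponents.
    """
    emax = 2 ** (exp_bits - 1) - 1
    return math.log10(2 - 2.0 ** (1 - precision_bits)) + emax * math.log10(2)

# (exponent bits, significand precision) for each format.
formats = {"single": (8, 24), "double": (11, 53), "x87 80-bit": (15, 64)}
for name, (e, p) in formats.items():
    print(f"{name:10s} overflows near 10^{log10_max_finite(e, p):.2f}")

# Cross-check double against the interpreter's own DBL_MAX.
assert math.isclose(log10_max_finite(11, 53), math.log10(sys.float_info.max))
```

This gives roughly 10^38.53 for single, 10^308.25 for double and 10^4932.08 for the 80-bit format, which bears on the question about the 80-bit exponent.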
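To make the 3.32-bits-per-digit point concrete, a short Python sketch can compare the bits a binary integer needs to hold any n-digit value, ceil(n * log2(10)), with the 4n bits of plain BCD. The digit counts 7, 16 and 34 are the coefficient lengths of the IEEE 754 decimal32, decimal64 and decimal128 formats; the helper names are hypothetical, and the sketch ignores the denser encodings those formats actually use.

```python
import math

BITS_PER_DIGIT = math.log2(10)  # ~3.32 bits of information per decimal digit

def binary_bits(digits: int) -> int:
    """Bits a pure binary integer needs to hold any `digits`-digit value."""
    return math.ceil(digits * BITS_PER_DIGIT)

def bcd_bits(digits: int) -> int:
    """Bits plain BCD needs: one 4-bit nibble per decimal digit."""
    return 4 * digits

for n in (7, 16, 34):  # coefficient lengths of decimal32/64/128
    print(f"{n:2d} digits: binary {binary_bits(n):3d} bits, "
          f"BCD {bcd_bits(n):3d} bits")

print(f"BCD efficiency: {BITS_PER_DIGIT / 4:.1%}")
```

Plain BCD spends 4 bits where about 3.32 would do, so it uses roughly 83% of its bits as information.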
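One way to probe how much decimal precision a double actually holds is a round-trip test. The sketch below uses Python's `decimal` module only as a stand-in for DFP precision: it does not model the decimal64 exponent range or the libgcc conversion routines. It converts decimals to a binary64 double and back, and the helper names and the fixed random seed are arbitrary choices for illustration.

```python
import random
from decimal import Decimal

def survives_double(d: Decimal, digits: int) -> bool:
    """Return True if `d` is unchanged after converting it to a binary64
    double and rounding back to `digits` significant decimal digits."""
    x = float(d)  # Python converts Decimal -> float with correct rounding
    back = Decimal(format(x, f".{digits - 1}e"))  # `digits` significant digits
    return back == d

rng = random.Random(2011)  # fixed seed so the sample is reproducible

def random_decimal(digits: int) -> Decimal:
    """A random decimal with exactly `digits` significant digits."""
    coeff = rng.randrange(10 ** (digits - 1), 10 ** digits)
    return Decimal(f"{coeff}e{rng.randint(-20, 20)}")

# 15 significant digits (DBL_DIG) always survive the round trip.
ok15 = all(survives_double(random_decimal(15), 15) for _ in range(10_000))

# 16 digits (the decimal64 coefficient length) do not always: 2**53 + 1
# has no exact binary64 representation and comes back as 2**53.
ok16 = survives_double(Decimal(2 ** 53 + 1), 16)

print(ok15, ok16)  # prints "True False"
```

This shows where the margin runs out. Fifteen digits fit in a double with room to spare, but a full 16-digit decimal64 coefficient can lose its last digit. This is consistent with the rule of thumb above of using double up to about 12 decimals and a wider format beyond that.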
_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
