[PHP-DEV] fp, rounding, decimal arithmetic - definitive account?

George Whiffen Fri, 23 Jul 2004 01:18:28 -0700


For the record, herewith:
- definitive (?) list of php's fp/rounding/arithmetic issues
- analysis of the issues
- discussion of options for handling the issues
- description of a comprehensive and "painless" solution
- links to and discussion of proof-of-concept C code

Apologies for the length of the post, but the subject matter is no more suited to piece by piece discussion, than it is to piece by piece resolution.

N.B. Throughout this post "decimal digits" refers to "significant decimal figures", and should not be confused with "decimal places".


PHP FP/DECIMAL/ROUNDING ISSUES
==============================

A. General Arithmetic Errors
----------------------------

A1: Simple Decimal Arithmetic Failure
0.8 == 0.7 + 0.1 ==> false

A2: Composite Decimal Arithmetic Failure
print floor(8.2 - 0.2) ==> 7

A3: Difference Errors
print (0.5001 - 0.5) ==> 9.9999999999989E-05

A4: Cumulative Arithmetic Errors
for ($i=1;$i<=1000000;++$i){$sum += 0.1;}print $sum ==> 100000.00000133


B. Intel (non-Win32) Specific Errors
------------------------------------

B1: Round (unfixed by fuzziness)
Intel GNU with FUZZ: round(12345678.0255,3) ==>12345678.025

B2: Exact Arithmetic Errors
Intel GNU: 0.002877 == 2877/1000000 ==> false

B3: Round (fpu rounding twice)
Intel GNU with FUZZ: 0.002877 == round(0.002877,6) ==> false

B4: Round - inexact double - created by fuzziness
Intel GNU with FUZZ: round(0.9499999999999,1) ==> 1.0

B5: Round - exact double - created by fuzziness
Intel GNU with FUZZ: round(144999999999999,-13) ==> 1.5e14


C. High Exponent Round Algorithm Failure
----------------------------------------

C1: Cross-Platform Rounding
Most (All?) Platforms: 1e-24 == round(1e-24,24) ==> false


C2: Platform Specific Variations
Intel GNU: 1e-23 == round(1e-23,23) ==> false
Win32: 1e-23 == round(1e-23,23) ==> true
Intel GNU: 5e-23 == round(5e-23,23) ==> true
Win32: 5e-23 = round(5e-23,23) ==> false


D. Inconsistent Rounding
------------------------

round(0.255,2) ==> 0.26
sprintf("%.2f",0.255) ==> 0.25
ini_set('precision',2);print 0.255 ==> 0.26

round(0.235,2) ==> 0.24
sprintf("%.2f",0.235) ==> 0.23
ini_set('precision',2);print 0.235 ==> 0.23

C: printf("%.2f",0.245) ==> 0.24
php: printf("%.2f",0.245) ==> 0.25


E. Functional Limitations
-------------------------

E1: No half-even (bankers) rounding option
See Rasmus Lerdorf's comments on Bug #24142

E2: No round to precision


ANALYSIS
========

A. Arithmetic Errors
--------------------

php implements its decimal arithmetic using C doubles, i.e. IEEE754 64bit binary representations. While the IEEE standard guarantees exact binary arithmetic results within range and exactly rounded outside of range, php's decimal arithmetic results are neither exactly rounded nor exact within range.

C double representations of decimal numbers (e.g. 0.1) are typically
inexact.  These "representation" errors, which only effect users of
C doubles for "other-base" arithmetic, can only be corrected by
rounding of the C double result in the appropriate base, base ten for
php.  php does not do this, therefore arithmetic results may be in error.

Often the incorrect results are masked by subsequent implicit rounding
e.g. at output/string conversion.  However, comparisons for equality,
(A1), integer conversions (A2) and difference operations (A3) can
reveal them immediately.  If the results are used for further
arithmetic operations the errors can accumulate ad infinitum (A4).

The errors are strictly bounded.  Even with maximal errors in
representation and an inexact final result, the returned result when
correctly rounded to 15 decimal digits will be the same as the true
decimal result, if it is of 15 digits or less.  This is particularly
significant since 15 digits is the limit for consistent representation
of decimals by 64-bit binaries (C doubles).  (Not all 16 decimal digit
numbers have distinct C double representations e.g. 0.9999999999999995
and 0.9999999999999994 have the same C double representation).

Therefore, with the addition of appropriate rounding, exact decimal
arithmetic to 15 significant figures is perfectly possible with php's
current fp implementation.

What is not possible, is exactly rounded decimal arithmetic, (the basic IEEE854 requirement) which requires the result be calculated to "infinite" precision and rounded to the best approximation. Decimally rounded php arithmetic can only guarantee that when the true result is of 16 or more digits, the returned result after rounding may be either the result immediately above or below the true result, but not necessarily the closer of the two. Practically, exactly rounded decimal fp arithmetic requires decimal fp hardware or complex and slow multi-precision arithmetic.

For most php users, most of the time, exact arithmetic is quite good enough. In particular, for money calculations, where exactness is typically an overwhelming, and often legal, requirement, exactly rounded results are largely irrelevant. Either calculations are simple and guaranteed to be exact within 15 digits e.g. ledger arithmetic, or they are very tightly specified to guarantee consistency and exactness at each step, e.g. interest calculations.

This is in marked contrast to typical C users, e.g. coding complex
scientific and mathematical algorithms.  They need the full IEEE
functionality including access to the exact flag, rounding
mode, precision control as well as exactly rounded results, (and
appropriate numerical analysis) to validate their results.

Unfortunately even just correct decimal rounding of C doubles is a non-trivial matter as php's other rounding issues clearly illustrate.


B. Intel FPU Precision Problems
-------------------------------

Although only specific to Intel platforms, fpu extended precision
errors in C double arithmetic, can easily confuse attempts to
correctly round decimals across platforms.

By default Intel FPUs operate in extended precision mode. On Win32 platforms the compiler forces them to double precision. However, some *nix compilers e.g. GNU, leave the FPU at the extended precision default.

An FPU in extended precision mode creates two problems for decimal
arithetic using C doubles:

Firstly, different results are returned depending on whether the
optimiser holds intermediate arithmetic results in extended precision
on the FPU registers or converts them to doubles on the way through
memory, (B1).

Secondly, (and much more rarely), a single arithmetic operation with
an inexact result can be vulnerable to errors from rounding twice.
Rounding once to extended precision during the actual calculation
and then again to double precision on return of the result from
the FPU can introduce new errors, (B2 and B3).

Both problems can be solved by a fast and thread-safe force/restore of
the FPU precision mode, (e.g. _FPU_GETCW(), _FPU_SETCW()) with no
impact on other code e.g. maths libraries.

On the other hand, the current solution for problem B1 which has been implemented in php_round, i.e. addition of a "fuzz factor", can itself create new errors, (B4, B5), and does nothing to address B2 or B3.


C. Algorithmic Failures on Inexact Powers of Ten
------------------------------------------------

Normal double FPU arithmetic will typically return the correct "nearest" C double representation of a decimal fp on division of a rounded integer mantissa by an appropriate power of ten (i.e. the algorithm used by php's round). However, this is only guaranteed to work if the power of ten is itself exact i.e. between +/- 22 (1e-22 to 1e22).

For decimals with exponents outside this range e.g. 1e-23, 5e24 etc., php's rounding algorithm and all rounding algorithms which only use fp arithmetic will sometimes fail to return the closest C double representation. This becomes a problem, if, as is typically the case, the platform's strtod() always returns the absolutely closest representation e.g. by using multi-precision integer arithmetic.

The consequence is that 1e-24 is no longer equal to 1/1e24 or round(1e-24,24). The simplest, but relatively expensive solution, is to use the platform's own strtod(),sprintf() as the last part of the rounding operation for decimal exponents greater than +/- 22 i.e. when rounding operands with absolute value outside the range of 1e-8 to 1e23 using the maximum sensible precision of 15 digits. This is much more expensive in terms of performance than the pure fp algorithm which can be safely used for numbers such as 0.0000000123456789012345 but not for 0.000000001.


D. Inconsistent Implicit Rounding
---------------------------------

Although the explicit rounding functions, i.e. php_round and
number_format, now share the same rounding macro, the implictly
rounding functions i.e. the output/string cast functions and
the sprintf/printf family do not.  This leads to a host of
inconsistencies depending on which function is used and on which
platform.  sprintf/printf are not even consistent with their C
counterparts.

Much of the problem is related to the issue of whether to round
half-even (the fpu default/bankers rounding), or round half-up
(the common high-school and scientific rounding mode).  Both
php's round function and the printf family attempt to round
half-up, the output/string converters stay with the fpu default
of round half-even.

One way to comprehensively resolve this issue is to include
mode-sensitive rounding using a common rounding routine as a
first stage of the string cast/printf routines.  In that case,
not only do all php's rounding functions, implicit and explicit,
return consistent results, but it becomes possible to allow full
user control over rounding mode.


E. Functional Limitations
-------------------------

The absence of explicit control over rounding mode (F1) appears to have
led to much confusion even among php gurus.

Both round-half-up and round-half-even (US banker's rounding), have
legitimate uses.  The absence of more user demand for rounding mode
control probably owes more to the general fp problems forcing users
into hand-crafted solutions, e.g. application-level fixed point shift,
rather than a genuine lack of interest.

The other obvious functional limitation with regard to rounding in
php is the requirement to code rounding to a specific precision
using php's round to places (F2), i.e. round(expression,precision) must
be coded as:

round(expression,precision+1+floor(log10(expression)))

This is not only clumsy but unnecessarily expensive in terms of
performance.


SOLUTIONS
=========

1. User Workarounds
-------------------

Taken altogether, the issues in php's fp implementation listed above
present the php user with a daunting task if they wish to be confident
of their arithmetic results.  Practically they have at least three
alternatives:

a) Do nothing
This is probably the most popular approach.  Most of the time the
problems are succesfully masked by implicit or explicit output
conversions and problematic numbers such as 0.002877 and 0.235
simply don't turn up.  The most common problem they will come across
is in comparisons for equality about which the php documentation
has clear warnings.  On average, they will "get away with it".

The big problem is that they may not realise that their code is intrinsically unsound and just waiting to meet a problem such as A2 or A4. Unless they have are already aware of all the problems testing is unlikely to be comprehensive. The danger is that problems show up years later, perhaps because of scaling or platform changes. Then diagnosis and correction of the code may have turned into nightmares. It's easy to imagine a traumatised end-user when they finally discover "that's how php arithmetic works" deciding to take the "safe option" and go for a rewrite using .NET and its decimal classes.

b) Avoid the fp numeric type
The experienced C/Open Source developer may well simply avoid php's
decimal arithmetic completely, preferring application-level fixed-point
shift i.e. integer arithmetic only.  They will have to watch out for
php's automatic type-casting converting their integers back to floating
points, but since fp integer arithmetic is in any case exact, this is
not such a big problem.

They are most likely to be vulnerable to the coding errors that the extra complexity and clumsiness of fixed-point shifts necessarily entails e.g. confusion over where the point is fixed in data and code. Unfortunately, however experienced they may be with single fixed-point coding, they are increasingly likely to be faced with variable fixed-point processing to cope with increasingly complex applications e.g. multi-currency, multi-sales-tax, multi-tier-commission e-commerce.

True floating-point, (if results are exact), is much more suited to
generic, reusable coding.

c) Use "string" arithmetic

The php specialist may well prefer to keep their numbers "floating" but to use a "string" arithmetic to ensure correct results. Whether they opt for a string cast of fp arithmetic results or use of the bc libraries the arithmetic should work fine. While bc is typically faster, string casts are generally more flexible since no scale control is required.

Of course, the problems of carrying out any required rounding on the results remain, since bc truncates out-of-scale results and string casts will try to round half-even. They also require greater code discipline and certainly reduce code "readability". Nevertheless either of these is probably a better user workaround than the other alternatives currently available to the user.

2. Developer Fixes
------------------

a) Replace php's fp numeric type

This is the conventional position of most languages from Cobol in the 1960s to .NET's decimal classes. They have specific decimal types for which decimal arithmetic works. They are typically semi-fixed, (much like bc with its scale control) and may or may not have the advantage of dedicated hardware.

Although php could take the same approach, it has special problems: firstly, of course is the loose typing and automatic type conversion e.g. context or value triggered integer/fp/string conversions; secondly there is the issue of compatibility with extensions and external libraries, such as the C math library; thirdly there is the problem of ensuring backward compatibility, (a major php strength); and finally performance, although php's arithmetic may be a couple of orders of magnitude slower than C's, it would be sad not to take advantage of the blindingly fast fp hardware on nearly every platform.

In short, any change to move away from C doubles as the core number
type is fraught with problems.

b) Problem-by-problem fixups

This has historically been the php approach. Unfortunately the big problem is that partial solutions, (such as the fuzzy fix for php_round), can actually make things worse! The situation is already so complex for the php user that even "good fixes" may simply confuse them further by simply replacing one "gotcha" that they have coded around with the next one on the list.

c) A Comprehensive Solution retaining C doubles

The good news is that the alternative of a comprehensive resolution of all the issues is not only not impossible but also not much more work than any one partial solution. It can also be implemented in such a way as to not only retain full backward compatibility but to greatly simplify the user documentation.

The key requirement is to have a safe, dependable but still fast central rounding function. The rest is straightforward. It just needs care over the precision used in rounding e.g. counting significant lead zeros on differences.

Such a solution is outlined below:


OUTLINE SOLUTION
================

The solution described here in outline and with links to proof-of-concept code resolves all the issues listed with minimal confusion for the php user. The only visible changes for the user are:

a) New "exact_decimal_arithmetic" configuration variable

When set to true, this enables fixes for all the fp issues,
i.e. arithmetic, correct rounding, rounding mode control and
consistent rounding.  All the user needs to know is that while 15
or less digit results will always be correct, 16 or more digit
results will be rounded to a 15 digit decimal immediately above or
below the true result.

No coding changes are required to take advantage of the new feature, and historically "unsafe" code becomes immediately "safe" when the variable is set. It can be switched on and off at run-time at any point in a calculation with no adverse consequences.


b) New "rounding_mode" configuration variable

When exact_decimal_arithmetic is on, the value of rounding_mode, either half_up, (the default), or half_even controls rounding mode across the whole of php. It sets the behaviour of both explicit rounding such as round and number_format and implicit rounding in string/output conversion, print/sprintf. It too can be changed at any point during execution.


CODE CHANGES
------------

As of 5.0.0, implementation of a solution along these lines requires
changes to three sources: Zend/zend_operators.c, ext/standard/math.c
and formatted_print.c as follows:

ext/standard/math.c
-------------------

1. Replace PHP_ROUND_WITH_FUZZ macro by central_round() function
with the following features:
- non-Win32 Intel FPUs forced to double-precision
- alternate rounding depending on rounding_mode setting
- sprintf/strtod when abs(places) >= 23
- pow() replaced by new getintpow10() function for performance

2. New getintpow10() function (performance tweak)
- returns powers of 10 from 1 to 23 from compiler generated array

3. New round_to_precision() function
- wrapper for central_round()
- sets "places" to match 15 digits of absolutely largest of up
   to 3 non-zero double arguments

4. New getintlog10() function (performance tweak)
- returns floor(log10(val))
- uses fast, hard-coded "if" tree for val between 1e-8 and 1e23

5. Call to central_round replaces PHP_ROUND_WITH_FUZZ in:
- PHP_FUNCTION(round)
- PHPAPI _php_math_number_format

ext/standard/formatted_print.c
------------------------------

1. Conditional call to round_central in:
- php_sprintf_appenddouble()

Zend/zend_operators.c
---------------------

1. Conditional calls to round_central in:
- _convert_to_string()
- zend_locale_sprintf_double()

2. Conditional calls to round_to_precision (one non-zero arg) in:
- mul_function()
- div_function()

3. Conditional calls to round_to_precision (3 non-zero args) in:
- add_function()
- sub_function()
- adjusts for potential "catastrophic loss of precision"


Links to Sample Code and Sample Test
------------------------------------
http://www.whiffen.net/phprounding/math.c_fixed
http://www.whiffen.net/phprounding/formatted_print.c_fixed
http://www.whiffen.net/phprounding/zend_operators.c_fixed
http://www.whiffen.net/phprounding/test.php
http://www.whiffen.net/phprounding/test_results_unfixed.txt
http://www.whiffen.net/phprounding/test_results_fixed.txt

Notes on Code
-------------

1. The code is strictly proof-of-concept. The best that
can be said about it is that it works.  (I make no claims
to be a C programmer let alone a developer of php).  It would
surely need extensive re-working, probably by a core developer,
before submission for consideration.

2. It's incomplete. The major missing element is the setup of the two new configuration variables. Code stumps are included to check the configuration variables but in the sample code exact_decimal_arithmetic is always true and rounding_mode is always half-up.

3. Also omitted is the config.m4 code for establishing that
fpu switching is necessary.

4. The code has only been tested under Debian Linux.


Performance
-----------

Performance impact is negligible for low exponent operands (15 digit numbers between 1e-8 and 1e23).

php's round function is actually faster.

The string conversion/output, printf and sprintf functions are already so slow that it's hard to detect any impact.

For arithmetic, where the extra load is detectable it is roughly comparable to a single array or object member access i.e. exact $a + $b is as fast as inexact $a->a + $b

The implication is that unless use of arrays and objects should be deprecated on performance grounds, there can be no objection on performance grounds to enabling exact arithmetic for the vast majority of user calculations.

On the other hand, for high-exponent operands i.e. < 1e-8 or >= 1e23, performance impact is considerable i.e. comparable to an additional string cast.

Since high-exponent data is often scientific or engineering data
which is itself known to be inexact, users may well wish to leave
exact_decimal_arithmetic off for these kinds of applications.

FINAL NOTE
==========

Hopefully, after appropriate corrections have been posted, we now
have a better overall description of php's fp/rounding issues on
record.

Furthermore it should be clear that something could be done about
all of them.  Whether anything should be done is another question,
and perhaps ultimately a matter of opinion.

As you either already know or could reasonably guess, my personal view is that simple, consistent, exact decimal arithmetic is highly desirable for any development tool intended for either novices or commercial use.

The fact that C does not offer decimal arithmetic is unsurprising given its pedigree and key application areas. The fact that other Open-Source tools such as tcl, Perl and MySQL have decimal problems may owe to the difficulties of implementing solutions given their code structure. The fact that php still has decimal problems given its user base and the relative ease with which the problems can be solved, is something I do not yet understand.

Awaiting enlightenment,


George Whiffen

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

[PHP-DEV] fp, rounding, decimal arithmetic - definitive account?

Reply via email to