Re: [racket-users] IEEE 754 single precision float support

George Neuner Thu, 12 Apr 2018 03:32:32 -0700


On 4/11/2018 7:03 PM, Philip McGrath wrote:

From one following along who knows fairly little about floating-pointmath (the Toronto/McCarthy paper looks very informative!):
On Tue, Apr 10, 2018 at 12:13 AM, George Neuner <gneun...@comcast.net<mailto:gneun...@comcast.net>> wrote:
    As Philip McGrath mentioned already, you can specify single
    precision at least for input values ... but I think inexact (i.e.
    floating point) math in Racket all is done at maximum hardware
    precision and the result coerced back to the input precision if
    possible.  For purposes of numerical analysis, that is not the
    same as performing all operations at the input precision.


Is this confirmed to be what Racket does?

Yes and no. I did not intend to attribute the issue to Racket - it'sreally a hardware issue.

Racket follows the modern convention of using the CPU's SIMD hardware todo FP math. SIMD hardware (typically) does not widen results, so youget back the same precision you put in ... but internally there was anintermediate double-wide mantissa computed that maybe didn't fit intothe output.


   You see:
      single X single -> single
      double X double -> double

   The hardware does:
      single X single -> double -> single
      double X double -> quad -> double

If the result overflows or underflows, you get INF or zero rather than acorrect but wider answer.

If you are trying to check your FPU using software emulations: i.e.bigint-rationals or bigfloats, then you need to know exactly how thecomputations are being done. More on that below.

Modern x86 have both SIMD and x87 co-processor. The SIMD unit wasintroduced on the Pentium III with single precision and was expanded todouble precision on the Pentium IV and later. The modern convention touse the SIMD unit exclusively leaves the x87 idle all the time. There'sa good reason for it - the SIMD unit is faster because it is moretightly integrated ... the x87 originally was a separate chip and eventhough it now is integrated, there still is a performance penalty forhaving to use the co-processor interface protocol. And the SIMD unitproduces more consistent results [see below].

The x87 and the SIMD unit do not work the same way, and they may producedifferent results from the same calculation. The SIMD unit offerssingle or double precision. The x87 offers a 3rd choice: extendedprecision - an 80 bit format that has 64 bits of mantissa [vs 53 bits inan IEEE double]. Extended precision is the default mode of the x87.

The x87 always computes a full (64-bit mantissa) extended precisionresult before rounding AT the selected precision and then truncating TOthe selected precision [which clears any trailing bits in theregister]. In the SIMD unit, precision is specified by theinstruction: the registers are (virtually) subdivided into slots whichare sized appropriately for the specified precision [no trailing bits],and all slots are computed in parallel TO the specified precision.

Very different. And potentially confusing if you are expecting to getbit-identical results.

GPU or DSP calculations can be even more challenging to check with a CPUbecause their FPU hardware often is very different - not just not IEEEcompliant, but sometimes not even close.




Aside for Dale:

If you're on x86, and you find that rationals or bigfloats are too slow,the x87 may offer a faster way to your more precise calculations.see:https://docs.racket-lang.org/reference/extflonums.html?q=math%2Fextflonum#%28tech._extflonum%29

If so, are the observable differences vs. the operations working atthe input precision because of error and error propagation

Yes. Consider what happens at the limits of the selected precision. Switching to C for a moment, consider:


   float X = FLT_MAX;
   float Y = FLT_MAX - 4;

   float Z = (X + Y) / 2;
   float W = (float) (((double) X + Y) / 2);

Z gets compiled something like:
   float t1 = X + Y
   float t2 = t1 / 2
   Z = t2

W get compiled something like:
    double t1 = (double) X
    double t2 = (double) Y
    double t3 = t1 + t2
    double t4 = t3 / 2
    W = (float) t4

In the case of Z the temporaries are single precision, so (X + Y)overflows to INF and that carries forward through the rest of thecalculation. In the case of W, the temporaries are double precision, sothe calculation of (X + Y) succeeds and W gets the right answer.

Now you don't see these temporaries [except maybe by compiling in debugmode and specifying "consistent" floating point], but they exist in theFPU registers. The actual answers you would get depend on the FPUhardware and how the code was compiled.

E.g., on x87, keeping the temporaries in registers, both Z and W wouldget the same (right) answer if the FPU were in double or extendedprecision mode even though the compiler thinks the temporaries aresingle precision values. But if the code were compiled for"consistent" floating point - which forces temporaries to be truncatedto *storage* precision as if saved to memory and reloaded - then aftert1 computed some actual double precision value, that value would betruncated to single precision regardless of FPU mode. The result wouldbe wrong, but it would not be INF as it would be with the SIMD unit orif the x87 had been in single precision mode from the beginning.

Computing Z on the SIMD unit, t1 will always overflow to INF because theinternal double precision intermediate can't be stored back into thesingle precision register. So the SIMD unit produces more consistentresults in keeping with programmer expectations. [another reason it ispreferred now]

(i.e., if starting from exact values, the implementation usingdouble-precision under the hood would sometimes produce results withless error), or something else?

Double precision calculation MAY have less error than single precisioncalculation ... it depends on whether temporaries over or under flow.

-Philip


Hope this helps,
George

--
You received this message because you are subscribed to the Google Groups "Racket 
Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: [racket-users] IEEE 754 single precision float support

Reply via email to